Deployment System#

A comprehensive deployment solution with monitoring and automation for your NixOS and nix-darwin fleet.

Features#

  • 🚀 Automated Deployments - Deploy to all hosts or specific hosts with one command
  • 📊 Fleet Monitoring - Real-time health checks and status dashboard
  • 🔄 Auto-rollback - Automatic rollback when activation fails (autoRollback) or when the host cannot be reached after activation (magicRollback)
  • 🍎 Darwin Support - Full support for macOS hosts with proper builder user handling
  • 🔐 Secure by Default - Uses sops-encrypted deploy keys
  • 🌐 Tailscale Integration - All hosts accessible via Tailscale network

Quick Start#

Initial Setup#

  1. Ensure deploy key is configured in your sops secrets (secrets.yaml):

    nix:
      deploy:
        priv-key: <your-encrypted-ssh-private-key>
        pub-key: ssh-ed25519 AAAA...
    

  2. Enable deploy node on a host that will perform deployments:

    # In your host configuration
    tsunaminoai.nix.isDeployNode = true;
    
    This installs the deploy key to /root/.ssh/id_deploy via sops.

  3. Build and test locally first:

    nix flake check
    

Deployment Commands#

| Command | Description |
|---|---|
| nix run .#deploy | Raw deploy-rs access |
| nix run .#deploy-host | Deploy to a specific host (interactive) |
| nix run .#deploy-host <name> | Deploy to a named host |
| nix run .#deploy-all | Deploy to all configured hosts |
| nix run .#deploy-monitor | Check status of all hosts |
| nix run .#deploy-rollback | Roll back a host to the previous generation |

Monitor Fleet Status#

nix run .#deploy-monitor

Shows connectivity and system info for all hosts.

Deploy to Specific Host#

# Interactive selection
nix run .#deploy-host

# Or specify directly
nix run .#deploy-host shinobu
nix run .#deploy-host shinobu switch  # or boot, test

Deploy to All Hosts#

nix run .#deploy-all

This will:

  1. Run a pre-deployment health check
  2. Prompt for confirmation
  3. Deploy to all reachable hosts
  4. Show post-deployment status
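
The four steps can be sketched in shell as follows. This is an illustration only, not the actual deploy-all script: the function names probe_host, deploy_one, reachable_hosts and deploy_all are hypothetical, and the probe assumes root SSH, so it only fits NixOS hosts.

```shell
# Sketch of the deploy-all flow. All function names here are hypothetical
# stand-ins; the real script may work differently.

# Succeeds if the host answers over SSH within 5 seconds.
# Assumes root SSH (NixOS); -n keeps ssh from consuming stdin.
probe_host() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true 2>/dev/null
}

# Deploy a single host through the documented app.
deploy_one() {
  nix run .#deploy-host "$1"
}

# Print the subset of the given hosts that respond to probe_host.
reachable_hosts() {
  for host in "$@"; do
    if probe_host "$host"; then
      printf '%s\n' "$host"
    fi
  done
}

deploy_all() {
  targets=$(reachable_hosts "$@")           # 1. pre-deployment health check
  [ -n "$targets" ] || { echo "no reachable hosts" >&2; return 1; }
  printf 'Deploy to:\n%s\nContinue? [y/N] ' "$targets"
  read -r answer                            # 2. prompt for confirmation
  [ "$answer" = "y" ] || return 1
  for host in $targets; do                  # 3. deploy to reachable hosts
    deploy_one "$host" || echo "deploy failed: $host" >&2
  done
  reachable_hosts $targets                  # 4. post-deployment status
}

# Example usage (host names are full SSH host names here):
#   deploy_all shinobu.armadillo-banfish.ts.net mokou.armadillo-banfish.ts.net
```

Unreachable hosts are filtered out before the prompt, so the confirmation lists exactly the hosts that will be touched.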

Rollback a Host#

# Interactive selection
nix run .#deploy-rollback

# Or specify directly
nix run .#deploy-rollback shinobu

Using deploy-rs Directly#

The system also exposes native deploy-rs configuration:

# Deploy all hosts
nix run .#deploy -- .

# Deploy specific host
nix run .#deploy -- .#shinobu

# Skip checks (faster)
nix run .#deploy -- --skip-checks .#mokou

# Dry run
nix run .#deploy -- --dry-activate .

Monitoring#

CLI Monitor#

The deploy-monitor command provides real-time fleet status:

nix run .#deploy-monitor

For each host, the output includes:

  • Connectivity status (online/offline)
  • System type (NixOS/Darwin)
  • Current generation number
  • NixOS/Darwin version

Manual Status Check#

SSH into individual hosts to check status:

# NixOS
ssh root@<host>.armadillo-banfish.ts.net 'readlink /nix/var/nix/profiles/system'

# Darwin
ssh builder@<host>.armadillo-banfish.ts.net 'readlink /nix/var/nix/profiles/system'
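
The readlink commands above print the name of the link the system profile points to, normally of the form system-<N>-link, where N is the generation number. As a minimal sketch, a helper like the one below could extract that number. The name generation_from_link is hypothetical and not part of the deploy scripts.

```shell
# Hypothetical helper: extract the generation number from a profile link
# name such as "system-42-link" (what readlink prints for the profile).
generation_from_link() {
  link=${1##*/}            # drop any leading directory components
  gen=${link#system-}      # strip the "system-" prefix
  gen=${gen%-link}         # strip the "-link" suffix
  case $gen in
    ''|*[!0-9]*) return 1 ;;        # not a plain number: unexpected format
    *) printf '%s\n' "$gen" ;;
  esac
}

# Example usage (requires SSH access to the host):
#   generation_from_link "$(ssh root@<host>.armadillo-banfish.ts.net \
#     'readlink /nix/var/nix/profiles/system')"
```

The helper returns a non-zero status when the input does not match the expected pattern, so a caller can tell an unexpected profile layout from a valid generation.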

Configuration#

Deploy Node Configuration#

Configure hosts as deploy nodes in your host configuration:

tsunaminoai = {
  nix.isDeployNode = true;
};

This will:

  • Deploy the sops-managed SSH key to /root/.ssh/id_deploy
  • Allow this host to run deploy commands

Deploy Target Configuration#

Targets are automatically discovered from your flake’s nixosConfigurations and darwinConfigurations.

NixOS targets need:

  1. SSH access for the root user
  2. The deploy public key in authorized_keys

Darwin targets need:

  1. SSH access for the builder user
  2. Passwordless sudo for the builder user
  3. The deploy public key in authorized_keys

Architecture#

Module Structure#

modules/flake/deploy/
└── default.nix          # Main deploy-rs flake module

modules/flake/nix/
└── deploy.nix           # Sops deploy key configuration

modules/flake/checks/
└── deploy.nix           # VM integration tests

Deploy Flake Module (modules/flake/deploy/default.nix)#

The main deployment module provides:

Flake Outputs:

  • deploy.nodes - Auto-generated deploy-rs node configuration from nixosConfigurations and darwinConfigurations

Apps (per-system):

  • deploy - Raw deploy-rs binary
  • deploy-host - Interactive host deployment
  • deploy-all - Fleet-wide deployment
  • deploy-monitor - Health check dashboard
  • deploy-rollback - Generation rollback

Packages (per-system):

  • deploy-system - Unified NixOS/Darwin deployment script
  • deploy-monitor - Status monitoring script
  • deploy-all - Batch deployment script
  • deploy-host - Single-host deployment
  • deploy-rollback - Rollback script

Node Configuration#

Nodes are automatically generated from your flake’s system configurations:

# Auto-discovered from:
self.nixosConfigurations   # → type: "nixos", user: "root"
self.darwinConfigurations  # → type: "darwin", user: "builder"

Each node is configured with:

| Setting | NixOS | Darwin |
|---|---|---|
| sshUser | root | builder |
| user | root | builder |
| hostname | <name>.armadillo-banfish.ts.net | same |
| remoteBuild | true | true |
| autoRollback | true | true |
| magicRollback | true | true |
| activationTimeout | 900s (15min) | same |
| confirmTimeout | 60s | same |
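
The sshUser and hostname rows determine the SSH target for each host. As a rough sketch of that mapping, the helper below builds the target string from a host type and name. The function ssh_target and the variable TAILNET_DOMAIN are hypothetical names used only for illustration.

```shell
# Hypothetical helper: map a host type and name to the SSH target
# described in the table (root for NixOS, builder for Darwin).
TAILNET_DOMAIN="armadillo-banfish.ts.net"

ssh_target() {
  case $1 in
    nixos)  user=root ;;
    darwin) user=builder ;;
    *) echo "unknown host type: $1" >&2; return 1 ;;
  esac
  printf '%s@%s.%s\n' "$user" "$2" "$TAILNET_DOMAIN"
}

# Example usage from a deploy node:
#   ssh -i /root/.ssh/id_deploy "$(ssh_target nixos shinobu)"
```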

SSH Key Management#

The deploy key is managed via sops-nix:

# modules/flake/nix/deploy.nix
sops.secrets."nix/deploy/priv-key" = lib.mkIf cfg.isDeployNode {
  mode = "0400";
  path = "/root/.ssh/id_deploy";
};

Enable on a host to make it a deploy node:

tsunaminoai.nix.isDeployNode = true;

Darwin-Specific Handling#

Darwin hosts require special handling because:

  1. Root SSH is disabled on macOS by default
  2. Activation requires sudo

The module automatically:

  • Uses the builder user for SSH
  • Runs activation with sudo
  • Handles Darwin-specific profile paths

Deployment Flow#

┌─────────────────┐
│   Deploy Node   │
│  (any host with │
│  isDeployNode)  │
└────────┬────────┘
         │
         ├─────► deploy-monitor
         │       └─► SSH to each host
         │           └─► Detect host type (NixOS/Darwin)
         │               └─► Query generation & version
         │
         ├─────► deploy-host / deploy-all
         │       └─► Build configuration locally
         │           └─► Copy closure to target
         │               └─► Activate with rollback
         │
         └─────► deploy-rollback
                 └─► SSH to host
                     └─► nix-env --rollback
                         └─► switch-to-configuration

Network Requirements#

  • Deploy nodes must be able to SSH to all target hosts on port 22
  • Hosts must be reachable via <hostname>.armadillo-banfish.ts.net (Tailscale)

Troubleshooting#

Deployment Fails#

  1. Check SSH connectivity:

    # NixOS
    ssh -i /root/.ssh/id_deploy root@<host>.armadillo-banfish.ts.net

    # Darwin
    ssh -i /root/.ssh/id_deploy builder@<host>.armadillo-banfish.ts.net

  2. Verify the build succeeds locally:

    # NixOS
    nix build .#nixosConfigurations.<host>.config.system.build.toplevel

    # Darwin
    nix build .#darwinConfigurations.<host>.config.system.build.toplevel

  3. Check deploy-rs directly:

    nix run .#deploy -- .#<hostname> --dry-activate
    

Darwin Deployment Issues#

  1. “Permission denied” errors. Ensure the builder user has passwordless sudo:

    # On the Darwin host
    echo "builder ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/builder
    

  2. SSH fails as builder. Verify that authorized_keys is set up:

    cat ~builder/.ssh/authorized_keys
    

Host Shows Offline in Monitor#

  1. Check Tailscale connectivity:

    tailscale ping <hostname>
    

  2. Verify host is running:

    ping <hostname>.armadillo-banfish.ts.net
    

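  3. Check the SSH service (NixOS hosts only; if SSH itself is unreachable, run the systemctl command from the host's console instead):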

    ssh root@<hostname>.armadillo-banfish.ts.net systemctl status sshd
    

Rollback Issues#

  1. Manual rollback:

    # NixOS
    ssh root@<host> 'nix-env -p /nix/var/nix/profiles/system --rollback && /nix/var/nix/profiles/system/bin/switch-to-configuration switch'
    
    # Darwin
    ssh builder@<host> 'sudo nix-env -p /nix/var/nix/profiles/system --rollback && sudo /nix/var/nix/profiles/system/activate-user && sudo /nix/var/nix/profiles/system/activate'
    

Security Considerations#

  • The deploy key grants root access to every NixOS host and root-equivalent access (builder with passwordless sudo) to every Darwin host
  • Store deploy key securely in sops-nix (at nix/deploy/priv-key)
  • Limit deploy nodes to trusted hosts via isDeployNode
  • Use Tailscale for secure network access
  • Monitor deployment activity

Migration from Manual Deployment#

If you were previously deploying manually with nixos-rebuild:

  1. Ensure deploy key is in sops (nix/deploy/priv-key)
  2. Enable tsunaminoai.nix.isDeployNode = true on at least one host
  3. Rebuild that host to deploy the key
  4. Test with nix run .#deploy-host <hostname> on a single host
  5. Once verified, use nix run .#deploy-all for fleet-wide updates

VM Testing#

The deployment infrastructure includes VM integration tests:

# Run the deployment test
nix build .#checks.x86_64-linux.vm-test-deployment

# Run interactively
nix build .#checks.x86_64-linux.vm-test-deployment.driverInteractive
./result/bin/nixos-test-driver

The test verifies:

  • Deploy-rs node configuration
  • SSH connectivity between nodes
  • Key distribution

Future Enhancements#

Planned improvements:

  • [ ] Notification integration (Slack/Discord)
  • [ ] Prometheus metrics export
  • [ ] Deployment history tracking
  • [ ] CI/CD pipeline integration
  • [ ] Web dashboard for monitoring