# Deployment System
A comprehensive deployment solution with monitoring and automation for your NixOS and nix-darwin fleet.
## Features
- 🚀 Automated Deployments - Deploy to all hosts or specific hosts with one command
- 📊 Fleet Monitoring - Real-time health checks and status dashboard
- 🔄 Auto-rollback - Automatic rollback on deployment failures via magic rollback
- 🍎 Darwin Support - Full support for macOS hosts with proper `builder` user handling
- 🔐 Secure by Default - Uses sops-encrypted deploy keys
- 🌐 Tailscale Integration - All hosts accessible via Tailscale network
## Quick Start

### Initial Setup
1. Ensure the deploy key is configured in your sops secrets (`secrets.yaml`):

   ```yaml
   nix:
     deploy:
       priv-key: <your-encrypted-ssh-private-key>
       pub-key: ssh-ed25519 AAAA...
   ```

2. Enable the deploy node role on a host that will perform deployments:

   ```nix
   # In your host configuration
   tsunaminoai.nix.isDeployNode = true;
   ```

   This installs the deploy key to `/root/.ssh/id_deploy` via sops.

3. Build and test locally first:

   ```bash
   nix flake check
   ```
### Deployment Commands

| Command | Description |
|---|---|
| `nix run .#deploy` | Raw deploy-rs access |
| `nix run .#deploy-host` | Deploy to a specific host (interactive selection) |
| `nix run .#deploy-host <name>` | Deploy to a named host |
| `nix run .#deploy-all` | Deploy to all configured hosts |
| `nix run .#deploy-monitor` | Check the status of all hosts |
| `nix run .#deploy-rollback` | Roll back a host to its previous generation |
### Monitor Fleet Status

```bash
nix run .#deploy-monitor
```

Shows connectivity and system information for all hosts.
### Deploy to a Specific Host

```bash
# Interactive selection
nix run .#deploy-host

# Or specify the host directly
nix run .#deploy-host shinobu
nix run .#deploy-host shinobu switch   # or: boot, test
```
### Deploy to All Hosts

```bash
nix run .#deploy-all
```

This will:

1. Run a pre-deployment health check
2. Prompt for confirmation
3. Deploy to all reachable hosts
4. Show post-deployment status
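The four steps above can be sketched as a small Bash function. This is a hypothetical illustration, not the actual `deploy-all` script: `probe_host` and `deploy_host` are assumed placeholder hooks you could redefine, and the default probe assumes NixOS-style `root` SSH access (Darwin hosts would need `builder@` instead).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the deploy-all flow; the real deploy-all script may differ.

# Placeholder probe: assumes root SSH (NixOS); Darwin hosts would need builder@.
probe_host() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 \
    "root@$1.armadillo-banfish.ts.net" true >/dev/null 2>&1
}

# Placeholder deploy step: calls the flake's deploy-rs app for one host.
deploy_host() {
  nix run .#deploy -- ".#$1"
}

deploy_all() {
  local host answer failed=0
  local reachable=()

  # 1. Pre-deployment health check: keep only hosts that answer.
  for host in "$@"; do
    if probe_host "$host"; then
      reachable+=("$host")
    else
      echo "skip: $host (unreachable)" >&2
    fi
  done
  if [ "${#reachable[@]}" -eq 0 ]; then
    echo "no reachable hosts" >&2
    return 1
  fi

  # 2. Prompt for confirmation (anything other than y/Y aborts).
  read -r -p "Deploy to ${reachable[*]}? [y/N] " answer
  if [[ "$answer" != [yY] ]]; then
    echo "aborted" >&2
    return 1
  fi

  # 3. Deploy to every reachable host, counting failures.
  for host in "${reachable[@]}"; do
    deploy_host "$host" || failed=$((failed + 1))
  done

  # 4. Post-deployment status.
  for host in "${reachable[@]}"; do
    if probe_host "$host"; then echo "ok: $host"; else echo "down: $host"; fi
  done

  # Exit status is the number of failed deployments (0 on full success).
  return "$failed"
}
```

For example, `deploy_all shinobu mokou` would probe both hosts, ask once, and deploy each reachable host in turn. Skipping unreachable hosts instead of aborting matches the "deploy to all reachable hosts" step described above.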
### Roll Back a Host

```bash
# Interactive selection
nix run .#deploy-rollback

# Or specify the host directly
nix run .#deploy-rollback shinobu
```
### Using deploy-rs Directly

The flake also exposes a native deploy-rs configuration:

```bash
# Deploy all hosts
nix run .#deploy -- .

# Deploy a specific host
nix run .#deploy -- .#shinobu

# Skip flake checks (faster)
nix run .#deploy -- --skip-checks .#mokou

# Dry run: show what would be activated without switching
nix run .#deploy -- --dry-activate .
```
## Monitoring

### CLI Monitor

The `deploy-monitor` command reports the current status of every host in the fleet:

```bash
nix run .#deploy-monitor
```

For each host, the output includes:

- Connectivity status (online/offline)
- System type (NixOS/Darwin)
- Current generation number
- NixOS/Darwin version
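A hypothetical sketch of how one such per-host row could be produced. It assumes the host type is inferred from `uname -s` on the target (`Linux` for NixOS, `Darwin` for macOS); the function names and column layout are illustrative, not the actual `deploy-monitor` format.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real deploy-monitor output format may differ.

# Map `uname -s` output from a target to the host types used in this fleet.
host_type() {
  case "$1" in
    Linux)  echo nixos ;;
    Darwin) echo darwin ;;
    *)      echo unknown ;;
  esac
}

# Print one aligned status row: name, online/offline, type, generation, version.
status_line() {
  printf '%-12s %-8s %-7s gen=%-5s %s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `status_line shinobu online "$(host_type Linux)" 42 "<version>"` prints a single aligned row; in practice the `uname -s` value, generation, and version would come from SSH queries against the host.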
### Manual Status Check

SSH into individual hosts to check their status:

```bash
# NixOS
ssh root@<host>.armadillo-banfish.ts.net 'readlink /nix/var/nix/profiles/system'

# Darwin
ssh builder@<host>.armadillo-banfish.ts.net 'readlink /nix/var/nix/profiles/system'
```
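The `readlink` output names the current generation link, which Nix profiles write as `system-<N>-link`. A small hypothetical helper can turn that into the bare generation number:

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract N from a profile link such as "system-42-link".
generation_from_link() {
  local link="${1##*/}"   # drop any leading directory components
  if [[ "$link" =~ ^system-([0-9]+)-link$ ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "unexpected profile link: $1" >&2
    return 1
  fi
}

# Example (not run here):
# generation_from_link "$(ssh root@<host>.armadillo-banfish.ts.net 'readlink /nix/var/nix/profiles/system')"
```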
## Configuration

### Deploy Node Configuration

Configure hosts as deploy nodes in your host configuration:

```nix
tsunaminoai = {
  nix.isDeployNode = true;
};
```

This will:

- Deploy the sops-managed SSH key to `/root/.ssh/id_deploy`
- Allow this host to run deploy commands
### Deploy Target Configuration

Targets are discovered automatically from your flake’s `nixosConfigurations` and `darwinConfigurations`.

NixOS targets need:

1. SSH access for the `root` user
2. The deploy public key in `root`'s `authorized_keys`

Darwin targets need:

1. SSH access for the `builder` user
2. Passwordless sudo for the `builder` user
3. The deploy public key in `builder`'s `authorized_keys`
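The per-platform rules above reduce to a user choice plus the Tailscale domain. A hypothetical helper that builds the SSH target for a host name and type (`nixos` or `darwin`, matching the node types listed under Node Configuration):

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the SSH target used for a host of a given type.
TAILNET_DOMAIN="armadillo-banfish.ts.net"

ssh_target() {
  local type="$1" name="$2" user
  case "$type" in
    nixos)  user=root ;;     # NixOS: root SSH access
    darwin) user=builder ;;  # Darwin: builder user with passwordless sudo
    *)
      echo "unknown host type: $type" >&2
      return 1
      ;;
  esac
  printf '%s@%s.%s\n' "$user" "$name" "$TAILNET_DOMAIN"
}
```

For example, `ssh_target nixos shinobu` prints `root@shinobu.armadillo-banfish.ts.net`, and `ssh -i /root/.ssh/id_deploy "$(ssh_target darwin <name>)"` would test a Darwin target.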
## Architecture

### Module Structure

```
modules/flake/deploy/
└── default.nix   # Main deploy-rs flake module
modules/flake/nix/
└── deploy.nix    # Sops deploy key configuration
modules/flake/checks/
└── deploy.nix    # VM integration tests
```
### Deploy Flake Module (`modules/flake/deploy/default.nix`)

The main deployment module provides:

Flake outputs:

- `deploy.nodes` - deploy-rs node configuration generated automatically from `nixosConfigurations` and `darwinConfigurations`

Apps (per-system):

- `deploy` - Raw deploy-rs binary
- `deploy-host` - Interactive host deployment
- `deploy-all` - Fleet-wide deployment
- `deploy-monitor` - Health check dashboard
- `deploy-rollback` - Generation rollback

Packages (per-system):

- `deploy-system` - Unified NixOS/Darwin deployment script
- `deploy-monitor` - Status monitoring script
- `deploy-all` - Batch deployment script
- `deploy-host` - Single-host deployment
- `deploy-rollback` - Rollback script
### Node Configuration

Nodes are generated automatically from your flake’s system configurations:

```nix
# Auto-discovered from:
self.nixosConfigurations   # → type: "nixos",  user: "root"
self.darwinConfigurations  # → type: "darwin", user: "builder"
```

Each node is configured with:

| Setting | NixOS | Darwin |
|---|---|---|
| `sshUser` | `root` | `builder` |
| `user` | `root` | `builder` |
| `hostname` | `<name>.armadillo-banfish.ts.net` | same |
| `remoteBuild` | `true` | `true` |
| `autoRollback` | `true` | `true` |
| `magicRollback` | `true` | `true` |
| `activationTimeout` | 900 s (15 min) | same |
| `confirmTimeout` | 60 s | same |
### SSH Key Management

The deploy key is managed via sops-nix:

```nix
# modules/flake/nix/deploy.nix
sops.secrets."nix/deploy/priv-key" = lib.mkIf cfg.isDeployNode {
  mode = "0400";
  path = "/root/.ssh/id_deploy";
};
```

Enable it on a host to make that host a deploy node:

```nix
tsunaminoai.nix.isDeployNode = true;
```
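To confirm that the key landed on a deploy node with the `0400` mode set above, a hypothetical check can compare the file's permission bits. It tries GNU `stat -c '%a'` (Linux) first and falls back to BSD `stat -f '%Lp'` (macOS); the function names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical check that a key file has mode 0400 (owner read-only).

file_mode() {
  # GNU stat prints octal permission bits with -c '%a';
  # BSD stat prints the same value with -f '%Lp'.
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

check_key_mode() {
  local path="${1:-/root/.ssh/id_deploy}" mode
  if [ ! -e "$path" ]; then
    echo "$path: no such file" >&2
    return 1
  fi
  mode=$(file_mode "$path") || return 1
  if [ "$mode" != "400" ]; then
    echo "$path has mode $mode, expected 400" >&2
    return 1
  fi
}
```

Run as root, `check_key_mode` with no argument checks `/root/.ssh/id_deploy`.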
### Darwin-Specific Handling

Darwin hosts require special handling because:

1. Root SSH login is disabled by default on macOS
2. Activation requires sudo

The module automatically:

- Uses the `builder` user for SSH
- Runs activation with sudo
- Handles Darwin-specific profile paths
Deployment Flow#
┌─────────────────┐
│ Deploy Node │
│ (any host with │
│ isDeployNode) │
└────────┬────────┘
│
├─────► deploy-monitor
│ └─► SSH to each host
│ └─► Detect host type (NixOS/Darwin)
│ └─► Query generation & version
│
├─────► deploy-host / deploy-all
│ └─► Build configuration locally
│ └─► Copy closure to target
│ └─► Activate with rollback
│
└─────► deploy-rollback
└─► SSH to host
└─► nix-env --rollback
└─► switch-to-configuration
### Network Requirements

- Deploy nodes must be able to SSH to all target hosts on port 22
- Hosts must be reachable via `<hostname>.armadillo-banfish.ts.net` (Tailscale)
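One way to check the port-22 requirement from a deploy node, without involving SSH credentials, is Bash's `/dev/tcp` pseudo-device. This sketch assumes GNU `timeout` (coreutils) is installed, which stock macOS does not ship; `port_open` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if a TCP connection to host:port (default 22)
# opens within 5 seconds.
port_open() {
  local host="$1" port="${2:-22}"
  # /dev/tcp/<host>/<port> is handled by Bash itself, not the filesystem;
  # the redirection opens a TCP connection on file descriptor 3.
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, `port_open shinobu.armadillo-banfish.ts.net && echo "port 22 reachable"`. A successful connection only proves `sshd` is listening; key or user problems still need the SSH checks under Troubleshooting.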
## Troubleshooting

### Deployment Fails

- Check SSH connectivity:

  ```bash
  # NixOS hosts
  ssh -i /root/.ssh/id_deploy root@<host>.armadillo-banfish.ts.net

  # Darwin hosts
  ssh -i /root/.ssh/id_deploy builder@<host>.armadillo-banfish.ts.net
  ```

- Verify that the build succeeds locally:

  ```bash
  # NixOS
  nix build .#nixosConfigurations.<host>.config.system.build.toplevel

  # Darwin
  nix build .#darwinConfigurations.<host>.config.system.build.toplevel
  ```

- Check deploy-rs directly:

  ```bash
  nix run .#deploy -- .#<hostname> --dry-activate
  ```
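The two `nix build` commands differ only in the configuration attribute set. A hypothetical helper that picks the right attribute path from the host type:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flake attribute for a host's top-level system build.
toplevel_attr() {
  local type="$1" host="$2"
  case "$type" in
    nixos)  echo ".#nixosConfigurations.${host}.config.system.build.toplevel" ;;
    darwin) echo ".#darwinConfigurations.${host}.config.system.build.toplevel" ;;
    *)
      echo "unknown host type: $type" >&2
      return 1
      ;;
  esac
}

# Example (not run here):
# nix build "$(toplevel_attr nixos shinobu)"
```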
### Darwin Deployment Issues

- “Permission denied” errors: ensure the `builder` user has passwordless sudo:

  ```bash
  # On the Darwin host
  echo "builder ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/builder
  # Validate the new sudoers file before relying on it
  sudo visudo -cf /etc/sudoers.d/builder
  ```

- SSH fails as `builder`: verify that `authorized_keys` is set up:

  ```bash
  cat ~builder/.ssh/authorized_keys
  ```
### Host Shows Offline in Monitor

- Check Tailscale connectivity:

  ```bash
  tailscale ping <hostname>
  ```

- Verify that the host is running:

  ```bash
  ping -c 3 <hostname>.armadillo-banfish.ts.net
  ```

- Check the SSH service (NixOS hosts; `systemctl` is not available on macOS):

  ```bash
  ssh root@<hostname>.armadillo-banfish.ts.net systemctl status sshd
  ```
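These checks go from the network layer up to the SSH service, so the first one that fails usually points at the problem. A hypothetical triage helper that runs label/command pairs in order and reports the first failure:

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: takes "label" "command" pairs, runs each command
# in order, and prints the label of the first one that fails.
first_failing_check() {
  local label cmd
  while [ "$#" -ge 2 ]; do
    label="$1"
    cmd="$2"
    shift 2
    if ! bash -c "$cmd" >/dev/null 2>&1; then
      echo "$label"
      return 1
    fi
  done
  echo "all checks passed"
}

# Example (commented out so nothing runs on load):
# host=shinobu
# first_failing_check \
#   tailscale "tailscale ping $host" \
#   ping      "ping -c 3 $host.armadillo-banfish.ts.net" \
#   sshd      "ssh root@$host.armadillo-banfish.ts.net systemctl status sshd"
```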
### Rollback Issues

Manual rollback:

```bash
# NixOS
ssh root@<host> 'nix-env -p /nix/var/nix/profiles/system --rollback && /nix/var/nix/profiles/system/bin/switch-to-configuration switch'

# Darwin
ssh builder@<host> 'sudo nix-env -p /nix/var/nix/profiles/system --rollback && sudo /nix/var/nix/profiles/system/activate-user && sudo /nix/var/nix/profiles/system/activate'
```
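The two manual rollback commands differ only in the activation step and the use of `sudo`. A hypothetical helper that returns the exact remote command string for each host type, mirroring the commands above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: remote rollback command for a host type.
SYSTEM_PROFILE=/nix/var/nix/profiles/system

rollback_cmd() {
  case "$1" in
    nixos)
      echo "nix-env -p $SYSTEM_PROFILE --rollback && $SYSTEM_PROFILE/bin/switch-to-configuration switch"
      ;;
    darwin)
      echo "sudo nix-env -p $SYSTEM_PROFILE --rollback && sudo $SYSTEM_PROFILE/activate-user && sudo $SYSTEM_PROFILE/activate"
      ;;
    *)
      echo "unknown host type: $1" >&2
      return 1
      ;;
  esac
}

# Example (not run here):
# ssh root@<host> "$(rollback_cmd nixos)"
```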
## Security Considerations

- The deploy key grants root access to all NixOS hosts, and effectively to Darwin hosts through the `builder` user's passwordless sudo
- Store the deploy key securely in sops-nix (at `nix/deploy/priv-key`)
- Limit deploy nodes to trusted hosts via `isDeployNode`
- Use Tailscale for secure network access
- Monitor deployment activity
## Migration from Manual Deployment

If you were previously deploying manually with `nixos-rebuild`:

1. Ensure the deploy key is in sops (`nix/deploy/priv-key`)
2. Enable `tsunaminoai.nix.isDeployNode = true` on at least one host
3. Rebuild that host to deploy the key
4. Test with `nix run .#deploy-host <hostname>` on a single host
5. Once verified, use `nix run .#deploy-all` for fleet-wide updates
## VM Testing

The deployment infrastructure includes VM integration tests:

```bash
# Run the deployment test
nix build .#checks.x86_64-linux.vm-test-deployment

# Run interactively
nix build .#checks.x86_64-linux.vm-test-deployment.driverInteractive
./result/bin/nixos-test-driver
```

The test verifies:

- deploy-rs node configuration
- SSH connectivity between nodes
- Key distribution
## Future Enhancements

Planned improvements:

- [ ] Notification integration (Slack/Discord)
- [ ] Prometheus metrics export
- [ ] Deployment history tracking
- [ ] CI/CD pipeline integration
- [ ] Web dashboard for monitoring