Local llm CLI: maintaining the flake during an AI outage

This flake is maintained with help from large language models. If a major cloud-AI outage takes Claude/ChatGPT offline, that workflow shouldn’t stop — so Simon Willison’s llm CLI is wired to the local Ollama on mokou. No cloud keys, no internet: maintenance keeps working entirely on hardware already in this flake.

It complements the agents package (narrow, scripted batch tools) with an interactive, conversation-logging assistant — and kiwix-ask lets the model ground answers in voile’s offline Kiwix archive.

Quick start

nix develop                                   # llm + kiwix-ask are in the dev shell
llm "rewrite this NixOS option to use mkOption with a default"
llm -t nix-maint "how do I add an opt-in service module in this repo?"
llm-maint "how do I gate a service on a new option?"   # ↑ but with live repo facts injected
nix run .#llm -- -t nix-maint "…"             # without entering the dev shell
kiwix-ask -b stackexchange "fourier transform basics"

The first nix develop after a fresh checkout builds the llm + llm-ollama Python environment once; later entries reuse the cached build. The environment is pinned by flake.lock, so after that first build it keeps working with no network.

How it’s wired

| Concern | Choice |
| --- | --- |
| Package | pkgs.python3Packages.llm.withPlugins { llm-ollama = true; } (no top-level pkgs.llm in 26.05) |
| Backend | llm-ollama plugin; auto-discovers whatever models mokou has loaded |
| Endpoint | OLLAMA_HOST defaults to http://mokou.armadillo-banfish.ts.net:11434 |
| Default model | qwen2.5-coder:7b (~4.7 GB Q4, fits the GTX 1080’s 8 GB), qwen2.5-coder:3b fallback |
| State dir | LLM_USER_PATH = $PWD/.llm (dev shell, git-ignored) or $XDG_STATE_HOME/llm (app) |
| Placement | dev shell (modules/flake/programs/shell.nix) + nix run .#llm / .#kiwix-ask apps |

OLLAMA_HOST is the single override knob and is shared with the agents tools — set it once to retarget both. The coder models are pre-pulled by mokou’s services.ollama.loadModels (see Ollama) so they’re available offline.
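
To make the override order concrete, here is a minimal Python sketch of how a client could resolve the endpoint and ask Ollama which models are available. It is illustrative only, not the code this flake ships: resolve_ollama_host and list_models are hypothetical names, and the scheme normalisation is an assumption. /api/tags is Ollama's endpoint for listing local models.

```python
import json
import os
import urllib.request

# Default taken from the table above; OLLAMA_HOST overrides it.
DEFAULT_OLLAMA_HOST = "http://mokou.armadillo-banfish.ts.net:11434"


def resolve_ollama_host(env=None):
    """Return the Ollama base URL, preferring OLLAMA_HOST over the default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_OLLAMA_HOST
    # Assumption: accept a bare host:port and normalise it to an http:// URL.
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


def list_models(base_url, timeout=5.0):
    """Ask Ollama which models are available locally (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

Calling list_models(resolve_ollama_host()) from a machine that can reach mokou should include the pre-pulled coder models if loadModels has done its job.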

The nix-maint template

modules/flake/programs/nix-maint.yaml is an llm template whose system prompt encodes this repo’s conventions — flake-parts layout, the tsunaminoai.* opt-in module shape, alejandra + keep-sorted formatting, the /run/secrets (never /var/run) sops rule, deploy-rs, and Tailscale-FQDN addressing — so even a 7B local model produces patches that tend to pass treefmt and nix flake check.

llm -t nix-maint "add a boolean option tsunaminoai.foo.enable and gate a systemd service on it"

The dev shell installs the template into LLM_USER_PATH/templates/ and sets the default model on entry; nothing to configure by hand.

llm-maint: current facts, auto-injected

A small local model is frozen at its training cutoff (~2023), so it doesn’t know this flake’s actual nixpkgs pin, module/option layout, or hosts. llm-maint wraps llm -t nix-maint and injects that ground truth as context fragments on every call, so the model reasons over the live repo instead of stale memory:

  • nixpkgs rev / ref / date from flake.lock (the version actually in use),
  • the live tsunaminoai.* option namespaces (grepped from module sources),
  • the NixOS / home-manager module and host lists,
  • the formatting / secrets / deploy / Tailscale conventions,
  • plus any files you pass with -f — typically the module you’re editing.

llm-maint "how do I gate a systemd service on a new tsunaminoai option?"
llm-maint -f modules/nixos/samba/default.nix "why doesn't the share show up?"

| Flag / env | Meaning |
| --- | --- |
| -f, --fragment PATH | also inject this file (repeatable) |
| -m, --model / -c, --num-ctx | model id / Ollama context window (default 16384) |
| LLM_MODEL · LLM_NUM_CTX · OLLAMA_HOST | env overrides |

It deliberately does not read options.md — that file is generated out-of-band and goes stale; module sources and flake.lock are always current.
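
As an illustration of the first injected fact, the sketch below reads the nixpkgs rev, ref and date out of a parsed flake.lock. It is an assumption about how such a lookup could be written, not the actual llm-maint implementation: nixpkgs_pin is a hypothetical name, and the code only handles the common case where the root input points directly at a node rather than using a follows path.

```python
import json
from datetime import datetime, timezone


def nixpkgs_pin(lock, input_name="nixpkgs"):
    """Extract rev / ref / date for one root input from a parsed flake.lock."""
    root = lock["nodes"][lock["root"]]
    node_name = root["inputs"][input_name]
    if not isinstance(node_name, str):
        # A list here means a "follows" path; out of scope for this sketch.
        raise ValueError(f"input {input_name!r} is a follows path")
    node = lock["nodes"][node_name]
    locked = node["locked"]
    # lastModified is a Unix timestamp; render it as a UTC calendar date.
    date = datetime.fromtimestamp(locked["lastModified"], tz=timezone.utc).date()
    return {
        "rev": locked["rev"],
        "ref": node.get("original", {}).get("ref"),  # may be absent
        "date": date.isoformat(),
    }


def load_lock(path="flake.lock"):
    """Read and parse a flake.lock file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A wrapper could format the result of nixpkgs_pin(load_lock()) into a short text fragment and pass it to llm alongside the question.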

Why inject instead of tool-call?

The stack can tool-call (llm --functions / llm tools, and llm-ollama registers tools), and qwen2.5-coder:7b does invoke tools over Ollama. But a 7B model does not reliably decide when to call them. Injecting the facts directly is deterministic: the model always sees current state. Reach for tools when the needed fact isn’t known ahead of time.
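
One fact that can be gathered ahead of time this way is the list of tsunaminoai.* option namespaces. The sketch below collects them with a regular expression and renders them as a text fragment. It is a rough approximation under stated assumptions, not the real llm-maint logic: option_namespaces and as_fragment are hypothetical names, and the pattern only catches dotted references such as config.tsunaminoai.samba, not options declared inside a nested attribute set.

```python
import re

# Matches the first path segment after "tsunaminoai." in Nix source text.
NAMESPACE_RE = re.compile(r"\btsunaminoai\.([A-Za-z_][A-Za-z0-9_'-]*)")


def option_namespaces(sources):
    """Return the sorted, de-duplicated namespaces found in the given texts."""
    found = set()
    for text in sources:
        found.update(NAMESPACE_RE.findall(text))
    return sorted(found)


def as_fragment(namespaces):
    """Render the namespaces as a plain-text context fragment."""
    lines = ["Live tsunaminoai.* option namespaces:"]
    lines += [f"- tsunaminoai.{name}" for name in namespaces]
    return "\n".join(lines)
```

Because the same sources always yield the same fragment, the model's view of the repo does not depend on whether it chooses to call a tool.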

kiwix-ask: grounded retrieval from voile’s Kiwix

kiwix-ask searches voile’s offline Kiwix archive, pulls the top article(s), strips them to text, and pipes them into llm as grounding context. During an outage the model can cite offline Wikipedia / DevDocs / Stack Exchange content instead of guessing.

kiwix-ask "how does the Linux page cache work"          # search all books
kiwix-ask -b stackexchange "windowing in an FFT"        # restrict to a book slug
kiwix-ask -b devdocs -q "what does @import do?" "zig import"
kiwix-ask --raw "water filtration"                      # print retrieved context only
kiwix-ask --url /content/<slug>/A/Some_Article "summarise"   # skip search

| Flag / env | Meaning |
| --- | --- |
| -b, --book NAME | restrict to books whose slug contains NAME (e.g. wikibooks, stackexchange) |
| -n, --num N | number of articles to retrieve (default 3) |
| -m, --model / -q, --question | model id / question (default: the query itself) |
| --url PATH | fetch a /content/<slug>/… path directly, skip search |
| --raw | print retrieved context without calling the model |
| KIWIX_HOST | Kiwix base URL (default https://voile.armadillo-banfish.ts.net:8092) |
| KIWIX_INSECURE=1 / KIWIX_CACERT | TLS escape hatches for clients that don’t trust the internal CA |

How it works. Kiwix book “slugs” carry volatile date suffixes (e.g. wikibooks_en_all_maxi_2021-03) and full-text search is per-book, so kiwix-ask resolves the current slugs live from /catalog/v2/entries, searches each book until it has enough hits, then fetches /content/<slug>/<path> and converts the HTML to text (pkgs/kiwix-ask/html-to-text.py). Nothing dated is hardcoded. NixOS hosts trust the internal step-ca CA, so HTTPS to voile works without -k.
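
Two of the steps above, matching books by slug substring and converting article HTML to text, can be sketched with the Python standard library alone. This is a simplified stand-in written for illustration, not the contents of pkgs/kiwix-ask/html-to-text.py: match_books, stable_name and html_to_text are hypothetical names, the date-suffix pattern is inferred from the one example slug, and the list of slugs is assumed to have been fetched from the catalog already.

```python
import re
from html.parser import HTMLParser

# Assumption: date suffixes look like "_2021-03", as in the example slug.
DATE_SUFFIX_RE = re.compile(r"_\d{4}-\d{2}$")


def stable_name(slug):
    """Strip the volatile date suffix from a book slug."""
    return DATE_SUFFIX_RE.sub("", slug)


def match_books(slugs, name=None):
    """Keep slugs containing NAME (case-insensitive); keep all if NAME is empty."""
    if not name:
        return list(slugs)
    needle = name.lower()
    return [slug for slug in slugs if needle in slug.lower()]


class _TextExtractor(HTMLParser):
    SKIP = {"script", "style", "nav"}  # content dropped entirely
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Convert an HTML document to plain text, one block element per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Matching on a substring of the slug is what lets -b wikibooks keep working after the archive is refreshed and the date suffix changes.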

Outage resilience

| Failure | Behaviour |
| --- | --- |
| Cloud AI APIs down | Local path needs no cloud keys and no internet, which is the whole point |
| Model not pulled when outage starts | loadModels pre-pulls the coder models onto mokou’s /data/ollama |
| Tailscale control plane down | mokou listens on 0.0.0.0; fall back to OLLAMA_HOST=http://192.168.0.x:11434 nix develop |
| nix develop with no registry | dev shell is pinned by flake.lock; llm + plugin are in the store after first build |
| VRAM pressure (vision model resident) | qwen2.5-coder:3b fallback; Ollama auto-unloads idle models |
| mokou down entirely | accepted single point of failure; no automatic failover. Point OLLAMA_HOST at another Ollama host if one is ever stood up |
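
The Tailscale fallback in the table is a manual step. If it were ever automated, one possible shape is to probe a list of candidate endpoints in order and use the first that answers. The sketch below is hypothetical and not part of this flake: http_probe and first_reachable are made-up names, and the caller must supply the real LAN address, which the table leaves unspecified.

```python
import urllib.error
import urllib.request


def http_probe(base_url, timeout=3.0):
    """Return True if an Ollama server answers GET /api/tags at base_url."""
    try:
        url = f"{base_url.rstrip('/')}/api/tags"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, timeout, HTTP error, or malformed URL.
        return False


def first_reachable(candidates, probe=http_probe):
    """Return the first candidate URL the probe accepts, or None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

A caller would list the Tailscale FQDN first and the LAN address second, then export the result as OLLAMA_HOST before entering the dev shell. The probe is injectable so the selection logic can be exercised without a network.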

See also

  • Ollama on mokou — the inference server and the SM 6.1 build
  • agents — scripted batch tools sharing the same Ollama backend
  • Kiwix — the offline content server on voile