Local `llm` CLI: maintaining the flake during an AI outage

This flake is maintained with help from large language models. If a major cloud-AI outage takes Claude/ChatGPT offline, that workflow shouldn’t stop — so Simon Willison’s llm CLI is wired to the local Ollama on mokou. No cloud keys, no internet: maintenance keeps working entirely on hardware already in this flake.

It complements the agents package (narrow, scripted batch tools) with an interactive, conversation-logging assistant — and kiwix-ask lets the model ground answers in voile’s offline Kiwix archive.

Quick start

nix develop                                   # llm + kiwix-ask are in the dev shell
llm "rewrite this NixOS option to use mkOption with a default"
llm -t nix-maint "how do I add an opt-in service module in this repo?"
llm-maint "how do I gate a service on a new option?"   # ↑ but with live repo facts injected
nix run .#llm -- -t nix-maint "…"             # without entering the dev shell
kiwix-ask -b stackexchange "fourier transform basics"

The first nix develop after a fresh checkout builds the llm + llm-ollama Python environment once; later runs reuse the cached build. Because the environment is pinned by flake.lock and stays in the Nix store after that first build, it keeps working with no network.

How it’s wired

| Concern | Choice |
| --- | --- |
| Package | `pkgs.python3Packages.llm.withPlugins { llm-ollama = true; }` (no top-level `pkgs.llm` in 26.05) |
| Backend | `llm-ollama` plugin, which auto-discovers whatever models mokou has loaded |
| Endpoint | `OLLAMA_HOST` defaults to `http://mokou.armadillo-banfish.ts.net:11434` |
| Default model | `qwen2.5-coder:7b` (~4.7 GB Q4, fits the GTX 1080’s 8 GB), `qwen2.5-coder:3b` fallback |
| State dir | `LLM_USER_PATH` → `$PWD/.llm` (dev shell, git-ignored) or `$XDG_STATE_HOME/llm` (app) |
| Placement | dev shell (`modules/flake/programs/shell.nix`) + `nix run .#llm` / `.#kiwix-ask` apps |

OLLAMA_HOST is the single override knob and is shared with the agents tools — set it once to retarget both. The coder models are pre-pulled by mokou’s services.ollama.loadModels (see Ollama) so they’re available offline.
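
As a rough illustration of that single override, the sketch below resolves the endpoint the way a client might: use `OLLAMA_HOST` when it is set, otherwise fall back to the default from the table. The helper names `ollama_base_url` and `list_models` are hypothetical and are not part of `llm`, `llm-ollama`, or the agents tools; accepting a bare `host:port` value is also an assumption. `GET /api/tags` is Ollama's endpoint for listing locally available models.

```python
import json
import os
import urllib.request

# Default copied from the table above; the helpers themselves are illustrative.
DEFAULT_OLLAMA_HOST = "http://mokou.armadillo-banfish.ts.net:11434"


def ollama_base_url(env=None):
    """Return the Ollama base URL, preferring OLLAMA_HOST when it is set."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST") or DEFAULT_OLLAMA_HOST
    # Assumption: tolerate a bare host:port as well as a full URL.
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


def list_models(base_url, timeout=5.0):
    """Ask Ollama which models it has locally (GET /api/tags)."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

Calling `list_models(ollama_base_url())` from inside the dev shell is one quick way to confirm that the pre-pulled coder models are visible before relying on them offline.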

The `nix-maint` template

modules/flake/programs/nix-maint.yaml is an llm template whose system prompt encodes this repo’s conventions — flake-parts layout, the tsunaminoai.* opt-in module shape, alejandra + keep-sorted formatting, the /run/secrets (never /var/run) sops rule, deploy-rs, and Tailscale-FQDN addressing — so even a 7B local model produces patches that tend to pass treefmt and nix flake check.

llm -t nix-maint "add a boolean option tsunaminoai.foo.enable and gate a systemd service on it"

The dev shell installs the template into $LLM_USER_PATH/templates/ and sets the default model on entry, so there is nothing to configure by hand.

`llm-maint`: current facts, auto-injected

A small local model is frozen at its training cutoff (~2023), so it doesn’t know this flake’s actual nixpkgs pin, module/option layout, or hosts. llm-maint wraps llm -t nix-maint and injects that ground truth as context fragments on every call, so the model reasons over the live repo instead of stale memory:

- nixpkgs rev / ref / date from flake.lock (the version actually in use),
- the live tsunaminoai.* option namespaces (grepped from module sources),
- the NixOS / home-manager module and host lists,
- the formatting / secrets / deploy / Tailscale conventions,
- plus any files you pass with -f, typically the module you’re editing.

llm-maint "how do I gate a systemd service on a new tsunaminoai option?"
llm-maint -f modules/nixos/samba/default.nix "why doesn't the share show up?"

| Flag / env | Meaning |
| --- | --- |
| `-f, --fragment PATH` | also inject this file (repeatable) |
| `-m, --model` / `-c, --num-ctx` | model id / Ollama context window (default 16384) |
| `LLM_MODEL` · `LLM_NUM_CTX` · `OLLAMA_HOST` | env overrides |

It deliberately does not read options.md — that file is generated out-of-band and goes stale; module sources and flake.lock are always current.
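
The fact-gathering step described above could look roughly like the following. This is a sketch under stated assumptions, not the actual `llm-maint` implementation: the function names are hypothetical, the regular expression is only one plausible way to "grep" option namespaces, and the `flake.lock` handling assumes the standard layout where the root node's `inputs` map names a `nixpkgs` node carrying `locked.rev`, `locked.lastModified` and `original.ref`.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical pattern: first path segment after "tsunaminoai." in module sources.
OPTION_RE = re.compile(r"\btsunaminoai\.([A-Za-z0-9_-]+)")


def nixpkgs_pin(lock, input_name="nixpkgs"):
    """Extract rev / ref / date of the pinned nixpkgs from a parsed flake.lock."""
    root = lock["nodes"][lock["root"]]
    node_name = root["inputs"][input_name]
    if not isinstance(node_name, str):
        # A list here means the input "follows" another one; not handled in this sketch.
        raise ValueError(f"{input_name} follows another input; resolve it first")
    node = lock["nodes"][node_name]
    locked = node["locked"]
    stamp = datetime.fromtimestamp(locked["lastModified"], tz=timezone.utc)
    return {
        "rev": locked.get("rev"),
        "ref": node.get("original", {}).get("ref"),
        "date": stamp.date().isoformat(),
    }


def option_namespaces(sources):
    """Collect the distinct tsunaminoai.<name> namespaces seen in Nix sources."""
    found = set()
    for text in sources:
        found.update(OPTION_RE.findall(text))
    return sorted(found)


def facts_fragment(lock, sources):
    """Render the gathered facts as plain text suitable for use as context."""
    pin = nixpkgs_pin(lock)
    return "\n".join([
        f"nixpkgs rev: {pin['rev']}",
        f"nixpkgs ref: {pin['ref']}",
        f"nixpkgs date: {pin['date']}",
        "tsunaminoai.* namespaces: " + ", ".join(option_namespaces(sources)),
    ])


def load_repo_facts(repo_root):
    """Read flake.lock and every .nix file under modules/ from a checkout."""
    root = Path(repo_root)
    lock = json.loads((root / "flake.lock").read_text())
    sources = (path.read_text() for path in (root / "modules").rglob("*.nix"))
    return facts_fragment(lock, sources)
```

Because everything is read from `flake.lock` and the module sources at call time, the resulting text cannot drift the way a generated `options.md` can, which is the design point made above.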

Why inject instead of tool-call?

The stack can tool-call (llm --functions / llm tools, and llm-ollama registers tools), and qwen2.5-coder:7b does invoke tools over Ollama. But a 7B model does not reliably decide when to call them. Injecting the facts directly is deterministic: the model sees current state on every call. Reach for tools when the needed fact isn’t known ahead of time.

`kiwix-ask`: grounded retrieval from voile’s Kiwix

kiwix-ask searches voile’s offline Kiwix archive, pulls the top article(s), strips them to text, and pipes them into llm as grounding context. During an outage the model can cite offline Wikipedia / DevDocs / Stack Exchange content instead of guessing.

kiwix-ask "how does the Linux page cache work"          # search all books
kiwix-ask -b stackexchange "windowing in an FFT"        # restrict to a book slug
kiwix-ask -b devdocs -q "what does @import do?" "zig import"
kiwix-ask --raw "water filtration"                      # print retrieved context only
kiwix-ask --url /content/<slug>/A/Some_Article "summarise"   # skip search

| Flag / env | Meaning |
| --- | --- |
| `-b, --book NAME` | restrict to books whose slug contains `NAME` (e.g. `wikibooks`, `stackexchange`) |
| `-n, --num N` | number of articles to retrieve (default 3) |
| `-m, --model` / `-q, --question` | model id / question (default: the query itself) |
| `--url PATH` | fetch a `/content/<slug>/…` path directly, skip search |
| `--raw` | print retrieved context without calling the model |
| `KIWIX_HOST` | Kiwix base URL (default `https://voile.armadillo-banfish.ts.net:8092`) |
| `KIWIX_INSECURE=1` / `KIWIX_CACERT` | TLS escape hatches for clients that don’t trust the internal CA |

How it works. Kiwix book “slugs” carry volatile date suffixes (e.g. wikibooks_en_all_maxi_2021-03) and full-text search is per-book, so kiwix-ask resolves the current slugs live from /catalog/v2/entries, searches each book until it has enough hits, then fetches /content/<slug>/<path> and converts the HTML to text (pkgs/kiwix-ask/html-to-text.py). Nothing dated is hardcoded. NixOS hosts trust the internal step-ca CA, so HTTPS to voile works without -k.
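
Two of the steps above, resolving live slugs and turning article HTML into text, can be sketched with the standard library alone. This is an illustration, not the contents of `pkgs/kiwix-ask/html-to-text.py`: the function names are hypothetical, and it assumes the catalog response is an Atom (OPDS) feed in which each entry carries a `text/html` link whose `href` contains `/content/<slug>`.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM = "{http://www.w3.org/2005/Atom}"
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


def slugs_from_catalog(xml_text):
    """Pull current book slugs out of a catalog feed (assumed Atom/OPDS shape)."""
    root = ET.fromstring(xml_text)
    slugs = []
    for entry in root.iter(ATOM + "entry"):
        for link in entry.iter(ATOM + "link"):
            href = link.get("href", "")
            if link.get("type") == "text/html" and "/content/" in href:
                slugs.append(href.rsplit("/content/", 1)[1].strip("/"))
    return slugs


def books_matching(slugs, name=None):
    """Mirror -b NAME: keep slugs containing NAME, or all of them when NAME is unset."""
    return [slug for slug in slugs if name is None or name in slug]


class _TextExtractor(HTMLParser):
    """Collect visible text, dropping script/style and breaking lines at block tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Convert an HTML article to plain text, one non-empty line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw_lines)
    return "\n".join(line for line in lines if line)
```

Matching on a substring of the slug rather than the full name is what keeps the date suffix out of the command line: `books_matching(slugs, "wikibooks")` keeps working when the archive is refreshed and the suffix changes.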

Outage resilience

| Failure | Behaviour |
| --- | --- |
| Cloud AI APIs down | Local path needs no cloud keys and no internet, which is the whole point |
| Model not pulled when outage starts | `loadModels` pre-pulls the coder models onto mokou’s `/data/ollama` |
| Tailscale control plane down | mokou listens on `0.0.0.0`; fall back to `OLLAMA_HOST=http://192.168.0.x:11434 nix develop` |
| `nix develop` with no registry | dev shell is pinned by `flake.lock`; `llm` + plugin are in the store after first build |
| VRAM pressure (vision model resident) | `qwen2.5-coder:3b` fallback; Ollama auto-unloads idle models |
| mokou down entirely | accepted single point of failure, with no automatic failover. Point `OLLAMA_HOST` at another Ollama host if one is ever stood up |
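
The fallbacks in the table are manual. Purely as an illustration of the decision order, the sketch below picks the first endpoint that answers and the first preferred model that is present. Nothing like this ships in the flake as far as this page states; `pick_model`, `first_reachable` and `ollama_is_up` are hypothetical names, and the candidate URLs are supplied by the caller rather than hardcoded.

```python
import urllib.request

# Preference order from the tables above: 7b first, 3b as the fallback.
PREFERRED_MODELS = ("qwen2.5-coder:7b", "qwen2.5-coder:3b")


def pick_model(available, preferred=PREFERRED_MODELS):
    """Return the first preferred model that the host actually has, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


def ollama_is_up(base_url, timeout=2.0):
    """Probe an Ollama endpoint; GET /api/tags is cheap and needs no model loaded."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/tags", timeout=timeout) as resp:
        return resp.status == 200


def first_reachable(candidates, probe=ollama_is_up):
    """Return the first candidate URL whose probe succeeds, else None."""
    for url in candidates:
        try:
            if probe(url):
                return url
        except OSError:
            # Connection refused, DNS failure and timeouts all derive from OSError.
            continue
    return None
```

A caller would pass the Tailscale FQDN first and a LAN address second, then export the result as `OLLAMA_HOST`. If `first_reachable` returns `None`, that corresponds to the last row of the table: mokou is down and there is nothing to fail over to.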
