Local llm CLI: maintaining the flake during an AI outage
This flake is maintained with help from large language models. If a major cloud-AI
outage takes Claude/ChatGPT offline, that workflow shouldn’t stop — so
Simon Willison’s llm CLI is wired to the local
Ollama on mokou. No cloud keys, no internet: maintenance keeps working entirely
on hardware already in this flake.
It complements the agents package (narrow, scripted batch tools) with an
interactive, conversation-logging assistant — and kiwix-ask lets the model ground
answers in voile’s offline Kiwix archive.
Quick start
```sh
nix develop                                  # llm + kiwix-ask are in the dev shell
llm "rewrite this NixOS option to use mkOption with a default"
llm -t nix-maint "how do I add an opt-in service module in this repo?"
llm-maint "how do I gate a service on a new option?"   # ↑ but with live repo facts injected
nix run .#llm -- -t nix-maint "…"            # without entering the dev shell
kiwix-ask -b stackexchange "fourier transform basics"
```
The first nix develop after a fresh checkout builds the llm + llm-ollama
Python environment once (cached afterwards). It is pinned by flake.lock, so it
keeps working with no network.
How it’s wired
| Concern | Choice |
|---|---|
| Package | pkgs.python3Packages.llm.withPlugins { llm-ollama = true; } (no top-level pkgs.llm in 26.05) |
| Backend | llm-ollama plugin — auto-discovers whatever models mokou has loaded |
| Endpoint | OLLAMA_HOST defaults to http://mokou.armadillo-banfish.ts.net:11434 |
| Default model | qwen2.5-coder:7b (~4.7 GB Q4, fits the GTX 1080’s 8 GB), qwen2.5-coder:3b fallback |
| State dir | LLM_USER_PATH → $PWD/.llm (dev shell, git-ignored) or $XDG_STATE_HOME/llm (app) |
| Placement | dev shell (modules/flake/programs/shell.nix) + nix run .#llm / .#kiwix-ask apps |
OLLAMA_HOST is the single override knob and is shared with the agents tools — set
it once to retarget both. The coder models are pre-pulled by mokou’s
services.ollama.loadModels (see Ollama) so they’re available offline.
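The endpoint and state-dir rows in the table boil down to environment lookups with fallbacks. The Python sketch below assumes the dev shell and the app differ only in which state directory they pick; `resolve_settings` and its `in_dev_shell` flag are hypothetical names, not part of the flake, and the `~/.local/state` fallback comes from the XDG Base Directory spec rather than from this repo.

```python
import os
from pathlib import Path

# Default endpoint from the table above; OLLAMA_HOST overrides it.
DEFAULT_OLLAMA_HOST = "http://mokou.armadillo-banfish.ts.net:11434"


def resolve_settings(env: dict[str, str], cwd: Path, home: Path, in_dev_shell: bool) -> dict[str, str]:
    """Hypothetical helper: compute OLLAMA_HOST and LLM_USER_PATH as the table describes."""
    host = env.get("OLLAMA_HOST") or DEFAULT_OLLAMA_HOST
    if in_dev_shell:
        # Dev shell: repo-local, git-ignored state directory.
        state = cwd / ".llm"
    else:
        # App: per-user state; the XDG spec says unset or empty means ~/.local/state.
        xdg_state = env.get("XDG_STATE_HOME") or str(home / ".local" / "state")
        state = Path(xdg_state) / "llm"
    return {"OLLAMA_HOST": host, "LLM_USER_PATH": str(state)}


if __name__ == "__main__":
    print(resolve_settings(dict(os.environ), Path.cwd(), Path.home(), in_dev_shell=False))
```

Retargeting both llm and the agents tools is then a matter of exporting one variable, e.g. `OLLAMA_HOST=http://192.168.0.x:11434`.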
The nix-maint template
modules/flake/programs/nix-maint.yaml
is an llm template whose system prompt encodes this repo’s conventions — flake-parts
layout, the tsunaminoai.* opt-in module shape, alejandra + keep-sorted formatting, the
/run/secrets (never /var/run) sops rule, deploy-rs, and Tailscale-FQDN addressing —
so even a 7B local model produces patches that tend to pass treefmt and
nix flake check.
```sh
llm -t nix-maint "add a boolean option tsunaminoai.foo.enable and gate a systemd service on it"
```
The dev shell installs the template into LLM_USER_PATH/templates/ and sets the default
model on entry; nothing to configure by hand.
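Installing the template is essentially a file copy: llm resolves `-t nix-maint` to `templates/nix-maint.yaml` under its user directory. A minimal sketch, assuming nothing beyond that lookup rule; `install_template` is a hypothetical helper, the real shell hook may do this differently, and the default-model step is not shown.

```python
import shutil
from pathlib import Path


def install_template(src: Path, llm_user_path: Path) -> Path:
    """Hypothetical sketch: place src where `llm -t <src stem>` will find it."""
    templates = llm_user_path / "templates"
    templates.mkdir(parents=True, exist_ok=True)
    dest = templates / src.name  # nix-maint.yaml -> `llm -t nix-maint`
    # Copy only when missing or changed, so re-entering the shell stays cheap.
    if not dest.exists() or dest.read_bytes() != src.read_bytes():
        # copyfile copies contents but not permission bits, so a read-only
        # Nix store source does not leave a read-only copy behind.
        shutil.copyfile(src, dest)
    return dest
```

Using `copyfile` rather than `copy` is the design point here: the template source lives in the read-only Nix store, and a later update must be able to overwrite the copy.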
llm-maint: current facts, auto-injected
A small local model is frozen at its training cutoff (~2023), so it doesn’t know this
flake’s actual nixpkgs pin, module/option layout, or hosts. llm-maint wraps
llm -t nix-maint and injects that ground truth as context fragments on every call,
so the model reasons over the live repo instead of stale memory:
- nixpkgs rev / ref / date from flake.lock (the version actually in use),
- the live tsunaminoai.* option namespaces (grepped from module sources),
- the NixOS / home-manager module and host lists,
- the formatting / secrets / deploy / Tailscale conventions,
- plus any files you pass with -f, typically the module you’re editing.
```sh
llm-maint "how do I gate a systemd service on a new tsunaminoai option?"
llm-maint -f modules/nixos/samba/default.nix "why doesn't the share show up?"
```
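The nixpkgs facts come straight from flake.lock, which is plain JSON. A sketch of extracting them, assuming nixpkgs is a direct (non-follows) input of the root node and is locked as an input that records `lastModified` (github inputs do); `nixpkgs_facts` and `load_facts` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def nixpkgs_facts(lock: dict, input_name: str = "nixpkgs") -> dict[str, str]:
    """Hypothetical sketch: rev / ref / date of the pinned nixpkgs from a parsed flake.lock."""
    nodes = lock["nodes"]
    root = nodes[lock["root"]]
    # Root inputs map input names to node names (these can differ, e.g. "nixpkgs_2").
    target = root["inputs"][input_name]
    if isinstance(target, list):
        raise ValueError(f"{input_name} follows another input; resolve it first")
    node = nodes[target]
    locked, original = node["locked"], node.get("original", {})
    # lastModified is a Unix timestamp of the locked revision.
    date = datetime.fromtimestamp(locked["lastModified"], tz=timezone.utc)
    return {
        "rev": locked.get("rev", "unknown"),
        "ref": original.get("ref", "unknown"),
        "date": date.strftime("%Y-%m-%d"),
    }


def load_facts(path: Path = Path("flake.lock")) -> dict[str, str]:
    return nixpkgs_facts(json.loads(path.read_text()))
```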
| Flag / env | Meaning |
|---|---|
| -f, --fragment PATH | also inject this file (repeatable) |
| -m, --model / -c, --num-ctx | model id / Ollama context window (default 16384) |
| LLM_MODEL · LLM_NUM_CTX · OLLAMA_HOST | env overrides |
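The table implies a precedence of flag over environment variable over built-in default. A hypothetical argparse reconstruction of that precedence follows; the real llm-maint may well be a shell wrapper, and the qwen2.5-coder:7b default is an assumption borrowed from the "Default model" row above.

```python
import argparse


def parse_args(argv: list[str], env: dict[str, str]) -> argparse.Namespace:
    """Hypothetical llm-maint flag parsing: flags override env vars, which override defaults."""
    p = argparse.ArgumentParser(prog="llm-maint")
    p.add_argument("-f", "--fragment", action="append", default=[], metavar="PATH",
                   help="also inject this file (repeatable)")
    # Env values become argparse defaults, so an explicit flag still wins.
    p.add_argument("-m", "--model", default=env.get("LLM_MODEL", "qwen2.5-coder:7b"))
    p.add_argument("-c", "--num-ctx", type=int, default=int(env.get("LLM_NUM_CTX", "16384")))
    p.add_argument("question")
    return p.parse_args(argv)
```

Call it as `parse_args(sys.argv[1:], dict(os.environ))`; OLLAMA_HOST needs no handling here because the llm-ollama plugin reads it itself.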
It deliberately does not read options.md — that file is generated
out-of-band and goes stale; module sources and flake.lock are always current.
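Grepping the live namespaces from module sources could look like the sketch below. The regex is an assumption (it matches any `tsunaminoai.<name>` reference, so declarations and uses both count), and `option_namespaces` / `scan_tree` are hypothetical helpers, not the script llm-maint actually runs.

```python
import re
from pathlib import Path

# Assumed pattern: "tsunaminoai." followed by a Nix identifier (letters, digits, _ ' -).
NAMESPACE_RE = re.compile(r"\btsunaminoai\.([A-Za-z_][A-Za-z0-9_'-]*)")


def option_namespaces(sources: dict[str, str]) -> list[str]:
    """Collect the sorted, de-duplicated first-level tsunaminoai.* names."""
    found: set[str] = set()
    for text in sources.values():
        found.update(NAMESPACE_RE.findall(text))
    return sorted(found)


def scan_tree(root: Path) -> list[str]:
    """Read every .nix file under root; always reflects the current checkout."""
    sources = {str(p): p.read_text(errors="replace") for p in root.rglob("*.nix")}
    return option_namespaces(sources)
```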
Why inject instead of tool-call?
The stack can tool-call (llm --functions / llm tools, and llm-ollama
registers tools), and qwen2.5-coder:7b does invoke tools over Ollama. But a 7B
model is only so reliable at deciding to call them. Injecting the facts directly
is deterministic — the model always sees current state. Reach for tools when the
needed fact isn’t known ahead of time.
kiwix-ask: grounded retrieval from voile’s Kiwix
kiwix-ask searches voile’s offline Kiwix archive, pulls the
top article(s), strips them to text, and pipes them into llm as grounding context.
During an outage the model can cite offline Wikipedia / DevDocs / Stack Exchange
content instead of guessing.
```sh
kiwix-ask "how does the Linux page cache work"          # search all books
kiwix-ask -b stackexchange "windowing in an FFT"        # restrict to a book slug
kiwix-ask -b devdocs -q "what does @import do?" "zig import"
kiwix-ask --raw "water filtration"                      # print retrieved context only
kiwix-ask --url /content/<slug>/A/Some_Article "summarise"   # skip search
```
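The "strip to text" step is handled by pkgs/kiwix-ask/html-to-text.py, whose contents aren't reproduced here. As an illustrative stand-in only, a stdlib converter could drop script/style/head content, break lines at block elements, and collapse whitespace; the element lists and the `html_to_text` name are assumptions.

```python
from html.parser import HTMLParser

# Assumed element lists; the repo's converter may treat tags differently.
SKIP = {"script", "style", "head"}
BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "pre", "section"}


class TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive already decoded
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```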
| Flag / env | Meaning |
|---|---|
| -b, --book NAME | restrict to books whose slug contains NAME (e.g. wikibooks, stackexchange) |
| -n, --num N | number of articles to retrieve (default 3) |
| -m, --model / -q, --question | model id / question (default: the query itself) |
| --url PATH | fetch a /content/<slug>/… path directly, skip search |
| --raw | print retrieved context without calling the model |
| KIWIX_HOST | Kiwix base URL (default https://voile.armadillo-banfish.ts.net:8092) |
| KIWIX_INSECURE=1 / KIWIX_CACERT | TLS escape hatches for clients that don’t trust the internal CA |
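The two TLS escape hatches map naturally onto an `ssl.SSLContext`. This sketch assumes KIWIX_INSECURE wins when both variables are set; kiwix-ask's real precedence and implementation aren't documented here, and `kiwix_ssl_context` is a hypothetical name.

```python
import ssl


def kiwix_ssl_context(env: dict[str, str]) -> ssl.SSLContext:
    """Hypothetical sketch of the KIWIX_INSECURE / KIWIX_CACERT escape hatches."""
    if env.get("KIWIX_INSECURE") == "1":
        # Like curl -k: no chain or hostname verification.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    # cafile=None means the system trust store, which on NixOS hosts already holds step-ca.
    return ssl.create_default_context(cafile=env.get("KIWIX_CACERT") or None)
```

The context would then be passed as `urllib.request.urlopen(url, context=kiwix_ssl_context(dict(os.environ)))`.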
How it works. Kiwix book “slugs” carry volatile date suffixes (e.g.
wikibooks_en_all_maxi_2021-03) and full-text search is per-book, so kiwix-ask
resolves the current slugs live from /catalog/v2/entries, searches each book until it
has enough hits, then fetches /content/<slug>/<path> and converts the HTML to text
(pkgs/kiwix-ask/html-to-text.py).
Nothing dated is hardcoded. NixOS hosts trust the internal step-ca CA, so HTTPS to voile
works without -k.
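The slug resolution and the per-book search loop described above can be sketched as follows. Two assumptions: each catalog `<entry>` carries a `type="text/html"` link whose href is `/content/<slug>`, and the per-book search is abstracted as a `search_book(slug, query)` callable, because the exact search endpoint is not spelled out here. `book_slugs` and `gather_hits` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

ATOM = "{http://www.w3.org/2005/Atom}"


def book_slugs(catalog_xml: str, book_filter: str = "") -> list[str]:
    """Extract current (dated) slugs from the OPDS Atom feed, filtered like -b."""
    slugs = []
    for entry in ET.fromstring(catalog_xml).iter(f"{ATOM}entry"):
        for link in entry.iter(f"{ATOM}link"):
            href = link.get("href", "")
            # Assumption: the HTML link of each entry points at /content/<slug>.
            if link.get("type") == "text/html" and href.startswith("/content/"):
                slug = href[len("/content/"):].strip("/")
                if book_filter in slug:  # empty filter matches every book
                    slugs.append(slug)
    return slugs


def gather_hits(
    slugs: Iterable[str],
    query: str,
    n: int,
    search_book: Callable[[str, str], list[str]],
) -> list[str]:
    """Search book by book, stopping as soon as n article paths are collected (-n)."""
    hits: list[str] = []
    if n <= 0:
        return hits
    for slug in slugs:
        for path in search_book(slug, query):
            hits.append(path)
            if len(hits) >= n:
                return hits
    return hits
```

Because slugs are read from the live catalog on every run, a refreshed ZIM with a new date suffix is picked up without any change to the tool.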
Outage resilience
| Failure | Behaviour |
|---|---|
| Cloud AI APIs down | Local path needs no cloud keys and no internet — the whole point |
| Model not pulled when outage starts | loadModels pre-pulls the coder models onto mokou’s /data/ollama |
| Tailscale control plane down | mokou listens on 0.0.0.0; fall back to OLLAMA_HOST=http://192.168.0.x:11434 nix develop |
| nix develop with no registry | dev shell is pinned by flake.lock; llm + plugin are in the store after first build |
| VRAM pressure (vision model resident) | qwen2.5-coder:3b fallback; Ollama auto-unloads idle models |
| mokou down entirely | accepted single point of failure — no automatic failover. Point OLLAMA_HOST at another Ollama host if one is ever stood up |
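Since there is no automatic failover, a manual pre-flight against the Ollama API can confirm, before you need it, which coder model the endpoint actually has pulled. The sketch uses Ollama's `GET /api/tags` endpoint (it lists locally pulled models); `pick_model` and `fetch_tags` are hypothetical helpers, and the preference order simply mirrors the 7b-then-3b fallback in the tables above.

```python
import json
import urllib.request
from typing import Optional

PREFERRED = ["qwen2.5-coder:7b", "qwen2.5-coder:3b"]


def pick_model(tags: dict, preferred: list[str] = PREFERRED) -> Optional[str]:
    """Return the first preferred model present in an /api/tags response, else None."""
    available = {m.get("name") for m in tags.get("models", [])}
    for model in preferred:
        if model in available:
            return model
    return None


def fetch_tags(host: str, timeout: float = 3.0) -> dict:
    # GET /api/tags lists the models this Ollama server has pulled.
    with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

For example, `pick_model(fetch_tags(os.environ.get("OLLAMA_HOST", "http://mokou.armadillo-banfish.ts.net:11434")))` returns the id to pass with `-m`, or None if neither coder model is present.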
See also
- Ollama on mokou — the inference server and the SM 6.1 build
- agents — scripted batch tools sharing the same Ollama backend
- Kiwix — the offline content server on voile