Paperless-NGX / Document Pipeline

Paperless-NGX runs on ereshkigal as a Podman compose stack managed by the tsunaminoai.docPipeline NixOS module (modules/nixos/containers/doc-pipeline/default.nix).

Note

When tsunaminoai.pki.acme.enable is true, the module also stands up an nginx vhost on paperlessPort + 1 (8012) that reverse-proxies the web UI over TLS using the host’s ACME cert (https://ereshkigal.<tailscaleDomain>:8012). Paperless’ PAPERLESS_URL / PAPERLESS_CSRF_TRUSTED_ORIGINS are set to that HTTPS origin so the CSRF check passes behind the proxy.
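As a rough illustration of the proxy described in this note, a vhost along the following lines would terminate TLS on paperlessPort + 1 and forward to the container's host port. This is a sketch built from standard NixOS nginx options, not the module's actual source. The FQDN ereshkigal.example-tailnet.ts.net is a placeholder, and the use of useACMEHost (a certificate already issued under that name) is an assumption.

```nix
{ config, ... }:
let
  cfg = config.tsunaminoai.docPipeline;
  # Placeholder FQDN; the real value is derived from the host's tailscaleDomain.
  fqdn = "ereshkigal.example-tailnet.ts.net";
  tlsPort = cfg.paperlessPort + 1; # 8012 with the default paperlessPort of 8011
in
{
  services.nginx.virtualHosts.${fqdn} = {
    # Assumption: a certificate for this name already exists under security.acme.certs.
    useACMEHost = fqdn;
    onlySSL = true;
    listen = [ { addr = "0.0.0.0"; port = tlsPort; ssl = true; } ];
    locations."/" = {
      # Forward to the plain-HTTP web UI published on the host.
      proxyPass = "http://127.0.0.1:${toString cfg.paperlessPort}";
      # Paperless-NGX uses websockets for live task status in the UI.
      proxyWebsockets = true;
    };
  };
}
```

With this shape, the HTTPS origin that PAPERLESS_URL and PAPERLESS_CSRF_TRUSTED_ORIGINS need to match would be https://<fqdn>:8012.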

Architecture

```mermaid
flowchart TD
subgraph voile["voile (Synology DSM) — 10.0.0.2"]
    nfs["/volume2/Books"]
    nfs --> consume["paperless-consume/\n(drop PDFs here for auto-ingest)"]
    nfs --> export["paperless-export/\n(archival copies)"]
end

subgraph mokou["mokou (GTX 1080)"]
    ollama["Ollama :11434\nqwen2.5vl:7b"]
end

subgraph ereshkigal["ereshkigal — 10.0.0.1 / 192.168.0.20"]
    nfs_mount["/mnt/voile/documents\n(NFSv4.1 over vmbr1 10G)"]
    subgraph paperless["paperless_default Podman network"]
        broker["paperless-broker\nvalkey/valkey:9-alpine :6379"]
        db["paperless-db\npostgres:17-alpine"]
        web["paperless-web\npaperless-ngx :8011"]
    end
    nfs_mount --> web
    broker --> web
    db --> web
end

consume --> nfs_mount
web --> export
web -->|"AI tagging / LLM\nhttp://mokou.ts.net:11434"| ollama

```

NFS uses the dedicated 10G backhaul (vmbr1 bridge, 10.0.0.0/24, MTU 9000) between ereshkigal and voile, not the regular LAN.

Enabling the module

```nix
# hosts/x86_64-nixos/ereshkigal/default.nix
tsunaminoai.docPipeline = {
  enable = true;
  ollamaHost = "mokou.${config.tsunaminoai.nix.tailscaleDomain}";
  ollamaModel = "qwen2.5vl:7b";
  paperlessPort = 8011;
  # voileSharePath and timezone use correct defaults
};
```

Module options

| Option | Default | Description |
| --- | --- | --- |
| enable | false | Enable the pipeline |
| paperlessPort | 8011 | Host port for Paperless-NGX web UI |
| ollamaHost | "localhost" | Hostname/IP of the Ollama inference server |
| ollamaModel | "qwen2.5vl:7b" | Ollama model used for VLM document tagging |
| voileSharePath | "/volume2/Books" | NFS export path on voile (Synology DSM) |
| timezone | "America/Indiana/Indianapolis" | Timezone for Paperless-NGX |

The module also declares two sidecar option sub-trees, both enabled by default: paperlessGpt.* (LLM auto-tagging sidecar — manualTag, autoTag, port 8013, llmModel "qwen2.5:3b", tokenLimit "4000", contextLength "8192") and anythingLlm.* (RAG chat — port 13001, embeddingModel "nomic-embed-text"). See AI → paperless-gpt for the tagging sidecar.
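For illustration, the sidecar sub-trees could be set per host as in the fragment below. The option names and values are taken from the defaults listed above. Treating tokenLimit and contextLength as strings is an inference from the quoted defaults, not something confirmed against the module source.

```nix
tsunaminoai.docPipeline = {
  enable = true;

  # Keep the tagging sidecar and pin its settings explicitly
  # (these values simply restate the documented defaults).
  paperlessGpt = {
    enable = true;
    port = 8013;
    llmModel = "qwen2.5:3b";
    tokenLimit = "4000";      # assumed to be a string option
    contextLength = "8192";   # assumed to be a string option
  };

  # Turn the RAG chat sidecar off on a host that does not need it.
  anythingLlm.enable = false;
};
```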

NFS mount

The module mounts voile:/volume2/Books at /mnt/voile/documents via NFSv4.1 with x-systemd.automount and a 10-minute idle timeout. The mount unit is mnt-voile-documents.mount; x-systemd.automount additionally makes systemd generate a matching mnt-voile-documents.automount unit that triggers the mount on first access.

The paperless-web container depends on the mount unit, so it will not start until the NFS share is available. The mount itself waits on network-bonds-ready.service, which polls for bond0/bond1 readiness on ereshkigal.
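A mount with these properties can be sketched with the standard NixOS fileSystems option, as below. The option strings are regular systemd.mount settings. Expressing the dependency on network-bonds-ready.service as a mount option is an assumption about how the module wires it.

```nix
{
  fileSystems."/mnt/voile/documents" = {
    device = "voile:/volume2/Books";
    fsType = "nfs";
    options = [
      "nfsvers=4.1"
      # Mount lazily on first access and unmount again after 10 idle minutes.
      "x-systemd.automount"
      "x-systemd.idle-timeout=600"
      # Assumption: adds Requires= and After= on the bond readiness check to the mount unit.
      "x-systemd.requires=network-bonds-ready.service"
    ];
  };
}
```

systemd derives the unit name from the mount point by replacing each "/" with "-", which is how /mnt/voile/documents becomes mnt-voile-documents.mount.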

Two subdirectories on the share are bind-mounted into the container:

| Share path | Container path | Purpose |
| --- | --- | --- |
| paperless-consume/ | /usr/src/paperless/consume | Drop PDFs here for auto-ingest |
| paperless-export/ | /usr/src/paperless/export | Archival copies written back by Paperless |

Ollama / LLM integration

LLM tagging and titling are handled by the paperless-gpt sidecar, not by Paperless-NGX itself (the native PAPERLESS_AI_* vars are unreleased as of v2.14). See AI → paperless-gpt for full details.

OCR runs on every ingested document (PAPERLESS_OCR_MODE=redo) so tesseract always extracts a text layer. The paperless-web container has --add-host=host.containers.internal:host-gateway so it can reach mokou via the host’s routing table (Tailscale). Use mokou’s Tailscale FQDN for ollamaHost so the address stays stable regardless of DHCP assignment.
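The settings above, together with the two bind mounts from the NFS share, might appear in a container definition like the following. This is a sketch only: it assumes the stack is expressed through virtualisation.oci-containers, and the image reference and the container-side port 8000 are assumptions rather than values taken from the module.

```nix
{
  virtualisation.oci-containers.containers."paperless-web" = {
    # Assumption: the module may pin a specific tag or digest instead.
    image = "ghcr.io/paperless-ngx/paperless-ngx:latest";
    environment = {
      # Re-run OCR on every document so a text layer is always produced.
      PAPERLESS_OCR_MODE = "redo";
    };
    volumes = [
      "/mnt/voile/documents/paperless-consume:/usr/src/paperless/consume:rw"
      "/mnt/voile/documents/paperless-export:/usr/src/paperless/export:rw"
    ];
    # Host port 8011 (paperlessPort) mapped to the web server inside the container.
    ports = [ "8011:8000/tcp" ];
    extraOptions = [
      "--add-host=host.containers.internal:host-gateway"
      "--network=paperless_default"
    ];
  };
}
```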

Podman stack

The core Paperless containers (paperless-broker, paperless-db, paperless-web) share the paperless_default Podman network and are wired into a single podman-compose-paperless-root.target. The target is wanted by multi-user.target. With the default sub-flags (paperlessGpt.enable and anythingLlm.enable both default to true), the paperless-gpt and anythingllm sidecars also join the same network and target — five containers in total.
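The target wiring described here can be sketched with plain systemd options. The unit name podman-paperless-web follows the podman-<container> naming that oci-containers uses for its services. That the module uses this naming is an assumption, as is the description text.

```nix
{
  # Root target that groups the whole stack and starts with normal boot.
  systemd.targets."podman-compose-paperless-root" = {
    unitConfig.Description = "Root target for the Paperless compose stack";
    wantedBy = [ "multi-user.target" ];
  };

  # Assumption: each container runs as podman-<container>.service.
  systemd.services."podman-paperless-web" = {
    # Stop and restart together with the target, and start when it starts.
    partOf = [ "podman-compose-paperless-root.target" ];
    wantedBy = [ "podman-compose-paperless-root.target" ];
    # Do not start until the NFS share is mounted.
    after = [ "mnt-voile-documents.mount" ];
    requires = [ "mnt-voile-documents.mount" ];
  };
}
```

Grouping the units under one target means `systemctl restart podman-compose-paperless-root.target` cycles every container that declares partOf on it.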

Persistent data lives in named Podman volumes (created by one-shot systemd services):

| Volume | Contents |
| --- | --- |
| paperless_redis | Valkey queue state |
| paperless_pgdata | PostgreSQL database |
| paperless_data | Paperless index, settings, ML models |
| paperless_media | Scanned document files |
| anythingllm_storage | AnythingLLM RAG / LanceDB storage (only when anythingLlm.enable, the default) |
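A one-shot service that creates one of these volumes could look like the sketch below. The unit name podman-volume-paperless_data is hypothetical. The pattern itself (inspect, then create only if missing) is what keeps the service safe to re-run.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit name; one such service would exist per named volume.
  systemd.services."podman-volume-paperless_data" = {
    path = [ pkgs.podman ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true; # stay "active" so dependants are not restarted
    };
    # Create the volume only if it does not exist yet.
    script = ''
      podman volume inspect paperless_data >/dev/null 2>&1 || podman volume create paperless_data
    '';
    partOf = [ "podman-compose-paperless-root.target" ];
    wantedBy = [ "podman-compose-paperless-root.target" ];
  };
}
```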

Backups

The module hooks into the existing borgmatic.configurations.voile job (from tsunaminoai.borg). Before each borg backup, three pre-hooks run:

  1. Postgres dump: pg_dump inside paperless-db, written to /var/backup/paperless/paperless-db-YYYYMMDD.sql. Dumps older than 3 days are pruned automatically.
  2. paperless_media snapshot: contents copied to /var/backup/paperless/data/media/ via a temporary busybox container.
  3. paperless_data snapshot: contents copied to /var/backup/paperless/data/appdata/ via a temporary busybox container.

/var/backup/paperless is already included in borgmatic’s source_directories (via the borg module’s /var/backup entry), so no additional path registration is needed.
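The three pre-hooks listed above can be approximated by a single script, shown here as a sketch rather than the module's real implementation. The helper name paperlessPreBackup, the database user and database name (both paperless), and the use of borgmatic's before_backup list are assumptions. Newer borgmatic releases express hooks differently, so the last attribute may need adapting.

```nix
{ pkgs, ... }:
let
  # Hypothetical helper; DB user and DB name "paperless" are assumed.
  paperlessPreBackup = pkgs.writeShellScript "paperless-pre-backup" ''
    set -euo pipefail
    backupDir=/var/backup/paperless
    mkdir -p "$backupDir/data/media" "$backupDir/data/appdata"

    # 1. Dump PostgreSQL from inside the running database container.
    ${pkgs.podman}/bin/podman exec paperless-db pg_dump -U paperless paperless \
      > "$backupDir/paperless-db-$(date +%Y%m%d).sql"
    # Prune dumps older than 3 days.
    ${pkgs.findutils}/bin/find "$backupDir" -maxdepth 1 \
      -name 'paperless-db-*.sql' -mtime +3 -delete

    # 2. and 3. Copy each named volume out through a throwaway busybox container.
    ${pkgs.podman}/bin/podman run --rm -v paperless_media:/src:ro \
      -v "$backupDir/data/media:/dst" docker.io/busybox cp -a /src/. /dst/
    ${pkgs.podman}/bin/podman run --rm -v paperless_data:/src:ro \
      -v "$backupDir/data/appdata:/dst" docker.io/busybox cp -a /src/. /dst/
  '';
in
{
  # Assumption: the voile job accepts a before_backup command list.
  services.borgmatic.configurations.voile.before_backup = [ "${paperlessPreBackup}" ];
}
```

Mounting the volumes read-only (:ro) into the busybox container keeps the snapshot step from modifying live Paperless data.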

Note

The busybox snapshots pull docker.io/busybox if not cached. Pre-pull it after first deploy: podman pull docker.io/busybox