Paperless-NGX / Document Pipeline

Paperless-NGX runs on ereshkigal as a Podman compose stack managed by the tsunaminoai.docPipeline NixOS module (modules/nixos/containers/doc-pipeline/default.nix).

Architecture

```mermaid
flowchart TD
subgraph voile["voile (Synology DSM) — 10.0.0.2"]
    nfs["/volume2/Books"]
    nfs --> consume["paperless-consume/\n(drop PDFs here for auto-ingest)"]
    nfs --> export["paperless-export/\n(archival copies)"]
end

subgraph mokou["mokou (GTX 1080)"]
    ollama["Ollama :11434\nqwen2.5-vl:7b"]
end

subgraph ereshkigal["ereshkigal — 10.0.0.1 / 192.168.0.20"]
    nfs_mount["/mnt/voile/documents\n(NFSv4.1 over vmbr1 10G)"]
    subgraph paperless["paperless_default Podman network"]
        broker["paperless-broker\nvalkey:9-alpine :6379"]
        db["paperless-db\npostgres:17-alpine"]
        web["paperless-web\npaperless-ngx :8011"]
    end
    nfs_mount --> web
    broker --> web
    db --> web
end

consume --> nfs_mount
web --> export
web -->|"AI tagging / LLM\nhttp://mokou.ts.net:11434"| ollama

```

NFS uses the dedicated 10G backhaul (vmbr1 bridge, 10.0.0.0/24, MTU 9000) between ereshkigal and voile, not the regular LAN.

Enabling the module

```nix
# hosts/x86_64-nixos/ereshkigal/default.nix
tsunaminoai.docPipeline = {
  enable = true;
  ollamaHost = "mokou.${config.tsunaminoai.nix.tailscaleDomain}";
  ollamaModel = "qwen2.5-vl:7b";
  paperlessPort = 8011;
  # voileSharePath and timezone are left at their defaults
};
```

Module options

| Option | Default | Description |
| --- | --- | --- |
| enable | false | Enable the pipeline |
| paperlessPort | 8011 | Host port for Paperless-NGX web UI |
| ollamaHost | "localhost" | Hostname/IP of the Ollama inference server |
| ollamaModel | "qwen2.5-vl:7b" | Ollama model used for VLM document tagging |
| voileSharePath | "/volume2/Books" | NFS export path on voile (Synology DSM) |
| timezone | "America/Indiana/Indianapolis" | Timezone for Paperless-NGX |
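
For orientation, the table above could correspond to option declarations along the following lines. This is a minimal sketch, not the module's actual source: the option names and defaults come from the table, while the types (`lib.types.port`, `lib.types.str`) are assumptions.

```nix
# Hypothetical sketch of the option declarations; types are assumed.
{ lib, ... }:
{
  options.tsunaminoai.docPipeline = {
    # mkEnableOption yields a boolean option that defaults to false.
    enable = lib.mkEnableOption "the Paperless-NGX document pipeline";

    paperlessPort = lib.mkOption {
      type = lib.types.port;
      default = 8011;
      description = "Host port for Paperless-NGX web UI";
    };

    ollamaHost = lib.mkOption {
      type = lib.types.str;
      default = "localhost";
      description = "Hostname/IP of the Ollama inference server";
    };

    ollamaModel = lib.mkOption {
      type = lib.types.str;
      default = "qwen2.5-vl:7b";
      description = "Ollama model used for VLM document tagging";
    };

    voileSharePath = lib.mkOption {
      type = lib.types.str;
      default = "/volume2/Books";
      description = "NFS export path on voile (Synology DSM)";
    };

    timezone = lib.mkOption {
      type = lib.types.str;
      default = "America/Indiana/Indianapolis";
      description = "Timezone for Paperless-NGX";
    };
  };
}
```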

NFS mount

The module mounts voile:/volume2/Books at /mnt/voile/documents via NFSv4.1 with x-systemd.automount and a 10-minute idle timeout. The resulting systemd mount unit is mnt-voile-documents.mount (systemd pairs it with an automount unit of the same name, mnt-voile-documents.automount).

The paperless-web container depends on this unit, so it will not start until the NFS share is available. The mount itself waits on network-bonds-ready.service, which polls for bond0/bond1 readiness on ereshkigal.
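
The mount and its ordering could be expressed roughly as follows. This is a sketch under assumptions, not the module's real code: the `noauto` flag, the use of `x-systemd.requires`/`x-systemd.after` to wait on network-bonds-ready.service, and the service name `podman-paperless-web` are all guesses; only the paths, the NFS version, the automount flag, and the idle timeout come from the text above.

```nix
# Hypothetical sketch: NFS automount plus container ordering.
{ config, lib, ... }:
let
  cfg = config.tsunaminoai.docPipeline;
in
{
  config = lib.mkIf cfg.enable {
    fileSystems."/mnt/voile/documents" = {
      device = "voile:${cfg.voileSharePath}";
      fsType = "nfs";
      options = [
        "nfsvers=4.1"
        "x-systemd.automount"            # mount on first access
        "x-systemd.idle-timeout=10min"   # unmount after 10 idle minutes
        "noauto"                         # assumption: no eager mount at boot
        # Assumption: this is how the mount waits for the bonds to come up.
        "x-systemd.requires=network-bonds-ready.service"
        "x-systemd.after=network-bonds-ready.service"
      ];
    };

    # Assumption: the container's unit is named podman-paperless-web.
    systemd.services."podman-paperless-web" = {
      requires = [ "mnt-voile-documents.mount" ];
      after = [ "mnt-voile-documents.mount" ];
    };
  };
}
```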

Two subdirectories on the share are bind-mounted into the container:

| Share path | Container path | Purpose |
| --- | --- | --- |
| paperless-consume/ | /usr/src/paperless/consume | Drop PDFs here for auto-ingest |
| paperless-export/ | /usr/src/paperless/export | Archival copies written back by Paperless |
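
If the stack is defined through NixOS's `virtualisation.oci-containers` options (an assumption; the module source is not shown here), the two bind mounts might look like this fragment. The host paths are derived from the mount point /mnt/voile/documents, and the `:rw` mode is also assumed.

```nix
# Hypothetical fragment: bind-mount the two share subdirectories.
# It would merge with the rest of the paperless-web container definition.
{
  virtualisation.oci-containers.containers."paperless-web".volumes = [
    "/mnt/voile/documents/paperless-consume:/usr/src/paperless/consume:rw"
    "/mnt/voile/documents/paperless-export:/usr/src/paperless/export:rw"
  ];
}
```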

Ollama / LLM integration

LLM tagging and titling are handled by the paperless-gpt sidecar, not by Paperless-NGX itself (the native PAPERLESS_AI_* variables are unreleased as of v2.14). See AI → paperless-gpt for full details.

OCR runs on every ingested document (PAPERLESS_OCR_MODE=redo) so tesseract always extracts a text layer. The paperless-web container has --add-host=host.containers.internal:host-gateway so it can reach mokou via the host’s routing table (Tailscale). Use mokou’s Tailscale FQDN for ollamaHost so the address stays stable regardless of DHCP assignment.
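
The two container settings mentioned above could be written as the fragment below, again assuming the container is declared via `virtualisation.oci-containers`. Only the environment variable and the `--add-host` flag are taken from the text; everything else about the container is omitted.

```nix
# Hypothetical fragment: OCR mode and host-gateway alias for paperless-web.
{
  virtualisation.oci-containers.containers."paperless-web" = {
    environment = {
      # Re-run OCR on every document so a text layer always exists.
      PAPERLESS_OCR_MODE = "redo";
    };
    extraOptions = [
      # Lets the container reach the host, and through it the Tailscale route to mokou.
      "--add-host=host.containers.internal:host-gateway"
    ];
  };
}
```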

Podman stack

All three containers share the paperless_default Podman network and are wired into a single podman-compose-paperless-root.target. The target is wanted by multi-user.target.

Persistent data lives in named Podman volumes (created by one-shot systemd services):

| Volume | Contents |
| --- | --- |
| paperless_redis | Valkey queue state |
| paperless_pgdata | PostgreSQL database |
| paperless_data | Paperless index, settings, ML models |
| paperless_media | Scanned document files |
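
A one-shot volume service and the root target might be wired up as in the sketch below. The unit name `podman-volume-paperless_pgdata` and the exact script are assumptions modelled on the naming in this section; the other three volumes would follow the same pattern.

```nix
# Hypothetical sketch: create a named volume once and hang it off the root target.
{ pkgs, ... }:
{
  systemd.services."podman-volume-paperless_pgdata" = {
    path = [ pkgs.podman ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };
    # Create the volume only if it does not exist yet.
    script = ''
      podman volume inspect paperless_pgdata || podman volume create paperless_pgdata
    '';
    partOf = [ "podman-compose-paperless-root.target" ];
    wantedBy = [ "podman-compose-paperless-root.target" ];
  };

  systemd.targets."podman-compose-paperless-root" = {
    unitConfig.Description = "Root target for the Paperless stack";
    wantedBy = [ "multi-user.target" ];
  };
}
```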

Backups

The module hooks into the existing borgmatic.configurations.voile job (from tsunaminoai.borg). Before each borg backup, three pre-hooks run:

  1. Postgres dump: pg_dump runs inside paperless-db and writes to /var/backup/paperless/paperless-db-YYYYMMDD.sql. Dumps older than 3 days are pruned automatically.
  2. paperless_media snapshot: contents are copied to /var/backup/paperless/data/media/ via a temporary busybox container.
  3. paperless_data snapshot: contents are copied to /var/backup/paperless/data/appdata/ via a temporary busybox container.

/var/backup/paperless is already included in borgmatic’s source_directories (via the borg module’s /var/backup entry), so no additional path registration is needed.
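
The three pre-hooks could be approximated by a script like the one below. Treat it as a sketch only: the database user and name (`paperless`), the `cp -a` invocation, and the `before_backup` key used to attach the script are assumptions. In particular, the hook key depends on the borgmatic version in use and on how the tsunaminoai.borg module exposes it.

```nix
# Hypothetical sketch of the Paperless pre-backup hook.
{ pkgs, ... }:
let
  podman = "${pkgs.podman}/bin/podman";

  paperlessPreBackup = pkgs.writeShellScript "paperless-pre-backup" ''
    set -euo pipefail
    backup_dir=/var/backup/paperless
    mkdir -p "$backup_dir/data/media" "$backup_dir/data/appdata"

    # 1. Dump the database from inside the paperless-db container.
    #    User and database name "paperless" are assumed.
    ${podman} exec paperless-db pg_dump -U paperless paperless \
      > "$backup_dir/paperless-db-$(date +%Y%m%d).sql"
    # Prune old dumps; find counts age in whole days, rounded down.
    find "$backup_dir" -maxdepth 1 -name 'paperless-db-*.sql' -mtime +3 -delete

    # 2. Snapshot the media volume through a throwaway busybox container.
    ${podman} run --rm -v paperless_media:/src:ro \
      -v "$backup_dir/data/media":/dst docker.io/busybox cp -a /src/. /dst/

    # 3. Snapshot the application data volume the same way.
    ${podman} run --rm -v paperless_data:/src:ro \
      -v "$backup_dir/data/appdata":/dst docker.io/busybox cp -a /src/. /dst/
  '';
in
{
  # Assumption: the hook list is exposed under this key.
  services.borgmatic.configurations.voile.before_backup = [
    "${paperlessPreBackup}"
  ];
}
```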

Note

The busybox snapshots pull docker.io/busybox if not cached. Pre-pull it after first deploy: podman pull docker.io/busybox