Fast Facts#

System Summary#

  • Hardware: Dell R720xd
  • CPUs: 2× Intel Xeon E5-2650 v2 (32 threads)
  • RAM: 168GB DDR3
  • Storage:
    • Boot: 2×250GB SSD (RAID1)
    • Data: 4×3TB HDD (RAID5 + hot spare)
  • GPUs:
    • NVIDIA Quadro P400
    • 2× NVIDIA Tesla K80 (Kepler GK210, compute capability 3.7; host-bound, currently disabled)
  • Networking:
    • 4×1GbE (bonded as bond1 → vmbr0)
    • 2×10GbE (bonded as bond0 → vmbr1)
  • Management: iDRAC at 192.168.0.21

The host config lives at hosts/x86_64-nixos/ereshkigal/default.nix.


GPUs#

The NVIDIA stack is driven on-host: there is no VFIO passthrough, and no microvm guests are defined (microvm.autostart = []). NVIDIA support is enabled with the legacy_470 driver (tsunaminoai.nvidia.package = config.boot.kernelPackages.nvidiaPackages.legacy_470), the last driver branch that supports Kepler GPUs.

The Tesla K80 is Kepler-generation (GK210, compute capability 3.7) and is incompatible with CUDA 12, which requires compute capability 5.0 or newer, so it is kept disabled for CUDA work. The container toolkit is disabled as well (hardware.nvidia-container-toolkit.enable = false). The K80 does not support MIG or SR-IOV partitioning (MIG is an Ampere-era feature).
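The compatibility rule above can be sketched as a small shell check. This is only an illustration: the function names are made up for this page, the 5.0 minimum is the CUDA 12 hardware floor (Maxwell), and the Quadro P400's compute capability of 6.1 (Pascal GP107) is not stated elsewhere in this document. It checks the GPU generation only; the installed driver branch is a separate constraint.

```shell
# Minimum compute capability accepted by CUDA 12 toolkits (Maxwell, sm_50).
CUDA12_MIN_CC="5.0"

# supports_cuda12 <compute_capability>
# Succeeds (exit 0) when the capability is at or above the CUDA 12 minimum.
supports_cuda12() {
    awk -v cc="$1" -v min="$CUDA12_MIN_CC" 'BEGIN { exit !((cc + 0) >= (min + 0)) }'
}

# report_gpu <name> <compute_capability>
# Prints a one-line verdict for a GPU; the capability is supplied by the caller.
report_gpu() {
    if supports_cuda12 "$2"; then
        echo "$1 (cc $2): meets the CUDA 12 hardware minimum"
    else
        echo "$1 (cc $2): below the CUDA 12 hardware minimum"
    fi
}

report_gpu "Tesla K80" 3.7
report_gpu "Quadro P400" 6.1
```

The K80 line reports that the card is below the minimum, which is why it stays disabled for CUDA work.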


Verification Steps#

  1. GPU Functionality Test

    nvidia-smi  # Should show the installed GPUs
    

  2. Network Bonding Verification

    cat /proc/net/bonding/bond0  # Check 10GbE bond (enp70s0f0/f1 → vmbr1)
    cat /proc/net/bonding/bond1  # Check 1GbE bond (eno1..eno4 → vmbr0)
    

  3. Network Throughput Test

    iperf3 -c 10.0.0.2 -P 8  # Test 10GbE backbone to the voile peer
    

  4. Storage Health Check

    /opt/MegaRAID/perccli/perccli64 /c0 show all  # Verify RAID status
    btrfs scrub start /nix  # Check the btrfs filesystem
    

  5. Boot Time Benchmark

    systemd-analyze blame  # Identify slow services
    

Note: After changes, rebuild with nixos-rebuild test before applying permanently. Monitor dmesg for hardware errors during boot.
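For the bonding check in step 2, reading the whole /proc/net/bonding file by eye gets tedious. The sketch below pulls out just the member interfaces and their link state. The function names and the sample text are illustrative (the sample is hand-written, including its bonding mode line, not captured from ereshkigal); only the "Slave Interface:" and "MII Status:" line formats are relied on.

```shell
# bond_slaves: read /proc/net/bonding/<bond> text on stdin and print
# "<slave> <mii-status>" for each member interface.
bond_slaves() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2; next }
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    '
}

# bond_degraded: succeed (exit 0) if any member interface is not "up".
bond_degraded() {
    bond_slaves | awk '$2 != "up" { bad = 1 } END { exit !bad }'
}

# Hand-written sample in the /proc/net/bonding layout (not real output).
sample='Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eno1
MII Status: up

Slave Interface: eno2
MII Status: down'

printf '%s\n' "$sample" | bond_slaves
```

On the host, the same functions would be fed the real files, for example bond_slaves < /proc/net/bonding/bond0, or bond_degraded < /proc/net/bonding/bond1 && echo "bond1 has a down member".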


Services#

Ereshkigal is the primary services host on the gensokyo LAN. The roles enabled in its config (tsunaminoai.* unless otherwise noted) are:

| Service | Role | Endpoint / Notes |
|---|---|---|
| step-ca | Internal PKI / ACME CA (pki.acme.enable = true) | Issues certs to LAN hosts via ACME |
| Paperless-NGX | Document pipeline (docPipeline.enable = true) | Web UI on :8011 (HTTPS :8012); OCR + auto-tagging, inference via Ollama on mokou over Tailscale |
| Open-WebUI | LLM chat + RAG over Paperless docs (openWebui.enable = true) | HTTP :3000, HTTPS :3001; talks to Ollama on mokou |
| Media / servarr | media.server.video/audio + servarr.enable = true | Jellyfin, Sonarr, Radarr, Prowlarr, Lidarr, Readarr, Whisparr, qBittorrent, Stash; Tdarr server |
| Homer | Dashboard (homer.enable = true) | :88 |
| nix-serve | Binary cache (nix.isCache = true) | :11111 (also a deploy node, nix.isDeployNode = true) |
| Ollama | Local inference (services.ollama) | :11434, CUDA-accelerated; loads nomic-embed-text for Open-WebUI RAG |
| ESPHome | IoT device firmware/dashboard (esphome.enable = true) | IoT VLAN gateway 192.168.0.1 |
| Deploy dashboard | Fleet health (deploy.enableDashboard = true) | :8420, healthCheckInterval = "5min" |
| dell-idrac-fan-controller | Managed fan curve (services.dell-idrac-fan-controller) | fanSpeed = 10, iDRAC local, password from sops dell-fan |
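A quick reachability pass over the ports listed above can be sketched as follows. The service labels and function names are invented for this example, and it assumes each listed port answers plain HTTP and that the hostname passed in resolves on the LAN; neither assumption is confirmed by the host config. Services without a port in the table are left out.

```shell
# Port map copied from the services table; the lowercase labels are
# chosen for this sketch and are not names used in the host config.
service_ports() {
    cat <<'EOF'
paperless 8011
open-webui 3000
homer 88
nix-serve 11111
ollama 11434
deploy-dashboard 8420
EOF
}

# service_urls <host>: print one "<label> <url>" pair per service.
service_urls() {
    service_ports | while read -r name port; do
        echo "$name http://$1:$port/"
    done
}

# probe_services <host>: report which endpoints answer an HTTP request.
# Requires curl; any HTTP response (even 401 or 404) counts as reachable.
probe_services() {
    service_urls "$1" | while read -r name url; do
        if curl -s -o /dev/null --max-time 3 "$url"; then
            echo "ok   $name $url"
        else
            echo "FAIL $name $url"
        fi
    done
}
```

Running probe_services ereshkigal from another LAN host would print one ok or FAIL line per endpoint; service_urls on its own makes no network calls and only lists the URLs.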

Fleet rebuild monitoring is handled by the deploy dashboard above (tsunaminoai.deploy.enableDashboard = true, dashboardPort = 8420, healthCheckInterval = "5min"), which watches the monitoredHosts list (ereshkigal, mokou, razer, shinobu, bedford-drdillo-mbair, work-laptop).

Note: Fan management is already handled by services.dell-idrac-fan-controller; do not also set manual GPU fan curves in iDRAC.


Description#

Ereshkigal is a Dell R720xd with the following specs:

| Component | Spec | Notes |
|---|---|---|
| CPU | 2x Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60 GHz | 32 threads (16 cores) |
| RAM | 168 GB | |
| Storage | 2x 250GB SSD RAID 1 | RootFS |
| Storage | 4x 3TB RAID 5 (1 hot spare) | Kur storage pool |

Networking#

There are two network devices available on ereshkigal: an integrated 4×1GbE card and a 2×10GbE (10GBASE-T) card in riser 1.

iDRAC#

The iDRAC is Ereshkigal's out-of-band hardware management controller. Power, remote console, and RAID can be configured through it regardless of the power state of the main server. iDRAC access is at 192.168.0.21 (use Safari).

Storage#

Storage can be configured initially through the iDRAC under Storage. However, while the system is running, storage changes must be made from the host OS. For that reason, if a disk in the RootFS array (RAID 1) or the Kur storage pool (RAID 5) must be replaced, do the following:

  • Identify the failing drive using syslog, SMART, the iDRAC identify “blink” function, etc.
  • Procure a replacement disk of the same size as the other member disks
  • Remove the identified disk from the machine
  • Insert the replacement disk
  • Verify that the disk appears in the iDRAC under Storage → Physical Disks
  • Log into ereshkigal over SSH
  • Run one of the following, where X is the disk slot number:
    • /opt/MegaRAID/perccli/perccli64 /c0/sX add hotsparedrive to add a global hot spare
    • /opt/MegaRAID/perccli/perccli64 /c0/sX add hotsparedrive dgs=Y to add a dedicated hot spare, where Y is the disk group (VD) number
  • Check that the rebuild is running with /opt/MegaRAID/perccli/perccli64 /c0/sX show rebuild
  • When the rebuild completes, verify that the RAID is back in a healthy state
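Because the slot and disk group numbers are easy to mistype, it can help to print the exact commands before running them. The helper below is a dry-run sketch: the function names are hypothetical, slot 3 and disk group 1 are arbitrary example values, and nothing is executed against the controller.

```shell
PERCCLI="/opt/MegaRAID/perccli/perccli64"

# hotspare_cmd <slot> [disk-group]
# Prints the perccli command for a global hot spare, or for a dedicated
# one when a disk group number is given. Nothing is executed.
hotspare_cmd() {
    case "$1" in
        ''|*[!0-9]*) echo "slot must be a number" >&2; return 1 ;;
    esac
    if [ -n "${2:-}" ]; then
        case "$2" in
            *[!0-9]*) echo "disk group must be a number" >&2; return 1 ;;
        esac
        echo "$PERCCLI /c0/s$1 add hotsparedrive dgs=$2"
    else
        echo "$PERCCLI /c0/s$1 add hotsparedrive"
    fi
}

# rebuild_cmd <slot>: prints the command that reports rebuild progress.
rebuild_cmd() {
    echo "$PERCCLI /c0/s$1 show rebuild"
}

hotspare_cmd 3      # global hot spare in slot 3
hotspare_cmd 3 1    # dedicated hot spare for disk group 1
rebuild_cmd 3
```

Once the printed line looks right, it can be pasted into the SSH session on ereshkigal and run as root.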

GPU Inventory#

The host carries an NVIDIA Quadro P400 plus two Tesla K80s. As of the current config these are driven on-host with the legacy_470 driver and are not passed through to any VM (no microvm guests are defined). The K80s remain disabled for CUDA work — see the GPUs note above.

Relevant lspci lines

04:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
44:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
45:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
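To reproduce this inventory from live lspci output, a filter along these lines should work. It is a sketch with made-up function names; the embedded sample simply repeats the four lines listed above so the filter can be tried without the hardware.

```shell
# nvidia_devices: read lspci output on stdin and print "<bus-id> <model>"
# for each NVIDIA display or compute function (audio functions are skipped).
nvidia_devices() {
    awk '
        /NVIDIA Corporation/ && (/VGA compatible controller/ || /3D controller/) {
            s = index($0, "[")   # model name sits between the first [ and ]
            e = index($0, "]")
            print $1, substr($0, s + 1, e - s - 1)
        }
    '
}

# The lspci lines from this page, used as test input.
sample_lspci() {
    cat <<'EOF'
04:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
44:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
45:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
EOF
}

sample_lspci | nvidia_devices
```

On the host the equivalent call is lspci | nvidia_devices.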

Dell iDRAC SSH Shell Command Cheatsheet#

This cheatsheet summarizes the core commands available in the iDRAC shell (CLP/SMCLP) when connected via SSH on a Dell PowerEdge R720xd. These commands are distinct from racadm but provide similar management functionality directly in the shell.


Core Shell Commands#

| Command | Description & Usage Example |
|---|---|
| show | Display information about system objects or properties. Usage: show [<target>] |
| set | Set a property value. Usage: set [<target>] <property>=<value> |
| cd | Change current directory/object context. Usage: cd [<target>] |
| create | Create a new object. Usage: create <target> [<property>=<value>] ... |
| delete | Delete an object. Usage: delete <target> |
| exit | Exit the shell session. Usage: exit |
| reset | Reset a target (e.g., power cycle server or controller). Usage: reset [<target>] |
| start | Start a target (e.g., power on server/component). Usage: start [<target>] |
| stop | Stop a target (e.g., power off server/component). Usage: stop [<target>] |
| version | Show shell and firmware version info. Usage: version |
| help | Show help for commands or topics. Usage: help [<topic>] |
| load | Load configuration from a URI. Usage: load -source <URI> [<target>] |
| dump | Dump configuration to a URI. Usage: dump -destination <URI> [<target>] |

Usage Examples#

  • Show all system info:
    show /
    
  • Show a specific property:
    show /system1
    
  • Set a property:
    set /system1 enabled=true
    
  • Change directory/context:
    cd /system1
    
  • Reset the server:
    reset /system1
    
  • Power on the server:
    start /system1
    
  • Power off the server:
    stop /system1
    
  • Create a new user (example, if supported):
    create /user1 username=admin password=secret
    
  • Delete a user (example):
    delete /user1
    
  • Load config from a file:
    load -source tftp://192.168.1.10/config.xml /system1
    
  • Dump config to a file:
    dump -destination tftp://192.168.1.10/backup.xml /system1
    
  • Get shell version:
    version
    
  • Get help on a command:
    help set
    

Command Structure#

  • Targets are objects in the system hierarchy (e.g., /system1, /chassis1, /user1).
  • Properties are attributes of those targets (e.g., enabled, username).
  • Options may modify command behavior (e.g., -source, -destination).
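The verb, target, property structure can be illustrated with a small helper that assembles a command line and rejects obviously malformed input before it is typed into the iDRAC shell. This is a sketch only: clp_line is a hypothetical name, the verb list is limited to the target-taking commands from the table on this page, and the helper does not talk to the iDRAC.

```shell
# clp_line <verb> <target> [property=value ...]
# Prints a command line in "verb target properties" order after two
# sanity checks: the verb is a known target-taking command and the
# target is an absolute path. Nothing is sent to the iDRAC.
clp_line() {
    case "${1:-}" in
        show|set|cd|create|delete|reset|start|stop) ;;
        *) echo "unsupported verb: ${1:-}" >&2; return 1 ;;
    esac
    case "${2:-}" in
        /*) ;;
        *) echo "target must be an absolute path such as /system1" >&2; return 1 ;;
    esac
    echo "$*"
}

clp_line show /system1
clp_line set /system1 enabled=true
```

The two calls print the same lines used in the usage examples on this page, show /system1 and set /system1 enabled=true.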

Notes#

  • The shell is case-sensitive.
  • Use cd / to return to the root context.
  • Use show without arguments for a list of objects in the current context.
  • For full command syntax and target/property names, use help or show at each level.

This cheatsheet covers the main command set of the iDRAC SSH shell on Dell PowerEdge servers such as the R720xd, based on the /admin1-> help output and standard SMCLP conventions. For advanced features, consult the iDRAC CLI Reference Guide or use help within the shell for context-sensitive assistance.
