Ereshkigal

Fast Facts#

Overview of Ereshkigal, a Dell R720xd running NixOS, with performance optimization tasks and peripheral configuration steps:

System Summary#

  • Hardware: Dell R720xd
  • CPUs: 2× Intel Xeon E5-2650 v2 (32 threads)
  • RAM: 168GB DDR3
  • Storage:
    • Boot: 2×250GB SSD (RAID1)
    • Data: 4×3TB HDD (RAID5 + hot spare)
  • GPUs:
    • NVIDIA Quadro P400 (Passthrough)
    • 2× NVIDIA Tesla K80 (Passthrough)
  • Networking:
    • 4×1GbE (Bonded as bond1 → vmbr0)
    • 2×10GbE (Bonded as bond0 → vmbr1)
  • Management: iDRAC at 192.168.0.21

Performance Optimization Tasks#

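  1. GPU Partitioning (Tesla K80)
    The K80 (Kepler, GK210) supports neither SR-IOV nor MIG; MIG requires an Ampere-or-newer data-center GPU. The K80 board carries two GPUs, so the split available on this host is one GPU per VM through PCI passthrough. On MIG-capable hardware the equivalent commands are:
    nvidia-smi -i 0 -mig 1  # Enable MIG mode (MIG-capable GPUs only)
    nvidia-smi mig -cgi <profile> -C  # Create a GPU instance plus its compute instance; valid profiles are listed by nvidia-smi mig -lgip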
    

Peripheral Configuration Tasks#

  1. Network Bonding Verification
    Confirm bond status:

    cat /proc/net/bonding/bond0  # Check 10GbE bond
    cat /proc/net/bonding/bond1  # Check 1GbE bond
    

  2. RAID Health Monitoring
    Add PERC controller checks:

    services.smartd = {
      enable = true;
      devices = [{ device = "/dev/sda"; options = "-a -d megaraid,0"; }];
    };
    

  3. iDRAC Alert Integration
    Configure SNMP traps for hardware events:

    racadm set iDRAC.SNMP.Alert 1
    racadm set iDRAC.SNMP.AgentEnable 1
    

  4. USB Device Passthrough
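    For peripherals (e.g., security dongles) that need a USB core quirk before they can be passed through:

    boot.kernelParams = [ "usbcore.quirks=1234:5678:cg" ];  # VendorID:ProductID:Flags; IDs are placeholder 4-digit hex values without 0x, flags are letters (cg is only an example)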
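The bond check in step 1 above can be condensed into a short summary per bond. The following is a minimal sketch, assuming the usual layout of the /proc/net/bonding files (a "Bonding Mode:" line, one "MII Status:" line for the bond, then a "Slave Interface:" line followed by its own "MII Status:" line for each member). The function name bond_summary and its optional second argument are hypothetical, introduced only so the parser can also be run against a saved copy of the file.

```shell
# Summarize a bond from its /proc/net/bonding file.
# Usage: bond_summary bond0 [proc_dir]
# proc_dir defaults to /proc/net/bonding (hypothetical override for offline use).
bond_summary() {
  local bond=$1 dir=${2:-/proc/net/bonding}
  awk -F': ' '
    /^Bonding Mode:/    { print "mode=" $2 }
    /^Slave Interface:/ { slave = $2 }
    /^MII Status:/ {
      if (slave == "") {
        print "bond_mii=" $2            # MII line before any slave belongs to the bond
      } else {
        print "slave " slave "=" $2     # MII line following a slave header
        slave = ""
      }
    }
  ' "$dir/$bond"
}
```

Calling bond_summary bond0 and bond_summary bond1 on the host prints one line per bond member, which makes a member reported as "down" easy to spot.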
    


Configuration Cleanup#

  1. Remove Unused Bonds
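    Delete unused bond declarations from the configuration:

    # Remove the whole networking.bonds.bond1 = { ... }; block if bond1 is not used.
    # Assigning lib.mkForce {} does not remove the bond; it leaves bond1 defined with no interfaces.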
    

  2. Simplify Network Setup
    Replace manual IP with networkd:

    systemd.network.enable = true;
    networking.useNetworkd = true;
    

  3. Fix VLAN Configuration
    Uncomment and repair VLAN setup:

    networking.vlans = {
      vlan10 = { id = 10; interface = "vmbr0"; };
    };
    

  4. MicroVM Optimization
    Enable virtiofs for faster VM storage:

    microvm.shares = [{
      source = "/nix/store";
      mountPoint = "/nix/.ro-store";
      tag = "ro-store";
      proto = "virtiofs";
    }];
    


Verification Steps#

  1. GPU Functionality Test

    nvidia-smi  # Should show all GPUs
    lspci -vnn -d 10de:  # Check passthrough devices
    

  2. Network Throughput Test

    iperf3 -c 192.168.0.1 -P 8  # Test 10GbE bond
    

  3. Storage Health Check

    perccli64 /c0 show all  # Verify RAID status
    btrfs scrub start /nix  # Check filesystem
    

  4. Boot Time Benchmark

    systemd-analyze blame  # Identify slow services
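To complement the RAID status check in step 3, the SMART health of each physical disk behind the PERC can be polled with smartctl using the same -d megaraid,N device type as the smartd configuration earlier. This is a sketch under assumptions: the function name perc_smart_health and the SMARTCTL override are hypothetical, and the megaraid device IDs passed in must be taken from the controller's own listing rather than guessed.

```shell
# Check SMART health of physical disks behind the PERC controller.
# Usage: perc_smart_health /dev/sda ID [ID ...]
# SMARTCTL can point at another binary (hypothetical override); IDs are the
# megaraid device numbers reported by the controller.
perc_smart_health() {
  local blockdev=$1 id rc=0
  shift
  for id in "$@"; do
    # smartctl -H exits non-zero when the health check fails or the disk cannot be read
    if "${SMARTCTL:-smartctl}" -H -d "megaraid,$id" "$blockdev" >/dev/null 2>&1; then
      echo "megaraid,$id: OK"
    else
      echo "megaraid,$id: needs attention"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, perc_smart_health /dev/sda 0 1 2 3 4 5 would cover six disks if the controller numbers them 0 to 5, which is an assumption about this host.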
    

Note: After changes, rebuild with nixos-rebuild test before applying permanently. Monitor dmesg for hardware errors during boot.
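The test-before-switch workflow in the note can be sketched as a small helper. It activates the new configuration with nixos-rebuild test (which does not make it the boot default) and then prints recent kernel errors and warnings for review. The function name rebuild_and_check and the NIXOS_REBUILD and DMESG overrides are hypothetical; both commands normally need root.

```shell
# Activate the new configuration without making it the boot default, then
# show the most recent kernel errors and warnings for review.
rebuild_and_check() {
  "${NIXOS_REBUILD:-nixos-rebuild}" test || return 1
  "${DMESG:-dmesg}" --level=err,warn | tail -n 20
}
```

If the output looks clean, nixos-rebuild switch applies the same configuration permanently.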

  • Update iDRAC firmware via racadm update -f idrac.fwimg
  • Configure GPU fan curves in iDRAC to prevent thermal throttling
  • Set up NixOS rebuild monitoring, for example with a self-hosted Healthchecks instance (services.healthchecks.enable = true; nixpkgs has no services.healthcheck option)

Description#

Ereshkigal is a Dell R720xd with the following specs:

Component | Spec | Notes
CPU | 2× Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60 GHz (3.40 GHz turbo) | 32 vcores
RAM | 168 GB |
Storage | 2× 250 GB SSD, RAID 1 | RootFS
Storage | 4× 3 TB, RAID 5 (1 hot spare) | Kur storage pool

Networking#

Two network devices are available on Ereshkigal: an integrated 4×1 Gbps card and a 2×10 Gbps (10GBASE-T) card in riser 1.

iDrac#

The iDRAC is Ereshkigal's hardware management controller. Power, the remote console, and RAID can be configured from it regardless of whether the main server is powered on. iDRAC Access (use Safari)

Storage#

Storage can be configured initially through iDRAC under Storage. While the system is running, however, storage changes must be made from the host OS. To replace a disk in the RootOS array (RAID 1) or the Proxmox storage pool (RAID 5), do the following:

  • Identify the failing drive using syslog, SMART, the iDRAC identify "blink" function, etc.
  • Procure a new disk that is the same size as other member disks
  • Remove the identified disk from the machine
  • Insert the replacement disk
  • Verify that the disk shows up in iDRAC under Storage → Physical Disks
  • Log into ereshkigal using ssh
  • Run:
    • /opt/MegaRAID/perccli/perccli64 /c0/sX add hotsparedrive where X is the disk slot number to add a global hot spare
    • or /opt/MegaRAID/perccli/perccli64 /c0/sX add hotsparedrive dgs=Y where Y is the disk group (VD) number to add a dedicated hotspare
  • Check that the rebuild is running using /opt/MegaRAID/perccli/perccli64 /c0/sX show rebuild
  • When the rebuild completes, verify that the RAID is back in healthy status
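The rebuild check in the steps above can be turned into a polling loop. This is a minimal sketch, assuming that the output of the show rebuild command contains the text "In progress" while the rebuild is running and no longer does once it has finished; that string match is an assumption and should be confirmed against the actual output on this controller. The function name wait_for_rebuild and the PERCCLI override are hypothetical.

```shell
# Poll a PERC rebuild until it is no longer reported as in progress.
# Usage: wait_for_rebuild SLOT [INTERVAL_SECONDS]
# PERCCLI can point at another binary (hypothetical override).
wait_for_rebuild() {
  local slot=$1 interval=${2:-60}
  local perccli=${PERCCLI:-/opt/MegaRAID/perccli/perccli64}
  # Assumption: "In progress" (capital I) appears only while the rebuild runs
  while "$perccli" "/c0/s$slot" show rebuild | grep -q 'In progress'; do
    sleep "$interval"
  done
  echo "slot $slot: rebuild no longer in progress"
}
```

After the loop returns, the final step still applies: confirm in perccli or iDRAC that the virtual disk is back in a healthy state.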

VMs Hosted on Ereshkigal#

PCI Passthrough#

An Nvidia Quadro P400 is available via hostpci0: 04:00,pcie=1,rombar=0,driver=vfio

An Nvidia Tesla K80 is available via

hostpci0: 44:00,pcie=1
hostpci1: 45:00,pcie=1

Relevant lspci lines

04:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
44:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
45:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

Read about passthrough here
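Before assigning a GPU to a VM, it can help to confirm which kernel driver currently owns the device and which IOMMU group it sits in. The sketch below reads both from sysfs. The function name pci_binding and the optional sysfs root argument are hypothetical, and the 0000: PCI domain prefix in front of the bus addresses listed above is an assumption.

```shell
# Print the kernel driver bound to a PCI device and its IOMMU group.
# Usage: pci_binding 0000:04:00.0 [sysfs_root]
# sysfs_root defaults to /sys (hypothetical override for offline inspection).
pci_binding() {
  local dev=$1 root=${2:-/sys}
  local path="$root/bus/pci/devices/$dev"
  local driver=none group=none
  # Both entries are symlinks; the last path component is the name we want
  [ -L "$path/driver" ] && driver=$(basename "$(readlink "$path/driver")")
  [ -L "$path/iommu_group" ] && group=$(basename "$(readlink "$path/iommu_group")")
  echo "$dev driver=$driver iommu_group=$group"
}
```

For a device prepared for passthrough, the driver would typically be reported as vfio-pci, for example when running pci_binding 0000:04:00.0 for the Quadro P400.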

Dell iDRAC SSH Shell Command Cheatsheet#

This cheatsheet summarizes the core commands available in the iDRAC shell (CLP/SMCLP) when connected via SSH on a Dell PowerEdge R720xd. These commands are distinct from racadm but provide similar management functionality directly in the shell.


Core Shell Commands#

Command | Description | Usage
show | Display information about system objects or properties. | show [<target>]
set | Set a property value. | set [<target>] <property>=<value>
cd | Change the current directory/object context. | cd [<target>]
create | Create a new object. | create <target> [<property>=<value>] ...
delete | Delete an object. | delete <target>
exit | Exit the shell session. | exit
reset | Reset a target (e.g., power cycle server or controller). | reset [<target>]
start | Start a target (e.g., power on server/component). | start [<target>]
stop | Stop a target (e.g., power off server/component). | stop [<target>]
version | Show shell and firmware version info. | version
help | Show help for commands or topics. | help [<command>]
load | Load configuration from a URI. | load -source <URI> [<target>]
dump | Dump configuration to a URI. | dump -destination <URI> [<target>]

Usage Examples#

  • Show all system info:
    show /
    
  • Show a specific property:
    show /system1
    
  • Set a property:
    set /system1 enabled=true
    
  • Change directory/context:
    cd /system1
    
  • Reset the server:
    reset /system1
    
  • Power on the server:
    start /system1
    
  • Power off the server:
    stop /system1
    
  • Create a new user (example, if supported):
    create /user1 username=admin password=secret
    
  • Delete a user (example):
    delete /user1
    
  • Load config from a file:
    load -source tftp://192.168.1.10/config.xml /system1
    
  • Dump config to a file:
    dump -destination tftp://192.168.1.10/backup.xml /system1
    
  • Get shell version:
    version
    
  • Get help on a command:
    help set
    

Command Structure#

  • Targets are objects in the system hierarchy (e.g., /system1, /chassis1, /user1).
  • Properties are attributes of those targets (e.g., enabled, username).
  • Options may modify command behavior (e.g., -source, -destination).

Notes#

  • The shell is case-sensitive.
  • Use cd / to return to the root context.
  • Use show without arguments for a list of objects in the current context.
  • For full command syntax and target/property names, use help or show at each level.

This cheatsheet covers the main command set available in the iDRAC shell via SSH on Dell PowerEdge servers like the R720xd, based on the /admin1-> help output and standard SMCLP conventions. For advanced features, consult the iDRAC CLI Reference Guide or use help within the shell for context-sensitive assistance.
