
Note: This document was AI-generated and reviewed by a maintainer.

ADR 0001 — ZFS Native Encryption: Non-Interactive initrd Key Loading

Status: Accepted
Date: 2026-05-03
Deciders: Alice Huston
Affects: systems/palatine-hill/hardware-changes.nix, systems/palatine-hill/zfs.nix

Context

palatine-hill uses ZFS native encryption for the /nix dataset (ZFS-primary/nix). The ZFS encryption key was stored as a file (/crypto/keys/zfs-nix-store-key) on a separate LVM volume mounted at /crypto, inside the same LUKS container as root.

This created a forced ordering dependency: the /nix dataset could not be unlocked until root (/) and /crypto were both mounted, even though logically they are independent. Two custom initrd units worked around this:

  • zfs-import-zfs-primary — polling import loop (duplicates NixOS-native logic)
  • zfs-load-nix-key — reads key from /sysroot/crypto/keys/zfs-nix-store-key after sysroot.mount

Additionally, boot.zfs.requestEncryptionCredentials was forced off entirely, and a postBootCommands fallback ran zfs load-key -a at the start of stage 2 as a belt-and-suspenders measure. LUKS unlock was also interactive, requiring manual passphrase entry at every boot.

Previous initrd dependency graph (before this ADR)

```mermaid
flowchart TD
    A([initrd start]) --> B[systemd-udev-settle]
    A --> C["LUKS unlock nixos-pv\n⚠ interactive"]

    C --> D[LVM activate]
    D --> E["sysroot.mount\n/ on ext4"]
    D --> F["sysroot-crypto.mount\n/crypto on LVM volume"]

    B --> G["zfs-import-zfs-primary\n(custom polling loop, 60s timeout)"]

    E --> H["zfs-load-nix-key\n(reads /sysroot/crypto/keys/zfs-nix-store-key)"]
    F --> H
    G --> H

    H --> I["sysroot-nix.mount\nZFS-primary/nix"]
    I --> J([initrd-fs.target])
    E --> J
    J --> K([stage 2])
    K --> L["postBootCommands:\nzfs load-key -a"]
```

Problems with the old approach

  1. Cross-filesystem key dependency: /nix unlock depends on root mount, coupling two logically independent operations.
  2. Duplicated pool import logic: the custom unit reimplements a polling loop that NixOS already generates natively; upstream fixes don't apply automatically.
  3. Native credential handling fully disabled: forcing requestEncryptionCredentials off bypasses NixOS's built-in key loading for encrypted datasets. As a result, the NixOS ZFS module cannot see how /nix gets its key, and that logic lives only in custom units.
  4. Double key load: the postBootCommands fallback is a workaround that exists only because the initrd path could not be relied on.
  5. Interactive LUKS unlock: a passphrase must be typed at every boot, which rules out unattended operation.

Options Considered

Option A — Key embedded in initrd (boot.initrd.secrets)

Store the ZFS key directly inside the initrd cpio archive. The key is available from the very start of stage 1 without mounting anything.

Pro: Eliminates the cross-mount dependency; re-enables native NixOS ZFS handling; zero new infrastructure.

Con: Key lives in the initrd on /boot, which is an unencrypted vfat partition. Anyone with physical or boot-partition read access has the key. Does not solve interactive LUKS unlock.

Option B — Tang network key fetch (Clevis) [chosen]

Encrypt both secrets (the LUKS passphrase and the ZFS key) as Clevis JWE blobs. At boot, the initrd contacts a Tang server on the LAN, and Clevis uses that exchange to recover the key needed to decrypt them. NixOS's boot.initrd.clevis module natively supports luks, zfs, and bcachefs, so no custom unit is needed for ZFS key loading.

Pro: Key never present on disk in plaintext; unified unlock surface for both LUKS and ZFS; no cross-mount dependency; JWE blobs on disk are useless without the Tang server.

Con: Adds the Tang server as a boot dependency; palatine-hill cannot boot unattended while Tang is unreachable.
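You can exercise the Clevis/Tang mechanism described above from any LAN host. Encrypt a throwaway value with the tang pin, then decrypt it again. If both steps succeed, the Tang server is taking part in key recovery.

This sketch assumes the clevis CLI is installed and that Tang answers at the given URL. The function name tang_roundtrip is hypothetical. The -y flag accepts Tang's advertised signing keys without the interactive trust prompt, so only use it on a network you control.

```shell
#!/usr/bin/env bash
# Hypothetical helper: encrypt a random test value with the tang pin and
# decrypt it again, proving the Tang server at URL takes part in recovery.
# Assumes the clevis CLI is on PATH; never pass a real secret through this.
tang_roundtrip() {
  local url=${1:-http://tang.lan}
  local sample recovered
  # 16 random bytes, base64-encoded so the value is printable.
  sample=$(head -c 16 /dev/urandom | base64)
  # -y trusts Tang's advertised signing keys without prompting.
  recovered=$(printf '%s' "$sample" \
    | clevis encrypt tang "{\"url\":\"$url\"}" -y \
    | clevis decrypt) || return 1
  if [ "$recovered" = "$sample" ]; then
    echo "OK: round trip through $url succeeded"
  else
    echo "FAIL: decrypted value differs from the original" >&2
    return 1
  fi
}
```

Usage: tang_roundtrip http://tang.lan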


Decision

Option B (Tang/Clevis) is adopted for both the LUKS root device and the ZFS /nix dataset.

boot.initrd.clevis.devices handles both unlock targets natively. The custom zfs-load-nix-key unit is deleted entirely. The zfs-import-zfs-primary unit is retained — the pool must still be imported before Clevis can load the dataset key.

Static networking is configured in the initrd using systemd-networkd, with a static IP of 192.168.76.2/24 on eno1. DNS resolution through 192.168.76.1 (the OPNsense router running Unbound) lets the Tang URL be written as http://tang.lan rather than as a raw IP address.
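The initrd resolves tang.lan through 192.168.76.1, not through whatever resolver the running system happens to use. It can therefore be worth querying that server directly before a reboot.

This is a minimal sketch. It assumes dig (from the BIND utilities) is installed, and the function name check_initrd_dns is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the DNS server configured for the initrd
# (192.168.76.1 in this ADR) for the Tang host's A record.
# Assumes dig is on PATH.
check_initrd_dns() {
  local server=${1:-192.168.76.1} name=${2:-tang.lan}
  local addr
  # +short prints only answer data; keep the first IPv4 line so that CNAME
  # targets and ";; connection timed out" messages are not mistaken for answers.
  addr=$(dig +short @"$server" "$name" A | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1)
  if [ -z "$addr" ]; then
    echo "FAIL: $server returned no A record for $name" >&2
    return 1
  fi
  echo "$name -> $addr (via $server)"
}
```

Usage: check_initrd_dns 192.168.76.1 tang.lan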

New initrd dependency graph

```mermaid
flowchart TD
    A([initrd start]) --> N["initrd-networkd\neno1: 192.168.76.2/24\nDNS: 192.168.76.1"]
    A --> B[systemd-udev-settle]

    N --> T["Tang server\ntang.lan"]

    T -->|"boot.initrd.clevis\n.devices.nixos-pv"| C["LUKS unlock nixos-pv\n(Clevis/Tang — unattended)"]
    T -->|"boot.initrd.clevis\n.devices.ZFS-primary/nix"| Z["ZFS-primary/nix key load\n(Clevis/Tang — unattended)"]

    C --> D[LVM activate]
    D --> E["sysroot.mount\n/ on ext4"]

    B --> G["zfs-import-zfs-primary\n(custom polling loop — retained)"]
    G --> Z

    Z --> I["sysroot-nix.mount\nZFS-primary/nix"]

    E --> J([initrd-fs.target])
    I --> J
    J --> L([stage 2 — fully unattended])
```

Files changed

| File | Change |
|------|--------|
| systems/palatine-hill/hardware-changes.nix | Removed requestEncryptionCredentials = mkForce false; removed postBootCommands; added a boot.initrd.clevis block for both devices; added boot.initrd.systemd.network with static IP and DNS; removed /crypto from the /nix mount's depends |
| systems/palatine-hill/zfs.nix | Removed the zfs-load-nix-key unit; added boot.zfs.requestEncryptionCredentials = false |

Comparison

| Aspect | Before | After |
|--------|--------|-------|
| Custom initrd units | 2 (import + key load) | 1 (import only; key load is native Clevis) |
| Key source | /crypto LVM volume (disk) | Tang server (network) |
| Disk-based key exposure | Key on LVM volume inside LUKS | .jwe blob only; useless without Tang |
| Cross-mount dependency | Yes | No |
| LUKS interactive unlock | Yes | No (Clevis/Tang) |
| Unattended boot | No | Yes (when Tang is reachable) |

Consequences

  • Unattended boot requires the Tang server to be reachable at tang.lan. If Tang is down, boot stalls at the Clevis timeout, so Tang server uptime has to be maintained accordingly.
  • The .jwe files are safe to commit to the repository — they are encrypted blobs that are useless without the Tang server's private key.
  • Rolling back to a generation without Clevis (pre-ADR) requires manual LUKS passphrase entry at the console; ensure prior generations remain in the bootloader during initial cutover.
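You can check the claim that the .jwe files are safe to commit by looking at what they contain. clevis encrypt emits a JWE in compact serialization: five base64url segments separated by dots. Only the first segment, the protected header, can be read without key material.

As we understand Clevis's tang pin, that header typically records:
  • the pin name
  • the Tang URL and advertisement
  • the ephemeral public key

It does not record the secret itself. The sketch below decodes only that header. It assumes GNU coreutils base64, and the function name jwe_header is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the protected header of a compact-serialized JWE.
# Only the first segment is decoded; the ciphertext stays opaque without Tang.
# Assumes GNU coreutils base64 (for -d).
jwe_header() {
  local jwe seg pad
  jwe=$(tr -d '\n' < "$1")
  seg=${jwe%%.*}                              # first dot-separated segment
  seg=$(printf '%s' "$seg" | tr '_-' '/+')    # base64url -> base64 alphabet
  pad=$(( (4 - ${#seg} % 4) % 4 ))            # restore stripped '=' padding
  seg+=$(printf '%*s' "$pad" '' | tr ' ' '=')
  printf '%s' "$seg" | base64 -d
  echo
}
```

Usage: jwe_header systems/palatine-hill/nix-store.jwe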

Implementation Notes

Prerequisites

  1. Deploy a Tang server on the LAN and create a DNS host override in OPNsense:

    • Services → Unbound DNS → Host Overrides → tang / lan / <tang IP>
  2. Verify DNS from palatine-hill before rebooting:

    ```shell
    resolvectl query tang.lan
    ```
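Name resolution alone does not prove that Tang will answer. Tang serves its signing-key advertisement over plain HTTP at the /adv path, so fetching that document from palatine-hill is a cheap end-to-end check before rebooting.

This is a minimal sketch. It assumes getent (glibc) and curl are installed, and the function name tang_preflight is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-reboot check: resolve the Tang host, then fetch its
# advertisement (GET /adv), which Tang serves without authentication.
# Assumes getent and curl are on PATH.
tang_preflight() {
  local host=${1:-tang.lan}
  if ! getent hosts "$host" > /dev/null; then
    echo "FAIL: $host does not resolve" >&2
    return 1
  fi
  # -f turns HTTP errors into a non-zero exit; --max-time bounds the wait.
  if ! curl -fsS --max-time 5 "http://$host/adv" > /dev/null; then
    echo "FAIL: $host resolved but did not serve /adv" >&2
    return 2
  fi
  echo "OK: $host resolves and serves /adv"
}
```

Usage: tang_preflight tang.lan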
    

Create the JWE files

Run from the repository root on a machine that has the LUKS passphrase and access to the running /crypto volume:

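```shell
# Without a "thp" or "adv" in the config, clevis encrypt tang shows Tang's
# advertised signing keys and asks you to trust them; check the thumbprint.

# LUKS passphrase JWE: read the passphrase without echoing it or leaving it in shell history
read -rsp 'LUKS passphrase: ' LUKS_PW; echo
printf '%s' "$LUKS_PW" | \
  clevis encrypt tang '{"url":"http://tang.lan"}' \
  > systems/palatine-hill/nixos-pv.jwe
unset LUKS_PW

# ZFS dataset key JWE: key file from the running system
clevis encrypt tang '{"url":"http://tang.lan"}' \
  < /crypto/keys/zfs-nix-store-key \
  > systems/palatine-hill/nix-store.jwe
```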
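Before committing, it is worth confirming that each blob decrypts back to the secret it is meant to carry. Otherwise a mistyped passphrase would only surface as a failed unattended boot.

This sketch rests on a few assumptions:
  • clevis and cryptsetup are installed.
  • It runs as root, which cryptsetup needs.
  • Tang is reachable.

The function names are hypothetical, and the device argument is a placeholder for the actual LUKS partition. cryptsetup's --test-passphrase checks the key against the LUKS header without activating a mapping.

```shell
#!/usr/bin/env bash
# Hypothetical checks: decrypt each JWE via clevis (this contacts Tang) and
# confirm the result is the expected secret.

# verify_key_jwe JWE KEYFILE: byte-compare the decrypted JWE with KEYFILE.
verify_key_jwe() {
  local jwe=$1 keyfile=$2
  if clevis decrypt < "$jwe" | cmp -s - "$keyfile"; then
    echo "OK: $jwe decrypts to the contents of $keyfile"
  else
    echo "FAIL: $jwe does not match $keyfile" >&2
    return 1
  fi
}

# verify_luks_jwe JWE DEVICE: check that the decrypted JWE unlocks a keyslot
# on DEVICE (a placeholder path) without activating the device.
verify_luks_jwe() {
  local jwe=$1 device=$2
  if clevis decrypt < "$jwe" | cryptsetup open --test-passphrase --key-file=- "$device"; then
    echo "OK: $jwe unlocks $device"
  else
    echo "FAIL: $jwe does not unlock $device" >&2
    return 1
  fi
}
```

Usage: verify_key_jwe systems/palatine-hill/nix-store.jwe /crypto/keys/zfs-nix-store-key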

Commit and build

```shell
git add systems/palatine-hill/nixos-pv.jwe systems/palatine-hill/nix-store.jwe
git commit -m "feat(palatine-hill): add Clevis JWE files for Tang-based boot unlock"

nix build .#palatine-hill   # verify build succeeds
```
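The git add step is more than bookkeeping. When a flake lives in a git repository, Nix only sees files that are in the git index, so an untracked .jwe file would be missing from the build.

This is a small guard and assumes it runs from the repository root. The function name require_tracked is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if any given path is not in the git index,
# since a git-based flake only sees tracked (or staged) files.
require_tracked() {
  local missing=0 f
  for f in "$@"; do
    if ! git ls-files --error-unmatch -- "$f" > /dev/null 2>&1; then
      echo "not tracked by git: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_tracked systems/palatine-hill/nixos-pv.jwe systems/palatine-hill/nix-store.jwe && nix build .#palatine-hill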

Deploy

```shell
nh os switch   # keep previous generation in bootloader for rollback
```
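The new initrd is only used from the next boot. Before rebooting, it is worth confirming that the bootloader still lists more than one generation, so the rollback procedure remains available.

This sketch makes the following assumptions:
  • The bootloader is systemd-boot.
  • Entries use NixOS's usual naming, nixos-generation-N.conf under /boot/loader/entries.
  • Specialisation entries carry a -specialisation- infix.

Adjust the path if the ESP is mounted elsewhere. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: count NixOS generation entries known to systemd-boot.
# Assumes entries are named nixos-generation-<N>.conf; specialisation entries
# (assumed to contain "-specialisation-") are skipped.
count_boot_generations() {
  local dir=${1:-/boot/loader/entries} n=0 f
  for f in "$dir"/nixos-generation-*.conf; do
    [ -e "$f" ] || continue                 # glob matched nothing
    case $f in *-specialisation-*) continue ;; esac
    n=$((n + 1))
  done
  echo "$n"
  [ "$n" -ge 2 ]                            # succeed only if a rollback target exists
}
```

Usage: count_boot_generations /boot/loader/entries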

Verify after reboot

```shell
# Confirm ZFS dataset was unlocked automatically
zfs get keystatus ZFS-primary/nix
# Expected: keystatus = available

# Check Clevis log output
journalctl -b | grep -i clevis

# Confirm Tang was reached during initrd
journalctl -b | grep -i tang
```
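You can also script the keystatus check, for example as a post-boot smoke test. This minimal sketch uses zfs get in scripted mode: -H suppresses headers and -o value prints only the property value. The function name check_nix_key is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-boot check: succeed only if the dataset's key is loaded.
check_nix_key() {
  local ds=${1:-ZFS-primary/nix} status
  status=$(zfs get -H -o value keystatus "$ds") || return 2
  if [ "$status" = available ]; then
    echo "OK: $ds key is loaded"
  else
    echo "FAIL: $ds keystatus is '$status'" >&2
    return 1
  fi
}
```

Usage: check_nix_key ZFS-primary/nix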

Rollback procedure (if needed)

Select the previous generation from the systemd-boot menu at boot. You will be prompted interactively for the LUKS passphrase — this is expected for the old generation.