add tang and clevis to palatine-hill

2026-05-03 13:29:39 -04:00
parent f5d0f97400
commit 24d451f825
4 changed files with 263 additions and 36 deletions
@@ -0,0 +1,32 @@
---
description: "Use when working with SOPS secrets files (secrets.yaml). Never modify secrets.yaml files directly — always prompt the user to make changes using sops edit."
applyTo: "**"
---
# SOPS Secrets Files — Read-Only
Never modify any `secrets.yaml` file in this repository. These files are SOPS-encrypted and editing them directly (without `sops edit`) will corrupt the encryption and make the secrets unrecoverable.
## Rules
- **Do NOT edit `secrets.yaml` files** using file editing tools, even for renaming keys, restructuring blocks, or adding new entries.
- **Do NOT suggest patches or diffs** that target `secrets.yaml` files.
- **Always prompt the user** to make the change themselves using:
```bash
sops edit <path-to-secrets.yaml>
```
- When a new secret key is needed (e.g., for a new SOPS reference in Nix code), tell the user the exact key name and value to add, and ask them to add it via `sops edit`.
- You may **read** `secrets.yaml` files (e.g., with grep to check key names) — reading is safe. Only writing is forbidden.
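A read-only check of key names can be as small as the sketch below. It assumes the usual SOPS YAML layout, in which key names stay in plaintext and only values are encrypted. The helper name `list_block_keys` is hypothetical and not part of this repository.

```bash
# Hypothetical helper (not part of this repo): print the key names nested under
# one top-level block of a SOPS-encrypted YAML file. It only reads the file.
list_block_keys() {
    # $1 = path to secrets.yaml, $2 = top-level block name (e.g. kanidm)
    awk -v block="$2" '
        # A line starting in column 0 opens a new top-level block.
        /^[^ \t#]/ { in_block = ($0 ~ ("^" block ":")) }
        # Indented "name:" lines inside the wanted block are its keys.
        in_block && /^[ \t]+[A-Za-z0-9_-]+:/ {
            sub(/^[ \t]+/, "")
            sub(/:.*/, "")
            print
        }
    ' "$1"
}

# Usage:
#   list_block_keys systems/palatine-hill/secrets.yaml kanidm
```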
## Example
Instead of editing `systems/palatine-hill/secrets.yaml` directly, say:
> Please run `sops edit systems/palatine-hill/secrets.yaml` and add the following under the `kanidm:` block:
>
> ```yaml
> kanidm:
> gitea_oidc_client_secret: "<your-generated-secret>"
> ```
+206
@@ -0,0 +1,206 @@
> Note: This document was AI-generated and reviewed by a maintainer.
# ADR 0001 — ZFS Native Encryption: Non-Interactive initrd Key Loading
| | |
|---|---|
| **Status** | Accepted |
| **Date** | 2026-05-03 |
| **Deciders** | Alice Huston |
| **Affects** | `systems/palatine-hill/hardware-changes.nix`, `systems/palatine-hill/zfs.nix` |
---
## Context
`palatine-hill` uses ZFS native encryption for the `/nix` dataset (`ZFS-primary/nix`). The ZFS encryption key was stored on a separate LVM volume (`/crypto/keys/zfs-nix-store-key`) inside the same LUKS container as root.
This created a forced ordering dependency: the `/nix` dataset could not be unlocked until root (`/`) and `/crypto` were both mounted, even though logically they are independent. Two custom initrd units worked around this:
- `zfs-import-zfs-primary` — polling import loop (duplicates NixOS-native logic)
- `zfs-load-nix-key` — reads key from `/sysroot/crypto/keys/zfs-nix-store-key` after `sysroot.mount`
Additionally, `boot.zfs.requestEncryptionCredentials` was forced off entirely, and a `postBootCommands` fallback ran
`zfs load-key -a` after stage 2 as a belt-and-suspenders measure. LUKS unlock was also interactive, requiring manual
passphrase entry at boot.
### Previous initrd dependency graph (before this ADR)
```mermaid
flowchart TD
A([initrd start]) --> B[systemd-udev-settle]
A --> C["LUKS unlock nixos-pv\n⚠ interactive"]
C --> D[LVM activate]
D --> E["sysroot.mount\n/ on ext4"]
D --> F["sysroot-crypto.mount\n/crypto on LVM volume"]
B --> G["zfs-import-zfs-primary\n(custom polling loop, 60s timeout)"]
E --> H["zfs-load-nix-key\n(reads /sysroot/crypto/keys/zfs-nix-store-key)"]
F --> H
G --> H
H --> I["sysroot-nix.mount\nZFS-primary/nix"]
I --> J([initrd-fs.target])
E --> J
J --> K([stage 2])
K --> L["postBootCommands:\nzfs load-key -a"]
```
### Problems with the old approach
1. **Cross-filesystem key dependency**: `/nix` unlock depends on root mount, coupling two logically independent operations.
2. **Duplicated pool import logic**: the custom unit reimplements a polling loop that NixOS already generates natively; upstream fixes don't apply automatically.
3. **Native credential handling fully disabled**: `requestEncryptionCredentials = false` makes the configuration opaque to NixOS module evaluation.
4. **Double key load**: `postBootCommands` is a workaround indicating the initrd path is not reliable.
5. **Interactive LUKS unlock**: manual passphrase entry required at every boot — defeats unattended operation.
---
## Options Considered
### Option A — Key embedded in initrd (`boot.initrd.secrets`)
Store the ZFS key directly inside the initrd cpio archive. The key is available from the very start of stage 1 without mounting anything.
**Pro**: Eliminates the cross-mount dependency; re-enables native NixOS ZFS handling; zero new infrastructure.
**Con**: Key lives in the initrd on `/boot`, which is an unencrypted vfat partition. Anyone with physical or boot-partition read access has the key. Does not solve interactive LUKS unlock.
### Option B — Tang network key fetch (Clevis) ✅ Chosen
Encrypt both secrets (LUKS passphrase and ZFS key) as Clevis JWE blobs. At boot, the initrd contacts a Tang server
on the LAN to recover the keys needed to decrypt them. NixOS's `boot.initrd.clevis` module natively supports `luks`,
`zfs`, and `bcachefs`, so **no custom key-loading unit is needed for ZFS**.
**Pro**: Key never present on disk in plaintext; unified unlock surface for both LUKS and ZFS; no cross-mount dependency; JWE blobs on disk are useless without the Tang server.
**Con**: Adds the Tang server as a boot dependency; `palatine-hill` cannot boot unattended while Tang is unreachable.
---
## Decision
**Option B (Tang/Clevis) is adopted** for both the LUKS root device and the ZFS `/nix` dataset.
`boot.initrd.clevis.devices` handles both unlock targets natively. The custom `zfs-load-nix-key` unit is deleted
entirely. The `zfs-import-zfs-primary` unit is retained — the pool must still be imported before Clevis can load the
dataset key.
The initrd brings up `eno1` via systemd-networkd with a static IP (`192.168.76.2/24`). Names are resolved through
`192.168.76.1` (the OPNsense router running Unbound), which allows the Tang URL to be `http://tang.lan`.
### New initrd dependency graph
```mermaid
flowchart TD
A([initrd start]) --> N["initrd-networkd\neno1: 192.168.76.2/24\nDNS: 192.168.76.1"]
A --> B[systemd-udev-settle]
N --> T["Tang server\ntang.lan"]
T -->|"boot.initrd.clevis\n.devices.nixos-pv"| C["LUKS unlock nixos-pv\n(Clevis/Tang — unattended)"]
T -->|"boot.initrd.clevis\n.devices.ZFS-primary/nix"| Z["ZFS-primary/nix key load\n(Clevis/Tang — unattended)"]
C --> D[LVM activate]
D --> E["sysroot.mount\n/ on ext4"]
B --> G["zfs-import-zfs-primary\n(custom polling loop — retained)"]
G --> Z
Z --> I["sysroot-nix.mount\nZFS-primary/nix"]
E --> J([initrd-fs.target])
I --> J
J --> L([stage 2 — fully unattended])
```
### Files changed
| File | Change |
|---|---|
| `systems/palatine-hill/hardware-changes.nix` | Removed `requestEncryptionCredentials = mkForce false`, removed `postBootCommands`, added `boot.initrd.clevis` block for both devices, added `boot.initrd.systemd.network` with static IP + DNS, removed `/crypto` from `/nix` depends |
| `systems/palatine-hill/zfs.nix` | Removed `zfs-load-nix-key` unit, added `boot.zfs.requestEncryptionCredentials = false` |
### Comparison
| | Before | After |
|---|---|---|
| Custom initrd units | 2 (import + key load) | 1 (import only; key load is native Clevis) |
| Key source | `/crypto` LVM volume (disk) | Tang server (network) |
| Disk-based key exposure | Key on LVM volume inside LUKS | `.jwe` blob only; useless without Tang |
| Cross-mount dependency | Yes | No |
| LUKS interactive unlock | Yes | No (Clevis/Tang) |
| Unattended boot | No | Yes (when Tang reachable) |
---
## Consequences
- Boot requires the Tang server to be reachable at `tang.lan`. If Tang is down, boot stalls at the Clevis timeout, so Tang uptime becomes a prerequisite for every unattended reboot of `palatine-hill`.
- The `.jwe` files are safe to commit to the repository — they are encrypted blobs that are useless without the Tang server's private key.
- Rolling back to a generation without Clevis (pre-ADR) requires manual LUKS passphrase entry at the console; ensure prior generations remain in the bootloader during initial cutover.
---
## Implementation Notes
### Prerequisites
1. Deploy a Tang server on the LAN and create a DNS host override in OPNsense:
- Services → Unbound DNS → Host Overrides → `tang` / `lan` / `<tang IP>`
2. Verify DNS from palatine-hill before rebooting:
```bash
resolvectl query tang.lan
```
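Beyond DNS, it may be worth confirming that Tang itself answers before rebooting. The sketch below assumes `curl` is available and that the server exposes Tang's advertisement endpoint at `/adv`. The helper name `looks_like_tang_adv` is hypothetical, and the check is a coarse heuristic rather than a signature verification.

```bash
# Hypothetical helper: succeed if stdin looks like a Tang advertisement,
# i.e. a JSON document carrying a "payload" member. This does not verify
# the signature; it only catches obvious failures (empty reply, HTML error page).
looks_like_tang_adv() {
    adv=$(cat)
    case "$adv" in
        *'"payload"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (run on palatine-hill):
#   curl -sf http://tang.lan/adv | looks_like_tang_adv && echo "Tang reachable"
```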
### Create the JWE files
Run from the repository root on a machine that can read the mounted `/crypto` volume and reach the Tang server. Have the LUKS passphrase at hand:
```bash
# LUKS passphrase JWE: substitute your actual passphrase
echo -n "your-luks-passphrase" | \
  clevis encrypt tang '{"url":"http://tang.lan"}' \
  > systems/palatine-hill/nixos-pv.jwe

# ZFS dataset key JWE: key file from the running system
clevis encrypt tang '{"url":"http://tang.lan"}' \
  < /crypto/keys/zfs-nix-store-key \
  > systems/palatine-hill/nix-store.jwe
```
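Before committing, a round-trip check can confirm that a blob decrypts back to the secret it was created from. This is a sketch, assuming `clevis decrypt` can reach the same Tang server. `verify_jwe` and the `DECRYPT_CMD` override are hypothetical names introduced here for illustration.

```bash
# Hypothetical helper: succeed only if decrypting the JWE reproduces the
# original secret byte for byte. DECRYPT_CMD defaults to `clevis decrypt`;
# the override exists only so the function can be dry-run without Tang.
verify_jwe() {
    # $1 = .jwe file, $2 = file holding the original secret
    ${DECRYPT_CMD:-clevis decrypt} < "$1" | cmp -s - "$2"
}

# Usage:
#   verify_jwe systems/palatine-hill/nix-store.jwe /crypto/keys/zfs-nix-store-key \
#       && echo "nix-store.jwe round-trips"
```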
### Commit and build
```bash
git add systems/palatine-hill/nixos-pv.jwe systems/palatine-hill/nix-store.jwe
git commit -m "feat(palatine-hill): add Clevis JWE files for Tang-based boot unlock"
nix build .#palatine-hill # verify build succeeds
```
### Deploy
```bash
nh os switch # keep previous generation in bootloader for rollback
```
### Verify after reboot
```bash
# Confirm ZFS dataset was unlocked automatically
zfs get keystatus ZFS-primary/nix
# Expected: the VALUE column for keystatus reads "available"
# Check Clevis log output
journalctl -b | grep -i clevis
# Confirm Tang was reached during initrd
journalctl -b | grep -i tang
```
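For a scriptable variant of the first check, the parseable output of `zfs get -H` can be filtered for datasets whose key is still missing. A minimal sketch; the helper name `locked_datasets` is hypothetical.

```bash
# Hypothetical helper: read tab-separated "name<TAB>keystatus" lines on stdin
# and print the datasets whose key is not loaded.
locked_datasets() {
    awk -F '\t' '$2 == "unavailable" { print $1 }'
}

# Usage: prints nothing when every encrypted dataset in the pool is unlocked.
#   zfs get -H -r -t filesystem,volume -o name,value keystatus ZFS-primary | locked_datasets
```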
### Rollback procedure (if needed)
Select the previous generation from the systemd-boot menu at boot. You will be prompted interactively for the LUKS passphrase — this is expected for the old generation.
+24 -9
@@ -1,11 +1,7 @@
-{ lib, pkgs, ... }:
+{ lib, ... }:
 {
   boot = {
-    zfs.requestEncryptionCredentials = lib.mkForce false;
-    postBootCommands = ''
-      ${pkgs.zfs}/bin/zfs load-key -a
-    '';
     initrd = {
       services.lvm.enable = true;
       luks.devices = {
@@ -16,6 +12,28 @@
         };
       };
+      clevis = {
+        enable = true;
+        useTang = true;
+        devices = {
+          # Unlock LUKS root device via Tang
+          "nixos-pv".secretFile = ./nixos-pv.jwe;
+          # Unlock ZFS native-encrypted dataset via Tang
+          "ZFS-primary/nix".secretFile = ./nix-store.jwe;
+        };
+      };
+      # Static networking needed in initrd so Tang is reachable before any disk mounts
+      systemd.network = {
+        enable = true;
+        networks."10-initrd-eno1" = {
+          matchConfig.Name = "eno1";
+          address = [ "192.168.76.2/24" ];
+          routes = [ { Gateway = "192.168.76.1"; } ];
+          dns = [ "192.168.76.1" ];
+          linkConfig.RequiredForOnline = "routable";
+        };
+      };
     };
   };
@@ -37,10 +55,7 @@
       "dmask=0077"
     ];
-    "/nix".depends = [
-      "/"
-      "/crypto"
-    ];
+    "/nix".depends = [ "/" ];
   };
 }
+1 -27
@@ -7,6 +7,7 @@
 {
   boot = {
     zfs.extraPools = [ "ZFS-primary" ];
+    zfs.requestEncryptionCredentials = false;
     filesystem = "zfs";
     extraModprobeConfig = ''
       options zfs zfs_arc_min=82463372083
@@ -85,33 +86,6 @@
           fi
         '';
       };
-      zfs-load-nix-key = {
-        description = "Load ZFS key for ZFS-primary/nix in initrd";
-        wantedBy = [ "initrd-fs.target" ];
-        requires = [
-          "sysroot.mount"
-          "zfs-import-zfs-primary.service"
-        ];
-        after = [
-          "sysroot.mount"
-          "zfs-import-zfs-primary.service"
-        ];
-        before = [
-          "initrd-fs.target"
-          "sysroot-nix.mount"
-        ];
-        unitConfig.DefaultDependencies = "no";
-        serviceConfig = {
-          Type = "oneshot";
-          RemainAfterExit = true;
-        };
-        path = with pkgs; [ zfs ];
-        script = ''
-          key_file="/sysroot/crypto/keys/zfs-nix-store-key"
-          zfs load-key -L "file://$key_file" "ZFS-primary/nix"
-        '';
-      };
     };
   };