LESSONS: SPF append-dup gotcha (#14) and ed25519 DKIM diagnosis (#15)

#14: Spaceship PUT keys records by name+type+VALUE, so changing an
existing RRSet's value APPENDS a second record (a double v=spf1 =
RFC 7208 permerror). Correct pattern: PUT new, DELETE old; DELETE body
is a bare JSON array, not {items:[...]}.

#15: ed25519 DKIM "fail" at Gmail alongside passing RSA is the known
Stalwart dual-signing issue, not a key problem -- proved the stored
seed derives the published p= exactly. Fix is RSA-only: removed the
ed25519 DNS key (done); disabling the ed25519 signature in Stalwart is
the remaining step. Also records the smarthost identity behind the SPF
fix. Corrected #13's "PUT won't disturb siblings" claim accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wayne Hayes 2026-06-12 23:47:01 +01:00
parent cd1cdbd110
commit ea7eedcb7b

@ -237,12 +237,74 @@ curl -s "https://spaceship.dev/api/v1/dns/records/<domain>?take=100&skip=0" \
-H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" | python3 -m json.tool
```
To add a record, `PUT` the same endpoint with a single-item `items` array; it
won't disturb siblings of a *different* name/type (but see #14: for an existing
RRSet it **appends**, it does not replace). **Snapshot the zone (GET) before any
write** and diff after; snapshots land in `_backup/` (gitignored). Always
re-check at the authoritative NS (`dig +short AAAA <name> @launch1.spaceship.net`),
not a cache.
**Caveat: don't publish `mail` AAAA before the edge listens on v6.** Inbound
mail follows `MX → mail.<domain>`; an `AAAA` there with no v6 `:25` listener on
the edge makes senders try v6 and some won't fall back → deferred/bounced mail.
An **apex** `AAAA` is safe (it doesn't affect MX routing). Do `mail` AAAA + edge
v6 listeners together.
## 14. Spaceship `PUT` is an APPEND-by-value, not a replace — it can dupe an RRSet
**Symptom:** "Updating" the SPF record (`PUT` with `force:true` and the new
value) left the zone with **two** `v=spf1` apex TXT records. More than one SPF
record at a name is an RFC 7208 `permerror` (section 4.5), so SPF **can no longer
pass at any receiver**, which is worse than the typo you were fixing.
**Cause:** Spaceship keys records by (name, type, **value**). A `PUT` whose value
differs from the existing record is a *new* record, so `force:true` **adds**
rather than replacing. (The earlier AAAA/SPF adds looked like clean "upserts"
only because there was no prior record at that name+type, or the value matched.)
**Fix / correct pattern for an in-place value change:** `PUT` the new value, then
**`DELETE` the old one** — and the `DELETE` body is a **bare JSON array**, not
`{"items":[…]}` (the latter 422s with `Value is "object" but should be "array"`):
```bash
curl -s -X DELETE "https://spaceship.dev/api/v1/dns/records/<domain>" \
-H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" -H 'Content-Type: application/json' \
-d '[{"type":"TXT","name":"@","value":"v=spf1 mx -all"}]'
```
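The PUT-then-DELETE pair can be sketched as two request bodies. This is a minimal sketch, not a client for the Spaceship API: `change_payloads` is a hypothetical helper that only builds JSON and sends nothing, the `force`/`items` shape of the `PUT` body is taken from the description above, and any further fields the API may expect (a TTL, for instance) are omitted. The new SPF value in the usage lines is illustrative.

```python
import json


def change_payloads(name, old_value, new_value, rtype="TXT"):
    """Build (put_body, delete_body) JSON strings for an in-place value change.

    Hypothetical helper: it constructs the bodies only and makes no requests.
    """
    if old_value == new_value:
        raise ValueError("old and new values match; nothing to change")
    # PUT wraps the new record in an object that carries an items array.
    put_body = {
        "force": True,
        "items": [{"type": rtype, "name": name, "value": new_value}],
    }
    # DELETE takes a bare array; wrapping it in {"items": [...]} is what 422s.
    delete_body = [{"type": rtype, "name": name, "value": old_value}]
    return json.dumps(put_body), json.dumps(delete_body)


# Illustrative values: the old record from the example above, a made-up new one.
put_body, delete_body = change_payloads(
    "@", "v=spf1 mx -all", "v=spf1 mx include:waynehayes.com -all"
)
print(put_body)
print(delete_body)
```

Send the `PUT` body first and the `DELETE` body second, so the name is never left without an SPF record in between.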
Always GET-diff before/after (count + REMOVED/ADDED sets) to catch a stray dupe.
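A before/after diff of this kind might look like the following. It is a sketch under stated assumptions: `diff_snapshots` is a hypothetical name, and it assumes each snapshot has already been loaded into a list of record dicts carrying `type`, `name` and `value` (how that list is pulled out of the GET response is left to the caller).

```python
from collections import Counter


def record_key(rec):
    # Assumed snapshot shape: every record has type and name; value may be quoted.
    return (rec["type"], rec["name"], rec.get("value", "").strip('"'))


def diff_snapshots(before, after):
    """Return record counts, REMOVED/ADDED sets and names holding duplicate SPF."""
    b = Counter(record_key(r) for r in before)
    a = Counter(record_key(r) for r in after)
    removed = sorted((b - a).elements())
    added = sorted((a - b).elements())
    # More than one v=spf1 TXT at the same name is the permerror case.
    spf_per_name = Counter(
        name
        for (rtype, name, value) in a.elements()
        if rtype == "TXT" and value.lower().startswith("v=spf1")
    )
    spf_dupes = sorted(n for n, c in spf_per_name.items() if c > 1)
    return {
        "count": (len(before), len(after)),
        "removed": removed,
        "added": added,
        "spf_dupes": spf_dupes,
    }


# Illustrative data reproducing the append: the old record is still present.
before = [{"type": "TXT", "name": "@", "value": "v=spf1 mx -all"}]
after = before + [
    {"type": "TXT", "name": "@", "value": "v=spf1 mx include:waynehayes.com -all"}
]
print(diff_snapshots(before, after))
```

A clean in-place change shows one entry in each of `removed` and `added` and an empty `spf_dupes`; an append shows an empty `removed` and the name listed in `spf_dupes`.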
## 15. ed25519 DKIM "fails" at Gmail with both ed25519+RSA — it's not your key
**Symptom:** DMARC aggregate reports show, per message, `dkim=pass` for the RSA
selector but `dkim=fail` for the ed25519 selector (`v1-ed25519-…`), on the *same*
intact message. Looks like a broken/mismatched ed25519 key.
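To see the pattern without reading report XML by eye, the per-selector results can be tallied. This is a sketch, assuming the usual aggregate-report layout (`record/auth_results/dkim` with `selector` and `result` children) and no XML namespace; `dkim_results` is a hypothetical helper, the selector names in the sample are made up, and some reporters omit `selector`, in which case those entries are skipped.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def dkim_results(report_xml):
    """Count (selector, result) pairs found in a DMARC aggregate report."""
    root = ET.fromstring(report_xml)
    tally = Counter()
    for dkim in root.iter("dkim"):
        selector = dkim.findtext("selector")
        result = dkim.findtext("result")
        # policy_evaluated also contains a <dkim> element, but it is plain
        # text with no children, so it is skipped by this check.
        if selector is not None and result is not None:
            tally[(selector, result)] += 1
    return tally


# Trimmed, made-up report fragment showing one message signed twice.
SAMPLE = """<feedback>
  <record>
    <row><policy_evaluated><dkim>pass</dkim><spf>pass</spf></policy_evaluated></row>
    <auth_results>
      <dkim><domain>example.com</domain><selector>rsa-selector</selector><result>pass</result></dkim>
      <dkim><domain>example.com</domain><selector>ed25519-selector</selector><result>fail</result></dkim>
    </auth_results>
  </record>
</feedback>"""

for (selector, result), n in sorted(dkim_results(SAMPLE).items()):
    print(selector, result, n)
```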
**Cause:** **Not the key.** Verified cryptographically: the stored ed25519 seed
derives exactly the published `p=` (and the PKCS#8-v2 blob even embeds that same
pubkey). seed → pubkey → DNS all agree. It's the **known Stalwart dual-signing
issue** ([discussion #2727](https://github.com/stalwartlabs/stalwart/discussions/2727)):
when Stalwart applies *both* an ed25519 and an RSA signature, Gmail/Hotmail
mishandle the ed25519 one (`fail`, or `neutral (no key)`), while RSA passes. The
maintainer's own server runs with "ed25519 ignored, RSA passes." RSA carries
DMARC, so **mail is unaffected** — it's cosmetic, just noisy in reports.
How the key was proven (the seed lives in settings table `s`, PKCS#8 v2):
```bash
# 32-byte seed from the OCTET STRING in the stored PKCS#8; wrap as clean v0 DER:
printf '302e020100300506032b657004220420%s' "$SEED_HEX" | xxd -r -p > /tmp/ed.der
openssl pkey -inform DER -in /tmp/ed.der -pubout -outform DER | tail -c 32 | base64
# == the DNS p= value → key is correct
```
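The byte handling around that check can also be sketched with the standard library alone. It does not derive the public key (that still needs `openssl` or a crypto library); it only builds the same v0 wrapper and compares an exported SubjectPublicKeyInfo against a DKIM TXT record. The function names are hypothetical, and the key bytes in the usage lines are a placeholder, not a real key.

```python
import base64

# Same 16-byte PKCS#8 v0 header as in the printf above; a 32-byte seed follows it.
PKCS8_V0_PREFIX = bytes.fromhex("302e020100300506032b657004220420")


def wrap_seed(seed_hex):
    """Wrap a 32-byte ed25519 seed (hex) as a PKCS#8 v0 DER private key."""
    seed = bytes.fromhex(seed_hex)
    if len(seed) != 32:
        raise ValueError("ed25519 seed must be 32 bytes")
    return PKCS8_V0_PREFIX + seed


def dkim_p_value(txt):
    """Return the decoded p= tag of a DKIM TXT record."""
    for tag in txt.split(";"):
        key, _, value = tag.strip().partition("=")
        if key.strip() == "p":
            # Folded records may contain whitespace inside the base64.
            return base64.b64decode("".join(value.split()))
    raise ValueError("no p= tag in record")


def matches_dns(spki_der, txt):
    """Compare the last 32 bytes of an ed25519 SPKI with the published p=."""
    return spki_der[-32:] == dkim_p_value(txt)


# Placeholder key material, for illustration only.
pub = bytes(range(32))
spki = bytes.fromhex("302a300506032b6570032100") + pub
txt = "v=DKIM1; k=ed25519; p=" + base64.b64encode(pub).decode()
print(len(wrap_seed("00" * 32)), matches_dns(spki, txt))
```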
**Fix (proper = RSA-only):** the recommended cure is to stop emitting the ed25519
signature, not to republish anything. Two parts:
1. **DNS (done 2026-06-12):** removed the `v1-ed25519-20260604._domainkey` TXT —
turns the report `fail` into a harmless "no key", DMARC still green via RSA.
2. **Stalwart (still TODO):** disable the ed25519 **signature** in the admin UI /
JMAP signing config so outbound stops carrying it (DB surgery on the serialized
signature object is risky — do it through the supported surface). The fallback
admin can't mint an API token non-interactively (only `authorization_code` /
`device_code` grants; no ROPC), so this needs the web UI or a device-code login.
**Aside discovered here:** outbound is a catch-all smarthost relay to
`mail.tail7b1641.ts.net` (auth `stalwart-relay@waynehayes.com`), which re-emits
as `mail.waynehayes.com` (`216.189.156.74` / `2602:ffc5:20::1:6b52`). That relay
IP is why SPF needed `include:waynehayes.com` (#14 / the SPF fix).