Harden mail edge: PG-race healthcheck gate, :443 SNI fan-out, docs
Fixes the root cause that was silently dropping Stalwart's cert/setting
writes, completes the public HTTPS endpoints, and captures the debugging
knowledge.
- docker-compose.yml: gate the ts-stalwart healthcheck on Postgres
reachability (nc -z the-record-prod:5432) in addition to tailscaled
health. Stalwart's depends_on: service_healthy can no longer release it
into the window where the tailnet route to Postgres isn't up yet — which
was failing table init and losing in-flight cert writes (-> rcgen).
- caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts /
autoconfig / autodiscover pass through to stalwart:443 (Stalwart
terminates TLS with its wildcard cert; no proxy_protocol on :443).
All other SNIs go to the box's web Caddy on :8443 (https_port 8443).
L7 reverse_proxy is impossible here: CAA pins issuance to Stalwart's
ACME account, so Caddy can't obtain its own cert for these names.
- acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the
SNI pass-through.
- config/config.json: track the v0.16 bootstrap (commit-safe; the DB
secret is an EnvironmentVariable reference, not inline).
- LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship
dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI
pass-through, ephemeral sidecar IP, LE rate-limit checks).
- .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret
config) and editor swap files. NEVER commit those.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 00:03:52 -04:00
# tailwart: lessons learned

Hard-won notes from bringing the mail edge up. Each entry is **symptom → cause →
fix**, ordered roughly by how long it cost. Read this before re-debugging.

## 1. Postgres startup race ate cert/setting writes

**Symptom:** TLS certs (manual import *and* ACME) would validate but never
persist; Stalwart kept serving its `rcgen` self-signed fallback. Logs showed
`Failed to create tables: error connecting to server` on most boots.

**Cause:** Stalwart shares the `ts-stalwart` sidecar's netns. Its `depends_on`
only waited for the sidecar's *own* health (`/healthz` = "tailscaled up"), which
flips green **before** the tailnet route to Postgres (`the-record-prod:5432`) is
usable. Stalwart started into that gap, failed the DB connect, and any write in
that window, including a freshly obtained cert, was silently lost.

**Fix:** the sidecar healthcheck now also requires Postgres to be reachable
(`nc -z … 5432`), so `depends_on: service_healthy` can't release Stalwart into
the race. See `docker-compose.yml`. First clean boot after this: zero PG errors,
4 live connections immediately.

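The gate logic can be sketched as a small shell helper. This is a minimal
sketch, not the healthcheck from `docker-compose.yml`: `tcp_reachable` and
`sidecar_gate` are hypothetical names, and it probes with bash's `/dev/tcp`
(as in Lesson 10) instead of `nc -z`, so it assumes `bash` and coreutils
`timeout` exist wherever it runs.

```bash
# Succeed only if a TCP connection to host:port opens within wait_s seconds.
tcp_reachable() {
  local host="$1" port="$2" wait_s="${3:-2}"
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Healthy only when the sidecar's own check AND the Postgres probe both pass.
# own_check is any command name (a stand-in for the tailscaled health check).
sidecar_gate() {
  local own_check="$1" pg_host="$2" pg_port="${3:-5432}"
  "$own_check" && tcp_reachable "$pg_host" "$pg_port"
}

# Usage (my_tailscaled_check is hypothetical):
#   sidecar_gate my_tailscaled_check the-record-prod 5432 || exit 1
```

The point of the `&&` is ordering: a green tailscaled check alone is never
enough to report healthy.
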
## 2. DNS-01 was blocked by a dead Spaceship API key

**Symptom:** `Failed to set DNS RRSet: Unauthorized` on every record; no cert
issued; no `_acme-challenge` TXT ever set.

**Cause:** the cert design is ACME **DNS-01** via the **Spaceship** provider
(bundled in caddy/lego). The stored API key was invalid (recovery debris from an
earlier config attempt). Note `STALWART_ACME_PROVIDER` / `STALWART_ACME_TOKEN`
in `.env` are **empty and not even passed through by compose**; the provider +
secret are entered in the **admin UI** (stored in the DB), not via env.

**Gotcha:** secret fields render **blank** in the Stalwart admin even when set
(the S3 secret behaves identically). A blank field is *not* evidence it's unset.

**Fix / how to verify a key directly (egresses the box's WAN IP, same as
Stalwart):**
```bash
curl -i 'https://spaceship.dev/api/v1/dns/records/<domain>?take=5&skip=0' \
  -H 'X-Api-Key: KEY' -H 'X-Api-Secret: SECRET'
# 401 application.unauthorized = bad key/secret or IP-restricted
# 200 = good
```
A fresh Spaceship key fixed it.

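For scripted checks, the same request can be reduced to a status code and a
verdict. A minimal sketch, assuming only the two outcomes noted above (200 and
401); `classify_status` is a hypothetical helper and any other code is reported
as unexpected instead of being interpreted.

```bash
# Map an HTTP status from the records endpoint to a verdict.
classify_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "unauthorized: bad key/secret or IP-restricted" ;;
    *)   echo "unexpected: $1" ;;
  esac
}

# Usage: capture only the status code, then classify it.
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     'https://spaceship.dev/api/v1/dns/records/<domain>?take=5&skip=0' \
#     -H 'X-Api-Key: KEY' -H 'X-Api-Secret: SECRET')
#   classify_status "$code"
```
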
## 3. Stalwart's auto-ban vs PROXY protocol (the "8080 mystery")

**Symptom:** the edge box could relay mail fine but could **not** reach
Stalwart's `:8080` admin: connections accept, then immediately close. Looked
like "tagged devices rejected, user phone works."

**Cause:** Stalwart's fail2ban checks the **proxied client IP** (from the PROXY
header) on the mail listeners, but the **raw connection IP** on the non-proxied
admin listener. A banned edge-box IP therefore still relays mail (ban checked
against the header IP) while direct `→:8080` is dropped (checked against the box
IP). Malformed probing of the mail ports **re-arms** the ban.

**Fix:** add `100.64.0.0/10` (and the box's WAN IP, which appears as the proxied
client when you hit the box's own public hostname) to the fail2ban allow-list.
Bans are in-memory, so a Stalwart restart flushes them. **Don't rapid-poll the
mail ports** to test.

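To see which source addresses the `100.64.0.0/10` entry actually covers, a
plain CIDR membership test is enough. This is an illustrative sketch with
hypothetical helper names (`ip_to_int`, `in_cidr`); it says nothing about how
Stalwart matches its allow-list internally.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP BASE PREFIX_LEN: succeed if IP falls inside BASE/PREFIX_LEN.
in_cidr() {
  local ip base mask
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $base & $mask )) ]
}

# Usage:
#   in_cidr 100.112.26.122 100.64.0.0 10 && echo "covered by the tailnet entry"
#   in_cidr 203.0.113.7 100.64.0.0 10 || echo "public IP: needs its own entry"
```

Tailnet addresses such as the ones in Lesson 6 fall inside the range, while any
public address (here a documentation-range example) does not, which is why the
WAN IP has to be listed separately.
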
## 4. The wildcard request *required* DNS-01 (why HTTP-01 was a dead end)

With "Additional Hostnames" left empty, Stalwart requests a **wildcard**
(`*.<domain>`). Wildcards can **only** be issued via DNS-01; HTTP-01 literally
cannot satisfy them. We burned time on an HTTP-01 + Caddy-challenge-forwarding
detour before realizing DNS-01 was the intended (and only viable) path. One
wildcard cert then covers `mail`, `mta-sts`, `autoconfig`, `autodiscover`, etc.

## 5. `:443` web endpoints need SNI pass-through, not L7 proxy

MTA-STS / autoconfig / autodiscover serve over **:443**. You cannot L7
`reverse_proxy` them through Caddy, because the **CAA** record pins issuance to
Stalwart's ACME account, so Caddy can't get its own cert for those names.
Stalwart holds the wildcard, so the edge **passes TLS through** by SNI. See
`caddy/README.md` → "The HTTP side". Needed `tcp:443` added to the
`reverse-proxy → stalwart` ACL grant.

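The fan-out decision itself is small enough to model in a few lines. The sketch
below is only a model of the routing rule, not the Caddy config in
`caddy/caddy.json`; `sni_upstream` and the `web-caddy:8443` label for the box's
web Caddy are hypothetical.

```bash
# sni_upstream SNI DOMAIN: print where the edge sends a TLS connection.
sni_upstream() {
  local sni="$1" domain="$2"
  case "$sni" in
    "mta-sts.$domain"|"autoconfig.$domain"|"autodiscover.$domain")
      echo "stalwart:443" ;;    # raw TLS pass-through; Stalwart terminates
    *)
      echo "web-caddy:8443" ;;  # everything else goes to the web Caddy
  esac
}

# Usage:
#   sni_upstream mta-sts.example.com example.com
#   sni_upstream www.example.com example.com
```

Because the match happens on the SNI alone, the edge never needs a certificate
for the three mail hostnames.
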
## 6. The sidecar is ephemeral: never hardcode its tailnet IP

`ts-stalwart` runs with `?ephemeral=true`, so its tailnet IP **changes on
re-registration** (an ACL re-sync did this mid-debug: `100.112.26.122 →
100.79.87.80`). Everything must use the MagicDNS name
`stalwart.tail7b1641.ts.net`. A hardcoded IP will mysteriously fail with
`Network is unreachable`.

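One way to catch a stray literal before it bites is to grep configs for
addresses in the tailnet range. A minimal sketch; `find_tailnet_ip_literals` is
a hypothetical helper, and it will also flag legitimate mentions of the range
itself (such as the `100.64.0.0/10` allow-list entry), so treat hits as
prompts to review and not as errors.

```bash
# Print file:line:text for every literal 100.64.0.0/10 address in the given
# files. Exit status is grep's: 0 if any literal was found, 1 if none.
find_tailnet_ip_literals() {
  grep -nHE '(^|[^0-9.])100\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\.[0-9]{1,3}\.[0-9]{1,3}([^0-9]|$)' "$@"
}

# Usage:
#   find_tailnet_ip_literals docker-compose.yml caddy/caddy.json
```
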
## 7. Don't trust crt.sh for rate-limit checks

crt.sh was flaky/empty all session. To gauge Let's Encrypt's weekly
duplicate-cert limit, use **certspotter** instead:
`https://api.certspotter.com/v1/issuances?domain=<d>&include_subdomains=true`.
Also: LE limits are scoped differently. **Failed validations** are hourly
(5/hr/host, the one a retry storm trips); **issued duplicates** are weekly
(5/wk). A renewal task hammering every 10 min trips the hourly one; consolidate
to a single task.

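A consolidated renewal task could consult a simple budget check before each
attempt. The sketch below assumes the task keeps its own list of failure times
as epoch seconds (that log is hypothetical, as is the name
`within_hourly_budget`); the limit of 5 per hour is the figure quoted above.

```bash
# within_hourly_budget NOW LIMIT TS...: succeed if fewer than LIMIT of the
# given epoch timestamps fall inside the hour ending at NOW.
within_hourly_budget() {
  local now="$1" limit="$2" recent=0 ts
  shift 2
  for ts in "$@"; do
    if [ "$ts" -le "$now" ] && [ $(( $now - $ts )) -lt 3600 ]; then
      recent=$(( $recent + 1 ))
    fi
  done
  [ "$recent" -lt "$limit" ]
}

# Usage (failure timestamps come from wherever the task records them):
#   within_hourly_budget "$(date +%s)" 5 $FAILED_AT || echo "back off"
```
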
## 8. The Stalwart container has no IPv6: AAAA targets fail before IPv4 is tried

**Symptom:** Outbound delivery (and relay-to-smarthost) to any host with an
AAAA record fails with `I/O error: Network is unreachable (os error 101)`.
Hosts that are IPv4-only deliver fine. Pointing a relay at a *hostname* that
has both A and AAAA fails; pointing it at the raw IPv4 works.

**Cause:** Stalwart shares the `ts-stalwart` sidecar's netns, which has no
global IPv6. When it resolves a dual-stack target it tries the AAAA first,
gets `ENETUNREACH` immediately, and for a **relay next-hop it does not fall
back to the A record**: it just records the v6 failure and backs off. So a
single missing address family wedges all mail to dual-stack destinations.

**Fix:** Either (a) pin the relay/smarthost `address` to an **IPv4 literal**
(no AAAA to trip on), or (b) give the container real IPv6. Note that relaying
over the **tailnet** sidesteps this entirely: you connect to a tailnet
`100.x` address, which has no AAAA, so the v6-first trap never triggers.

> **RESOLVED (2026-06-11): option (b) is now done.** The container has real
> IPv6 egress; this trap no longer fires. See Lesson 9's fix for how.

## 9. Configuring IPv6 on the KVM host does NOT give the container IPv6

**Symptom:** `ip -6 addr` and `ping6 google.com` succeed on the KVM host, but
Stalwart still dies with `os error 101` on AAAA targets, and the box is still
a broken IPv6 Tailscale exit node.

**Cause:** The host's `eth0` and the container/sidecar netns are separate
network stacks. Adding the provider's `/64` to `eth0` (ifupdown `inet6 static`
+ `onlink` default route, since the gateway is in a different /64) fixes the
*host*, not the container. Docker doesn't hand IPv6 to containers by default,
and the sidecar routes via Tailscale, not eth0.

**Fix:** Don't assume host IPv6 = container IPv6. Test from *inside* the
container's netns. For mail egress, the IPv4-literal relay (Lesson 8) or the
tailnet relay avoids needing container IPv6 at all. Enabling true container
IPv6 (Docker IPv6 + routing the /64 in) is a separate, larger task.

**RESOLVED (2026-06-11), the easy way: no /64 routing or ndppd.** Because the
container only needs IPv6 **egress** (inbound arrives via the edge/tailnet,
never v6), you don't need a routable prefix or NDP proxy at all, just a **ULA
subnet + masquerade**, exactly like Docker does for v4:
```yaml
# docker-compose.yml
networks:
  default:
    enable_ipv6: true
    ipam:
      config:
        - subnet: fd00:7a17:600d::/64
          gateway: fd00:7a17:600d::1
```
Docker 29 enables `ip6tables` by default and masquerades the ULA out the host's
global v6, so the sidecar netns (shared by Stalwart via `network_mode`) gets a
working v6 default route with **zero host sysctl/daemon changes** (host
`net.ipv6.conf.all.forwarding` was already 1 from the static-v6 setup). Verify
from *inside* the netns: `ping6 google.com` + a TCP connect to a v6 literal on
:443. Recreating the network (`docker compose down && docker compose up -d`)
bounces the stack and the ephemeral sidecar gets a new tailnet IP; MagicDNS
covers it (Lesson 6), and the MTA route table rebuilds anyway (Lesson 12). This
does **not** give inbound v6; for that you'd still publish AAAA + make the edge
listen on v6 (separate).

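Two small checks make the "ULA + masquerade" reasoning concrete. Both helpers
are hypothetical names in a minimal sketch: `is_ula` tests whether an address
is in `fc00::/7` (unique local, so it is not routable on the internet and
needs masquerading to egress), and `v6_tcp_ok` is a bash `/dev/tcp` probe meant
to be run from inside the netns, assuming `bash` and `timeout` are available
there.

```bash
# is_ula ADDR: succeed if ADDR is in fc00::/7 (first hextet fcXX or fdXX).
is_ula() {
  case "$(printf '%s' "$1" | tr 'A-F' 'a-f')" in
    f[cd][0-9a-f][0-9a-f]:*) return 0 ;;
    *) return 1 ;;
  esac
}

# v6_tcp_ok ADDR [PORT]: succeed if a TCP connect to the v6 literal opens
# within 5 seconds.
v6_tcp_ok() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-443}" 2>/dev/null
}

# Usage:
#   is_ula fd00:7a17:600d::1 && echo "ULA: egress relies on masquerade"
#   v6_tcp_ok <v6-literal> 443 && echo "v6 egress works from this netns"
```
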
## 10. The VPS blocks ALL outbound SMTP ports: relay over the tailnet

**Symptom:** Direct MX delivery and relay-to-public-host both fail with
`Connection timed out (os error 110)`, and the SYN never arrives at the
destination. Not just port 25: `465`, `587`, even alt-port `2525` all time out.

**Cause:** The KVM provider blocks all outbound SMTP and submission ports to
prevent spam. Only non-SMTP ports (`443`, etc.) egress. Confirmed with:
```bash
# Replace <dst> with the destination host before running.
for p in 25 465 587 2525 443; do
  timeout 5 bash -c "exec 3<>/dev/tcp/<dst>/$p" && echo "$p OPEN" || echo "$p blocked"
done
# 443 OPEN, all SMTP ports time out
```

**Fix:** Relay over the **tailnet**. Tailscale rides WireGuard/DERP (UDP 41641 /
443), so it's immune to SMTP port filtering. Point the relay at the smarthost's
**tailnet IP** (e.g. `100.x:587`), not its public address. Long-term: ask the
provider to unblock outbound 25/587 for verified use.

## 11. The sidecar can RECEIVE on the tailnet but can't INITIATE without an ACL grant

**Symptom:** The relay to `<mailbox-tailnet-ip>:587` times out (`os error 110`),
yet the **KVM host** (same physical machine) can reach that exact IP:port over
the tailnet fine. Looks like a routing or transparent-proxy bug.

**Cause:** The Stalwart container rides the `ts-stalwart` sidecar, a **separate
tailnet node** (`tag:stalwart`) from the KVM host. The `tailwart` ACL block only
listed `tag:stalwart` as a **destination** (`"dst": ["tag:stalwart"]`). The
tailnet is default-deny, so the sidecar could receive connections but could not
*initiate* the relay back to the mailbox → silent drop → timeout. The KVM host
worked because it's a different, permitted identity, which masked the real cause.

**Fix:** Add an ACL rule granting `tag:stalwart` as a **source**:
```json
{ "src": ["tag:stalwart"], "dst": ["tag:mail"], "ip": ["tcp:587"] }
```
(mailbox is `tag:mail`). Applies in seconds, no restart. See `acl-snippet.hujson`.

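The directionality is easier to see in a toy model of default-deny matching.
This is a deliberately simplified sketch, not Tailscale's evaluator:
`acl_allows` is a hypothetical helper, rules are flattened to `src dst port`
lines, and `tag:reverse-proxy` is an assumed name for the edge's tag.

```bash
# acl_allows SRC DST PORT: read "src dst port" rules on stdin (one per line)
# and succeed only if some rule matches exactly. No match means deny.
acl_allows() {
  local want_src="$1" want_dst="$2" want_port="$3" src dst port
  while read -r src dst port; do
    if [ "$src" = "$want_src" ] && [ "$dst" = "$want_dst" ] \
       && [ "$port" = "$want_port" ]; then
      return 0
    fi
  done
  return 1
}

# Usage: with only the inbound rule, the sidecar cannot initiate.
#   echo 'tag:reverse-proxy tag:stalwart tcp:443' |
#     acl_allows tag:stalwart tag:mail tcp:587 || echo "denied"
```

A rule that names a tag only as `dst` never matches a connection that tag
originates, which is the whole bug in one line.
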
## 12. Stalwart only rebuilds its MTA route table at container startup

**Symptom:** You edit an `MtaRoute` (address, etc.) via API/UI, but delivery keeps
using the old value. The datastore shows the new value; live delivery ignores it.

**Cause:** The `routing_strategy` map is built once when the process boots. The
`ReloadSettings` action reloads the datastore but does **not** rebuild the SMTP
route map. So route/strategy changes are invisible until restart.

**Fix:** After any `MtaRoute` / `MtaOutboundStrategy` change,
`docker restart tailwart-stalwart-1`. (Side effect: the ephemeral sidecar gets a
new tailnet IP each restart, so anything addressing it by IP must rediscover it;
use the MagicDNS name where possible.)

## 13. "Did Stalwart eat my custom DNS records?" No; Spaceship is RRSet-upsert

**Symptom:** A manually-added record (e.g. an `AAAA` for the apex/`mail`) is
gone from the zone, and the suspicion is that Stalwart's ACME DNS-01 integration
overwrote it on a renewal.

**Cause:** Almost never Stalwart. Its **only** DNS-provider writes are
`_acme-challenge.<name>` TXT (the rotating challenge) and `_validation-persist`
TXT (the LE account-pinned persistent-validation record). It does **not** create
or modify A/AAAA/MX/SRV; those you add yourself from its "recommended records"
page. And the Spaceship API is **RRSet-upsert keyed by (name, type)**, not a
whole-zone replace: a `PUT /api/v1/dns/records/{domain}` with
`{"force":true,"items":[…]}` only touches the RRSets named in `items`. Proof:
25 unrelated records coexist untouched through every rotating `_acme-challenge`
write; and adding one apex `AAAA` left the other 25 exactly intact (25→26).

So a vanished AAAA is far more likely a **provider-side loss/rollback** (e.g.
during a data-center DDoS) or a manual edit, not Stalwart.

**How to inspect / verify (read-only), creds in `.env`:**
```bash
# -f2- keeps the whole value even if the key or secret itself contains '='.
KEY=$(grep '^SPACESHIP_KEY=' .env | cut -d= -f2-)
SECRET=$(grep '^SPACESHIP_SECRET=' .env | cut -d= -f2-)
curl -s "https://spaceship.dev/api/v1/dns/records/<domain>?take=100&skip=0" \
  -H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" | python3 -m json.tool
```
To add a record, `PUT` the same endpoint with a single-item `items` array. It
won't disturb siblings of a *different* name/type (but see #14: for an existing
RRSet it **appends**, it does not replace). **Snapshot the zone (GET) before any
write** and diff after; snapshots land in `_backup/` (gitignored). Always
re-check at the authoritative NS (`dig +short AAAA <name> @launch1.spaceship.net`),
not a cache.

**Caveat: don't publish `mail` AAAA before the edge listens on v6.** Inbound
mail follows `MX → mail.<domain>`; an `AAAA` there with no v6 `:25` listener on
the edge makes senders try v6 and some won't fall back → deferred/bounced mail.
An **apex** `AAAA` is safe (it doesn't affect MX routing). Do `mail` AAAA + edge
v6 listeners together.

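The snapshot-then-diff habit can be sketched with `sort` and `comm`. This
assumes each snapshot has already been flattened to one record per line (how
you flatten the GET response is up to you; the JSON field layout is not
assumed here), and `zone_diff` is a hypothetical helper name.

```bash
# zone_diff BEFORE AFTER: compare two one-record-per-line snapshots and print
# the record counts plus the REMOVED and ADDED sets.
zone_diff() {
  local before after
  before=$(mktemp) && after=$(mktemp) || return 1
  sort "$1" > "$before"
  sort "$2" > "$after"
  echo "count: $(grep -c '' "$before") -> $(grep -c '' "$after")"
  comm -23 "$before" "$after" | sed 's/^/REMOVED /'   # only in BEFORE
  comm -13 "$before" "$after" | sed 's/^/ADDED /'     # only in AFTER
  rm -f "$before" "$after"
}

# Usage (file names are illustrative):
#   zone_diff _backup/zone-before.txt _backup/zone-after.txt
```

An unexpected `REMOVED` line after a write is the signal to stop and restore
from the snapshot.
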
## 14. Spaceship `PUT` is an APPEND-by-value, not a replace: it can dupe an RRSet

**Symptom:** "Updating" the SPF record (`PUT` with `force:true` and the new
value) left the zone with **two** `v=spf1` apex TXT records. Two SPF records is
an RFC 7208 `permerror` → SPF **fails hard for everyone**, which is worse than
the typo you were fixing.

**Cause:** Spaceship keys records by (name, type, **value**). A `PUT` whose value
differs from the existing record is a *new* record, so `force:true` **adds**
rather than replacing. (The earlier AAAA/SPF adds looked like clean "upserts"
only because there was no prior record at that name+type, or the value matched.)

**Fix / correct pattern for an in-place value change:** `PUT` the new value, then
**`DELETE` the old one**. The `DELETE` body is a **bare JSON array**, not
`{"items":[…]}` (the latter 422s with `Value is "object" but should be "array"`):
```bash
curl -s -X DELETE "https://spaceship.dev/api/v1/dns/records/<domain>" \
  -H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" -H 'Content-Type: application/json' \
  -d '[{"type":"TXT","name":"@","value":"v=spf1 mx -all"}]'
```
Always GET-diff before/after (count + REMOVED/ADDED sets) to catch a stray dupe.

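A quick guard against the duplicate is to count SPF records in the apex TXT
set after every write. A minimal sketch with hypothetical helper names
(`count_spf`, `spf_single`); it expects one TXT value per line on stdin, quoted
or not, which is the shape `dig +short TXT` prints.

```bash
# count_spf: print how many of the TXT values on stdin are SPF records.
count_spf() {
  grep -ciE '^"?v=spf1( |"|$)'
}

# spf_single: succeed only when exactly one SPF record is present.
spf_single() {
  [ "$(count_spf)" -eq 1 ]
}

# Usage:
#   dig +short TXT <domain> @launch1.spaceship.net | spf_single \
#     || echo "zero or multiple v=spf1 records: fix before moving on"
```
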
## 15. ed25519 DKIM "fails" at Gmail with both ed25519+RSA: it's not your key

**Symptom:** DMARC aggregate reports show, per message, `dkim=pass` for the RSA
selector but `dkim=fail` for the ed25519 selector (`v1-ed25519-…`), on the *same*
intact message. Looks like a broken/mismatched ed25519 key.

**Cause:** **Not the key.** Verified cryptographically: the stored ed25519 seed
derives exactly the published `p=` (and the PKCS#8-v2 blob even embeds that same
pubkey). seed → pubkey → DNS all agree. It's the **known Stalwart dual-signing
issue** ([discussion #2727](https://github.com/stalwartlabs/stalwart/discussions/2727)):
when Stalwart applies *both* an ed25519 and an RSA signature, Gmail/Hotmail
mishandle the ed25519 one (`fail`, or `neutral (no key)`), while RSA passes. The
maintainer's own server runs with "ed25519 ignored, RSA passes." RSA carries
DMARC, so **mail is unaffected**; it's cosmetic, just noisy in reports.

How the key was proven (the seed lives in settings table `s`, PKCS#8 v2):
```bash
# 32-byte seed from the OCTET STRING in the stored PKCS#8; wrap as clean v0 DER:
printf '302e020100300506032b657004220420%s' "$SEED_HEX" | xxd -r -p > /tmp/ed.der
openssl pkey -inform DER -in /tmp/ed.der -pubout -outform DER | tail -c 32 | base64
# == the DNS p= value → key is correct
```

**Fix (proper = RSA-only):** the recommended cure is to stop emitting the ed25519
signature, not republish anything. Two parts:
1. **DNS (done 2026-06-12):** removed the `v1-ed25519-20260604._domainkey` TXT,
   which turns the report `fail` into a harmless "no key"; DMARC is still green
   via RSA.
2. **Stalwart (still TODO):** disable the ed25519 **signature** in the admin UI /
   JMAP signing config so outbound stops carrying it (DB surgery on the serialized
   signature object is risky; do it through the supported surface). The fallback
   admin can't mint an API token non-interactively (only `authorization_code` /
   `device_code` grants; no ROPC), so this needs the web UI or a device-code login.

**Aside discovered here:** outbound is a catch-all smarthost relay to
`mail.tail7b1641.ts.net` (auth `stalwart-relay@waynehayes.com`), which re-emits
as `mail.waynehayes.com` (`216.189.156.74` / `2602:ffc5:20::1:6b52`). That relay
IP is why SPF needed `include:waynehayes.com` (#14 / the SPF fix).

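To compare the derived public key against DNS without eyeballing it, the `p=`
tag can be pulled out of a DKIM TXT value. A minimal sketch; `dkim_p_value` is
a hypothetical helper, and it simply strips quotes and whitespace before
splitting on `;`, so a record split across several quoted strings is joined
first.

```bash
# dkim_p_value: read a DKIM TXT record on stdin and print its p= tag value.
dkim_p_value() {
  tr -d '" \t\n' | tr ';' '\n' | sed -n 's/^p=//p'
}

# Usage (DERIVED is the base64 output of the openssl pipeline above;
# the TXT value is whatever you saved or fetched for the selector):
#   [ "$(printf '%s' "$TXT_VALUE" | dkim_p_value)" = "$DERIVED" ] && echo "match"
```
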