mailbox: give sidecar netns real IPv6 egress; resolve AAAA trap; DNS notes

Add enable_ipv6 + a ULA subnet to tailwart_default so the Stalwart
container (sharing the ts-stalwart netns) gets working IPv6 egress.
Because only egress is needed (inbound arrives via the edge/tailnet),
a ULA + Docker masquerade suffices -- no routable prefix, ndppd, or
host sysctl changes (Docker 29 enables ip6tables by default; host
forwarding was already on). Verified: ping6 + TCP/443 to v6 literals
from inside the netns; zero ENETUNREACH since boot.

LESSONS: mark #8/#9 resolved with the ULA-masquerade recipe, and add
#13 -- Spaceship's DNS API is RRSet-upsert (not zone-replace), so
Stalwart/ACME did not eat custom AAAA records; a vanished AAAA is a
provider-side loss, not Stalwart. Includes the safe read/verify flow
and the "don't publish mail AAAA before edge v6 listeners" caveat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wayne Hayes 2026-06-11 23:53:28 +01:00
parent 3a9819c3ee
commit 34422ba2b1
2 changed files with 80 additions and 0 deletions


@@ -113,6 +113,9 @@ single missing address family wedges all mail to dual-stack destinations.
over the **tailnet** sidesteps this entirely — you connect to a tailnet
`100.x` address, which has no AAAA, so the v6-first trap never triggers.
> **RESOLVED (2026-06-11) — option (b) is now done.** The container has real
> IPv6 egress; this trap no longer fires. See Lesson 9's fix for how.
## 9. Configuring IPv6 on the KVM host does NOT give the container IPv6
**Symptom:** `ip -6 addr` and `ping6 google.com` succeed on the KVM host, but
@@ -130,6 +133,30 @@ container's netns. For mail egress, the IPv4-literal relay (Lesson 8) or the
tailnet relay avoids needing container IPv6 at all. Enabling true container
IPv6 (Docker IPv6 + routing the /64 in) is a separate, larger task.
**RESOLVED (2026-06-11) — the easy way, no /64 routing or ndppd.** Because the
container only needs IPv6 **egress** (inbound arrives via the edge/tailnet,
never v6), you don't need a routable prefix or NDP proxy at all — just a **ULA
subnet + masquerade**, exactly like Docker does for v4:
```yaml
# docker-compose.yml
networks:
  default:
    enable_ipv6: true
    ipam:
      config:
        - subnet: fd00:7a17:600d::/64
          gateway: fd00:7a17:600d::1
```
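As a quick sanity check on the addressing above, the sketch below uses Python's standard `ipaddress` module to confirm that a subnet sits inside the ULA range `fc00::/7` and that the gateway belongs to that subnet. The helper name `check_ula` is hypothetical; it is not part of Docker or of this repo.

```python
import ipaddress

# RFC 4193 unique local address space.
ULA_RANGE = ipaddress.ip_network("fc00::/7")


def check_ula(subnet: str, gateway: str) -> bool:
    """Return True if subnet is an IPv6 ULA network that contains gateway."""
    net = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_address(gateway)
    # Check the version first: subnet_of() raises TypeError on mixed v4/v6.
    return net.version == 6 and net.subnet_of(ULA_RANGE) and gw in net


if __name__ == "__main__":
    print(check_ula("fd00:7a17:600d::/64", "fd00:7a17:600d::1"))
```

A globally routable or documentation prefix such as `2001:db8::/64` fails the check, which is the point: this recipe relies on masquerade, not on a routed prefix.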
Docker 29 enables `ip6tables` by default and masquerades the ULA out the host's
global v6, so the sidecar netns (shared by Stalwart via `network_mode`) gets a
working v6 default route with **zero host sysctl/daemon changes** (host
`net.ipv6.conf.all.forwarding` was already 1 from the static-v6 setup). Verify
from *inside* the netns: `ping6 google.com` + a TCP connect to a v6 literal on
:443. Recreating the network (`docker compose down && docker compose up -d`) bounces the stack and
the ephemeral sidecar gets a new tailnet IP — MagicDNS covers it (Lesson 6), and
the MTA route table rebuilds anyway (Lesson 12). This does **not** give inbound
v6; for that you'd still publish AAAA + make the edge listen on v6 (separate).
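The TCP half of that verification can be sketched in Python, assuming an interpreter is reachable from inside the netns (an assumption; the original check used `ping6` and a plain TCP connect). `probe_v6` and `classify_error` are hypothetical names. The aim is to tell `ENETUNREACH` (no v6 route, the Lesson 8 trap) apart from an ordinary refusal or timeout.

```python
import errno
import socket


def classify_error(exc: OSError) -> str:
    """Map a connect() failure to a short label."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if exc.errno == errno.ENETUNREACH:
        return "no-v6-route"  # the netns has no usable IPv6 default route
    if exc.errno == errno.ECONNREFUSED:
        return "refused"      # routing works, nothing listens on that port
    return f"error:{exc.errno}"


def probe_v6(literal: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try a TCP connect to an IPv6 literal; return "ok" or a failure label."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((literal, port))
        return "ok"
    except OSError as exc:
        return classify_error(exc)
```

Run it with any v6 literal you know listens on 443. A result of `"no-v6-route"` means the network was not recreated with `enable_ipv6`, or the process is not in the sidecar's netns.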
## 10. The VPS blocks ALL outbound SMTP ports — relay over the tailnet
**Symptom:** Direct MX delivery and relay-to-public-host both fail with
@@ -182,3 +209,40 @@ route map. So route/strategy changes are invisible until restart.
`docker restart tailwart-stalwart-1`. (Side effect: the ephemeral sidecar gets a
new tailnet IP each restart — anything addressing it by IP must rediscover it;
use the MagicDNS name where possible.)
## 13. "Did Stalwart eat my custom DNS records?" — no; Spaceship is RRSet-upsert
**Symptom:** A manually-added record (e.g. an `AAAA` for the apex/`mail`) is
gone from the zone, and the suspicion is that Stalwart's ACME DNS-01 integration
overwrote it on a renewal.
**Cause:** Almost never Stalwart. Its **only** DNS-provider writes are
`_acme-challenge.<name>` TXT (the rotating challenge) and `_validation-persist`
TXT (the LE account-pinned persistent-validation record). It does **not** create
or modify A/AAAA/MX/SRV — those you add yourself from its "recommended records"
page. And the Spaceship API is **RRSet-upsert keyed by (name, type)**, not a
whole-zone replace: a `PUT /api/v1/dns/records/{domain}` with
`{"force":true,"items":[…]}` only touches the RRSets named in `items`. Proof:
25 unrelated records coexist untouched through every rotating `_acme-challenge`
write; and adding one apex `AAAA` left the other 25 exactly intact (25→26).
So a vanished AAAA is far more likely a **provider-side loss/rollback** (e.g.
during a data-center DDoS) or a manual edit — not Stalwart.
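To make the distinction concrete, here is a toy model of the two write semantics. It is not the Spaceship client and does not reproduce the real payload schema: the `name`/`type`/`value` fields and both function names are simplified assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

RRSetKey = Tuple[str, str]        # (name, type)
Zone = Dict[RRSetKey, List[str]]  # RRSet key -> record values


def upsert_rrsets(zone: Zone, items: List[dict]) -> Zone:
    """Return a new zone in which only the RRSets named in items are replaced."""
    updated = {key: list(values) for key, values in zone.items()}
    touched: Zone = {}
    for item in items:
        touched.setdefault((item["name"], item["type"]), []).append(item["value"])
    updated.update(touched)  # every other (name, type) pair is left as it was
    return updated


def replace_zone(zone: Zone, items: List[dict]) -> Zone:
    """Whole-zone replace, for contrast: anything not in items is lost."""
    return upsert_rrsets({}, items)


if __name__ == "__main__":
    zone = {("@", "MX"): ["10 mail.example.com."], ("mail", "A"): ["192.0.2.10"]}
    challenge = [{"name": "_acme-challenge", "type": "TXT", "value": "token"}]
    print(len(upsert_rrsets(zone, challenge)), len(replace_zone(zone, challenge)))
```

Under the upsert model a challenge write grows the zone by one RRSet and leaves the rest alone, matching the observed behaviour. Only the zone-replace model could explain an ACME write deleting an unrelated `AAAA`.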
**How to inspect / verify (read-only), creds in `.env`:**
```bash
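# Read credentials from .env; -f2- keeps any '=' that appears inside the value.
KEY=$(grep '^SPACESHIP_KEY=' .env | cut -d= -f2-)
SECRET=$(grep '^SPACESHIP_SECRET=' .env | cut -d= -f2-)
# Read-only listing of the zone; replace <domain> with the zone name.
curl -s "https://spaceship.dev/api/v1/dns/records/<domain>?take=100&skip=0" \
  -H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" | python3 -m json.tool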
```
To add a record, `PUT` the same endpoint with a single-item `items` array — it
won't disturb siblings. **Snapshot the zone (GET) before any write** and diff
after; snapshots land in `_backup/` (gitignored). Always re-check at the
authoritative NS (`dig +short AAAA <name> @launch1.spaceship.net`), not a cache.
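The snapshot-then-diff step can be sketched as follows. It assumes each snapshot has already been reduced to a plain list of record dicts; the envelope the GET actually returns is not assumed here, and `diff_snapshots` is a hypothetical helper name.

```python
import json
from typing import Dict, List


def _canonical(records: List[dict]) -> set:
    """Serialize each record with sorted keys so dicts can be compared as a set."""
    return {json.dumps(record, sort_keys=True) for record in records}


def diff_snapshots(before: List[dict], after: List[dict]) -> Dict[str, List[dict]]:
    """Return the records added and removed between two zone snapshots."""
    old, new = _canonical(before), _canonical(after)
    return {
        "added": [json.loads(r) for r in sorted(new - old)],
        "removed": [json.loads(r) for r in sorted(old - new)],
    }
```

After a single-item `PUT`, the expected result is exactly one entry under `added` and nothing under `removed`. Anything in `removed` that you did not delete yourself is the signal to investigate on the provider side.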
**Caveat — don't publish `mail` AAAA before the edge listens on v6.** Inbound
mail follows `MX → mail.<domain>`; an `AAAA` there with no v6 `:25` listener on
the edge makes senders try v6 and some won't fall back → deferred/bounced mail.
An **apex** `AAAA` is safe (it doesn't affect MX routing). Do `mail` AAAA + edge
v6 listeners together.
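The rule in this caveat can be written down as a small guard to run before publishing an `AAAA`. This is only a sketch of the reasoning: `aaaa_is_safe` is a hypothetical helper, and the MX target list and the listener flag are inputs you would gather yourself (for example from `dig MX` and from the edge config).

```python
from typing import Iterable


def aaaa_is_safe(name: str, mx_targets: Iterable[str], edge_listens_v6_25: bool) -> bool:
    """Return True if publishing an AAAA for name cannot misroute inbound mail."""
    normalized = name.rstrip(".").lower()
    targets = {target.rstrip(".").lower() for target in mx_targets}
    if normalized not in targets:
        # Not an MX target (e.g. the apex here), so senders never pick it for SMTP.
        return True
    # An MX target with an AAAA needs a v6 listener on port 25 at the edge.
    return edge_listens_v6_25
```

With `mail.<domain>` as the only MX target and no v6 listener, the guard rejects a `mail` AAAA and accepts an apex one, which is the ordering this lesson recommends.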


@@ -80,3 +80,19 @@ services:
volumes:
  stalwart-data:
# The sidecar's bridge (shared by stalwart via network_mode) gets IPv6 here so
# the container can reach AAAA-only / dual-stack hosts. Without it the netns has
# no global v6 → Stalwart tries AAAA first, gets ENETUNREACH, and for a relay
# next-hop never falls back to A (see LESSONS.md #8). A ULA subnet is fine: we
# only need *egress* (inbound arrives via the edge/tailnet, never v6). Docker 29
# masquerades it out the host's global v6 via ip6tables — no routable prefix,
# NDP proxy, or host sysctl needed. Recreating this network bounces the stack
# and the ephemeral sidecar gets a new tailnet IP (MagicDNS handles it).
networks:
  default:
    enable_ipv6: true
    ipam:
      config:
        - subnet: fd00:7a17:600d::/64
          gateway: fd00:7a17:600d::1