Fixes the root cause that was silently dropping Stalwart's cert/setting
writes, completes the public HTTPS endpoints, and captures the debugging
knowledge.
- docker-compose.yml: gate the ts-stalwart healthcheck on Postgres
reachability (nc -z the-record-prod 5432) in addition to tailscaled
health. Stalwart's depends_on: service_healthy can no longer release it
while the tailnet route to Postgres is still down, the window in which
table init failed and in-flight cert writes were lost (-> rcgen fallback).
- caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts /
autoconfig / autodiscover pass through to stalwart:443 (Stalwart
terminates TLS with its wildcard cert; no proxy_protocol on :443).
All other SNIs go to the box's web Caddy on :8443 (https_port 8443).
L7 reverse_proxy is impossible here: CAA pins issuance to Stalwart's
ACME account, so Caddy can't obtain its own cert for these names.
- acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the
SNI pass-through.
- config/config.json: track the v0.16 bootstrap (commit-safe; the DB
secret is an EnvironmentVariable reference, not inline).
- LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship
dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI
pass-through, ephemeral sidecar IP, LE rate-limit checks).
- .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret
config) and editor swap files. NEVER commit those.
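
The healthcheck gate from the docker-compose.yml item can be sketched
as below. This is a minimal illustration, not the script in the repo:
the `healthy` helper is a hypothetical name, and `tailscale status` is
only an assumed stand-in for whatever tailscaled check the sidecar
already runs. The `nc -z` probe is the one named above.

```sh
#!/bin/sh
# Sketch of a combined healthcheck gate (hypothetical helper, not the
# repository's actual healthcheck command).

# healthy PROBE... : succeed only if every probe command exits 0.
# Each probe is a command string run in its own shell; output is discarded.
healthy() {
  for probe in "$@"; do
    sh -c "$probe" >/dev/null 2>&1 || return 1
  done
  return 0
}

# Assumed usage inside the sidecar's healthcheck:
#   healthy "tailscale status" "nc -z the-record-prod 5432" || exit 1
```

Because the gate fails until both probes pass, a service that waits on
it via depends_on: service_healthy starts only once Postgres is
actually reachable over the tailnet.
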
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>