Harden mail edge: PG-race healthcheck gate, :443 SNI fan-out, docs #1

Merged
wayne merged 1 commit from mail-edge-hardening into main 2026-06-11 00:47:30 -04:00

Fixes the root cause that was silently dropping Stalwart's cert/setting
writes, completes the public HTTPS endpoints, and captures the debugging
knowledge.

  • docker-compose.yml: gate the ts-stalwart healthcheck on Postgres
    reachability (nc -z the-record-prod:5432) in addition to tailscaled
    health. Stalwart's depends_on (condition: service_healthy) can no
    longer start it in the window before the tailnet route to Postgres is
    up. That window was failing table init and losing in-flight cert
    writes, so Stalwart fell back to an rcgen self-signed cert.

  • caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts /
    autoconfig / autodiscover pass through to stalwart:443 (Stalwart
    terminates TLS with its wildcard cert; no proxy_protocol on :443).
    All other SNIs go to the box's web Caddy on :8443 (https_port 8443).
    An L7 (HTTP) reverse_proxy is impossible here: CAA pins issuance to
    Stalwart's ACME account, so Caddy can't obtain its own cert for these
    names.

  • acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the
    SNI pass-through.

  • config/config.json: track the v0.16 bootstrap (commit-safe; the DB
    secret is an EnvironmentVariable reference, not inline).

  • LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship
    dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI
    pass-through, ephemeral sidecar IP, LE rate-limit checks).

  • .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret
    config) and editor swap files. NEVER commit those.
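
The Postgres half of the new healthcheck gate is only a TCP connect, which is all `nc -z` tests. A minimal Python sketch of an equivalent probe, assuming it ran as the ts-stalwart healthcheck; `tcp_reachable` and `healthcheck_exit_code` are hypothetical names, and unlike the real check this sketch does not also test tailscaled health:

```python
import socket

# Target from the healthcheck described above (Postgres over the tailnet).
PG_HOST = "the-record-prod"
PG_PORT = 5432


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP handshake to host:port completes, the same test as `nc -z`."""
    try:
        # Resolves the name and connects; the socket is closed right away
        # without sending any data.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, refused connection, no route yet, or timeout.
        return False


def healthcheck_exit_code(host: str = PG_HOST, port: int = PG_PORT) -> int:
    """Map reachability to Docker's healthcheck convention: 0 healthy, 1 unhealthy."""
    return 0 if tcp_reachable(host, port) else 1
```

As a healthcheck script it would end with `sys.exit(healthcheck_exit_code())`. Docker reads exit 0 as healthy and 1 as unhealthy, so a service depending on it with condition: service_healthy is not started until the route to Postgres actually answers.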
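
The :443 fan-out routes on the SNI in the client's first TLS record, before anything is decrypted. A minimal Python sketch of that decision, assuming the ClientHello fits in the first record and that pass-through names are matched by their first DNS label. `parse_sni`, `pick_upstream` and the web-Caddy address are hypothetical; the actual routing is expressed in caddy/caddy.json:

```python
import struct
from typing import Optional

# Hypothetical addresses: only "stalwart:443" and port 8443 come from the
# description above; the web Caddy host is an assumption.
STALWART = "stalwart:443"
WEB_CADDY = "127.0.0.1:8443"
PASSTHROUGH_LABELS = {"mta-sts", "autoconfig", "autodiscover"}


def parse_sni(record: bytes) -> Optional[str]:
    """Return the host_name from the SNI of a TLS ClientHello record, else None."""
    try:
        # TLS record header: content type (22 = handshake), version, length.
        if record[0] != 22:
            return None
        (rec_len,) = struct.unpack("!H", record[3:5])
        body = record[5:5 + rec_len]
        # Handshake header: type (1 = ClientHello) plus a 3-byte length.
        if body[0] != 1:
            return None
        pos = 4 + 2 + 32           # skip handshake header, client_version, random
        pos += 1 + body[pos]       # session_id
        (n,) = struct.unpack("!H", body[pos:pos + 2])
        pos += 2 + n               # cipher_suites
        pos += 1 + body[pos]       # compression_methods
        (ext_total,) = struct.unpack("!H", body[pos:pos + 2])
        pos += 2
        ext_end = pos + ext_total
        while pos + 4 <= ext_end:
            ext_type, ext_len = struct.unpack("!HH", body[pos:pos + 4])
            pos += 4
            if ext_type == 0:  # server_name extension
                # list length (2 bytes), name_type (1), host_name length (2)
                if body[pos + 2] != 0:  # 0 = host_name
                    return None
                (name_len,) = struct.unpack("!H", body[pos + 3:pos + 5])
                name = body[pos + 5:pos + 5 + name_len]
                if len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, ValueError):
        # Truncated or malformed input: treat as "no SNI".
        return None


def pick_upstream(sni: Optional[str]) -> str:
    """Pass mail autoconfig names through to Stalwart; everything else to web Caddy."""
    if sni and sni.lower().split(".", 1)[0] in PASSTHROUGH_LABELS:
        return STALWART
    return WEB_CADDY
```

Because nothing is terminated at the proxy, it has to forward the peeked bytes to the chosen upstream unchanged and then splice the two connections. That is what lets Stalwart present its own wildcard cert for names Caddy cannot get a cert for.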

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

wayne added 1 commit 2026-06-11 00:46:57 -04:00
wayne merged commit 45e06ed524 into main 2026-06-11 00:47:30 -04:00
Reference: wayne/tailwart#1