Fixes the root cause that was silently dropping Stalwart's cert/setting writes, completes the public HTTPS endpoints, and captures the debugging knowledge.

- docker-compose.yml: gate the ts-stalwart healthcheck on Postgres reachability (nc -z the-record-prod:5432) in addition to tailscaled health. Stalwart's depends_on: service_healthy can no longer release it into the window where the tailnet route to Postgres isn't up yet, which was failing table init and losing in-flight cert writes (-> rcgen).
- caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts / autoconfig / autodiscover pass through to stalwart:443 (Stalwart terminates TLS with its wildcard cert; no proxy_protocol on :443). All other SNIs go to the box's web Caddy on :8443 (https_port 8443). L7 reverse_proxy is impossible here: CAA pins issuance to Stalwart's ACME account, so Caddy can't obtain its own cert for these names.
- acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the SNI pass-through.
- config/config.json: track the v0.16 bootstrap (commit-safe; the DB secret is an EnvironmentVariable reference, not inline).
- LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI pass-through, ephemeral sidecar IP, LE rate-limit checks).
- .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret config) and editor swap files. NEVER commit those.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# tailwart edge — layer-4 mail proxy

A custom Caddy (with the `caddy-l4` app) that pipes the public mail ports to the
Stalwart sidecar over the tailnet. Pure TCP pass-through with PROXY protocol;
Stalwart still terminates all the TLS. **Runs anywhere** with a public IP that's
on the tailnet and tagged `tag:reverse-proxy`; doesn't need to share a host with
the mailbox.

## Why layer 4 and not a normal Caddy vhost

Web apps are reverse-proxied at layer 7 (route by Host/SNI, Caddy terminates TLS).
Mail can't be: port 25 has no SNI (STARTTLS comes after connect), and you want one
global `:25` listener, not per-domain routing. So the edge is a dumb L4 pipe and
Stalwart owns the TLS. The only novelty is the application: this is the same
`stream`-style proxying nginx/Caddy can do for *any* TCP service; it just isn't
usually applied to mail.

## Build & run

```bash
docker compose up -d --build  # builds the image, runs it
```

The Dockerfile doesn't compile Caddy. It pulls the prebuilt L4-enabled binary
from `caddyserver.com/api/download` (the house method, see `~/docs/caddy.md`
"Custom Binary"), dodging the local `xcaddy` build, which needs ~1 GB of RAM
that this VPS can't afford. The build still fails loudly if `caddy-l4` isn't in
the downloaded binary. To add plugins, append `&p=<url-encoded module path>` to
`CADDY_DOWNLOAD` in the Dockerfile.

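As a rough illustration of the `&p=` convention, the sketch below composes a download URL from a list of Go module paths. The `os=linux&arch=amd64` parameters and the helper names are assumptions made for illustration; only the `p=` parameter and the `caddy-l4` plugin come from this README, so compare the result with the `CADDY_DOWNLOAD` value actually in the Dockerfile.

```bash
#!/usr/bin/env bash
# Sketch: build a caddyserver.com/api/download URL with extra plugins.
# Assumption: linux/amd64 target; adjust os= and arch= for your host.
set -euo pipefail

# Percent-encode the slashes in a Go module path (the only character in a
# typical module path that needs escaping inside a query value).
encode_module() {
  printf '%s' "$1" | sed 's|/|%2F|g'
}

# Print the download URL with one &p= parameter per module path argument.
caddy_download_url() {
  local url='https://caddyserver.com/api/download?os=linux&arch=amd64'
  local mod
  for mod in "$@"; do
    url+="&p=$(encode_module "$mod")"
  done
  printf '%s\n' "$url"
}

caddy_download_url github.com/mholt/caddy-l4
```

Run as-is, this prints `https://caddyserver.com/api/download?os=linux&arch=amd64&p=github.com%2Fmholt%2Fcaddy-l4`. Pass further module paths as extra arguments to get one `&p=` per plugin. After a build, running `caddy list-modules` inside the image is one way to confirm the `layer4` modules are present.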
## Edit the upstream

`caddy.json` dials `stalwart.tail7b1641.ts.net:<port>`. If your
`STALWART_MAGIC_NAME` / `TS_TAILNET` differ, update the five `dial` lines. (JSON
can't read `.env`; this is the one spot where the MagicDNS name is hardcoded, the
same trade-off as pgAdmin's `servers.json`.)

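Because the name is repeated on every `dial` line, a scripted substitution can be less error-prone than editing by hand. The following is a minimal sketch, assuming `caddy.json` writes each upstream as `<host>:<port>`; the function name `retarget_dials` and the replacement host in the usage note are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: point every dial line in caddy.json at a different MagicDNS name.
set -euo pipefail

# retarget_dials FILE OLD_HOST NEW_HOST
# Rewrites "OLD_HOST:<port>" to "NEW_HOST:<port>", keeps FILE.bak as a backup,
# and prints how many lines now reference NEW_HOST.
retarget_dials() {
  local file="$1" old="$2" new="$3"
  local old_re
  # Escape dots so sed matches them literally instead of as wildcards.
  old_re="$(printf '%s' "$old" | sed 's/[.]/\\./g')"
  sed -i.bak "s|${old_re}:|${new}:|g" "$file"
  grep -cF "${new}:" "$file" || true
}
```

For example, `retarget_dials caddy.json stalwart.tail7b1641.ts.net stalwart.example-tailnet.ts.net` should report one line per `dial` entry it rewrote. Diff the result against `caddy.json.bak` before restarting the container.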
## The HTTP side (MTA-STS / autoconfig / autodiscover) — `:443` SNI fan-out

Stalwart publishes DNS records that point three public HTTPS names at this edge:
`mta-sts.<domain>`, `autoconfig.<domain>`, and `autodiscover.<domain>`. They
serve the MTA-STS policy and mail-client autoconfig over **:443**, so the edge
has to handle `:443` too, which is where a naive setup collides with a box that
already runs a web Caddy.

The fix is **not** an L7 `reverse_proxy` (terminating TLS at Caddy). That can't
work: the domain's **CAA** record pins issuance to Stalwart's ACME account
(`accounturi=…`), so Caddy can't obtain its own cert for `*.<domain>`. Stalwart
already holds the wildcard, so we **pass TLS through** to it.

The `web` server in `caddy.json` owns `:443` and fans out by SNI:

- `mta-sts` / `autoconfig` / `autodiscover.<domain>` → `stalwart:443`
  (pass-through; Stalwart terminates with its wildcard cert, with **no** PROXY
  protocol on `:443`, unlike the mail ports).
- every other SNI → `127.0.0.1:8443`, the box's own web Caddy.

For that fallback to exist, move the web Caddy's HTTPS off `:443`:

```caddyfile
{
    https_port 8443  # web vhosts now listen here; the L4 :443 forwards to them
}

your-web-site.example { reverse_proxy … }
```

HTTP→HTTPS redirects still land on `:443`: Caddy treats `https_port` as internal
and leaves the port out of the redirect URL. A **mail-only** edge (no web vhosts
on the box) omits the `web` server entirely; keep just the mail ports above.

> Note: `tag:reverse-proxy → tag:stalwart` must also grant **`tcp:443`** in the
> Tailscale ACL (see `../acl-snippet.hujson`), on top of the mail ports.

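To make the fan-out rule in this section concrete, the sketch below restates it as a shell function: the three mail-related names go to Stalwart, everything else goes to the local web Caddy. It is only a model of the decision for reasoning and testing, not how `caddy-l4` is configured (that lives in `caddy.json`); the function name and `example.com` are placeholders.

```bash
#!/usr/bin/env bash
# Model of the :443 SNI fan-out decision (illustrative only; the real
# matching happens inside caddy-l4, driven by caddy.json).
set -euo pipefail

# upstream_for_sni SNI DOMAIN -> prints the upstream the edge would dial
upstream_for_sni() {
  local sni="$1" domain="$2"
  case "$sni" in
    "mta-sts.$domain" | "autoconfig.$domain" | "autodiscover.$domain")
      echo 'stalwart:443'      # TLS passed through, no PROXY protocol
      ;;
    *)
      echo '127.0.0.1:8443'    # the box's own web Caddy
      ;;
  esac
}

upstream_for_sni mta-sts.example.com example.com
upstream_for_sni www.example.com example.com
```

The two calls print `stalwart:443` and `127.0.0.1:8443` respectively. Against a live edge, `openssl s_client -connect <edge-ip>:443 -servername mta-sts.<domain>` is one way to check which certificate a given SNI is actually served.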
## Prerequisites on the host running this

- Joined to the tailnet, tagged `tag:reverse-proxy` (so the ACL lets it reach
  `tag:stalwart`).
- Public firewall open for whichever mail ports you expose (`25` at minimum),
  plus `443` if you run the `:443` fan-out.
- Nothing else bound to those ports.
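
A quick way to check the last prerequisite is to compare the ports you plan to expose against the host's listening sockets. This is a sketch assuming Linux with `ss` from iproute2; the function name and the port list in the usage note are illustrative, so substitute the ports your `caddy.json` actually listens on.

```bash
#!/usr/bin/env bash
# Sketch: report which of the given ports already have a TCP listener.
set -euo pipefail

# busy_ports PORT...   (reads `ss -ltnH` output on stdin)
# Prints each requested port that appears as a local listening port.
busy_ports() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
    {
      # Column 4 is Local Address:Port, e.g. 0.0.0.0:25 or [::]:443
      k = split($4, part, ":")
      if (part[k] in need) print part[k]
    }
  ' | sort -un
}
```

Usage: `ss -ltnH | busy_ports 25 465 587 993 443`. Empty output means none of the listed ports is taken; any port printed needs to be freed before the edge container can bind it.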