2026-06-03 22:25:38 -04:00
# tailwart — Stalwart mailbox as a Tailscale sidecar (NO WAN presence).
#
# The stalwart container shares ts-stalwart's network namespace and publishes
# no ports, so from off the host its mail listeners are reachable only over
# the tailnet (tailscale0). The namespace also has a Docker bridge interface,
# used for egress only (see networks: below). The public edge is the separate
# caddy/ layer-4 proxy, which can run on another host.
#
# Prereq: the shared tailnet infra (Postgres the-record-prod, Redis
# slo-time-prod, Garage) must be up, and the stalwart role/db/bucket created
# (see README). Bring up: docker compose up -d
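#
# Checking it afterwards (a sketch using standard Compose commands; run from
# this directory so the project "tailwart" set below is picked up):
#   docker compose ps                  # ts-stalwart should show (healthy)
#   docker compose logs ts-stalwart    # tailscaled login / health output
#   docker compose logs -f stalwart    # Stalwart starts only after that gate opens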
name: tailwart

services:
  ts-stalwart:
    image: tailscale/tailscale:latest
    hostname: ${STALWART_MAGIC_NAME}
    environment:
      TS_AUTHKEY: ${TS_OAUTH_CLIENT_SECRET}?ephemeral=true
      TS_EXTRA_ARGS: --advertise-tags=tag:stalwart
      TS_HOSTNAME: ${STALWART_MAGIC_NAME}
      TS_ACCEPT_DNS: "true"
      TS_AUTH_ONCE: "true"
      TS_USERSPACE: "false"
      TS_ENABLE_HEALTH_CHECK: "true"
      TS_LOCAL_ADDR_PORT: "127.0.0.1:9002"
    dns: [1.1.1.1, 1.0.0.1]
    devices:
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - NET_ADMIN
      - NET_RAW
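    # Probing the same two conditions by hand (a sketch; substitute the real
    # DB_MAGIC_NAME and TS_TAILNET values for the <placeholders>, because the
    # ${...} in the healthcheck is filled in by Compose, not by the container):
    #   docker compose exec ts-stalwart wget -qO- http://127.0.0.1:9002/healthz
    #   docker compose exec ts-stalwart nc -z -w3 <DB_MAGIC_NAME>.<TS_TAILNET> 5432 && echo reachable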
    healthcheck:
Harden mail edge: PG-race healthcheck gate, :443 SNI fan-out, docs
Fixes the root cause that was silently dropping Stalwart's cert/setting
writes, completes the public HTTPS endpoints, and captures the debugging
knowledge.
- docker-compose.yml: gate the ts-stalwart healthcheck on Postgres
reachability (nc -z the-record-prod:5432) in addition to tailscaled
health. Stalwart's depends_on: service_healthy can no longer release it
into the window where the tailnet route to Postgres isn't up yet — which
was failing table init and losing in-flight cert writes (-> rcgen).
- caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts /
autoconfig / autodiscover pass through to stalwart:443 (Stalwart
terminates TLS with its wildcard cert; no proxy_protocol on :443).
All other SNIs go to the box's web Caddy on :8443 (https_port 8443).
L7 reverse_proxy is impossible here: CAA pins issuance to Stalwart's
ACME account, so Caddy can't obtain its own cert for these names.
- acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the
SNI pass-through.
- config/config.json: track the v0.16 bootstrap (commit-safe; the DB
secret is an EnvironmentVariable reference, not inline).
- LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship
dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI
pass-through, ephemeral sidecar IP, LE rate-limit checks).
- .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret
config) and editor swap files. NEVER commit those.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 00:03:52 -04:00
      # Healthy only when BOTH the tailnet link is up AND Postgres is reachable
      # over it. The stalwart service gates on this (depends_on: service_healthy),
      # so it can no longer start into the race where it tries the DB before the
      # tailnet route exists, which logged "Failed to create tables" and dropped
      # in-flight cert/setting writes (e.g. lost the ACME cert on 2026-06-10).
      test: ["CMD-SHELL", "wget -qO- http://127.0.0.1:9002/healthz && nc -z -w3 ${DB_MAGIC_NAME}.${TS_TAILNET} 5432"]
      interval: 10s
      timeout: 8s
      retries: 6
      start_period: 30s
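      # Worst-case wait, roughly (Docker does not count failures during
      # start_period and starts each probe `interval` after the previous one
      # finishes):
      #   probes failing fast:       30s + 6 x 10s         ~ 90s
      #   probes hitting timeout:    30s + 6 x (10s + 8s)  ~ 138s
      # After that ts-stalwart is marked unhealthy and Compose does not start
      # stalwart. The gate only applies at startup: a later unhealthy status
      # does not stop an already running stalwart. Last probe results:
      #   docker inspect --format '{{json .State.Health}}' tailwart-ts-stalwart-1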
    restart: unless-stopped

  stalwart:
2026-06-04 01:21:58 -04:00
    image: stalwartlabs/stalwart:v0.16.7
    network_mode: "service:ts-stalwart"
    environment:
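      # Secrets and tailnet endpoints stay in env (from .env), not in mounted
      # files. v0.16 no longer reads config.toml or %{env:NAME}% macros; the
      # mounted config.json references STALWART_DB_PASSWORD through the
      # EnvironmentVariable secret type, so that file stays commit-safe.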
      STALWART_DB_NAME: ${STALWART_DB_NAME}
      STALWART_DB_USER: ${STALWART_DB_USER}
      STALWART_DB_PASSWORD: ${STALWART_DB_PASSWORD}
      DB_HOST: ${DB_MAGIC_NAME}.${TS_TAILNET}
      REDIS_URL: redis://${REDIS_MAGIC_NAME}.${TS_TAILNET}:6379/${STALWART_REDIS_DB}
      S3_ENDPOINT: http://${GARAGE_MAGIC_NAME}.${TS_TAILNET}:3900
      S3_REGION: ${GARAGE_REGION}
      S3_BUCKET: ${STALWART_S3_BUCKET}
      S3_ACCESS_KEY: ${GARAGE_ACCESS_KEY_ID}
      S3_SECRET_KEY: ${GARAGE_SECRET_ACCESS_KEY}
      STALWART_HOSTNAME: ${STALWART_HOSTNAME}
      STALWART_DOMAIN: ${STALWART_DOMAIN}
      STALWART_SMARTHOST: ${STALWART_SMARTHOST}
      STALWART_FALLBACK_ADMIN_SECRET: ${STALWART_FALLBACK_ADMIN_SECRET}
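      # How the tailnet names compose, with hypothetical .env values (the two
      # hostnames are the ones named in the header; the tailnet suffix and the
      # Redis DB index are made up for illustration):
      #   DB_MAGIC_NAME=the-record-prod   REDIS_MAGIC_NAME=slo-time-prod
      #   TS_TAILNET=example.ts.net       STALWART_REDIS_DB=3
      #   -> DB_HOST=the-record-prod.example.ts.net
      #   -> REDIS_URL=redis://slo-time-prod.example.ts.net:6379/3
      # `docker compose config` prints the resolved values, secrets included,
      # so filter it: docker compose config | grep -E 'DB_HOST|REDIS_URL'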
    volumes:
stalwart: migrate to v0.16 config model; fix stores, listeners, persistence
v0.16 dropped TOML/%{env}% for a JSON datastore-only config, with all other
settings living in Postgres. This migrates the deployment and fixes the
fallout found during the first real run.
- config/config.json: v0.16 JSON bootstrap (root = PostgreSql datastore;
DB password via the EnvironmentVariable secret type, so it stays
commit-safe). Replaces the now-dead config.toml.
- docker-compose.yml: bind-mount config.json -> /etc/stalwart/config.json
(the image's --config path) and use a named volume for /var/lib/stalwart;
the old anonymous volumes were orphaned on every recreate ("lost settings").
Drop the dead config.toml mount.
- .gitignore: exclude local operational artifacts that hold real secrets +
mail data (_backup/, _validate/, *.dump, export/). config/config.json is
intentionally tracked (secret-free).
- CLAUDE.md: "Lessons learned — v0.16 first real run" — config model, the
anonymous-volume trap, full-FQDN store endpoints, per-listener PROXY trust,
one-instance-per-store, recovery mode + argon2 password reset, ACME, backups.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 23:36:46 -04:00
      # Bootstrap config (v0.16 JSON): tells Stalwart only where Postgres lives;
      # all other settings live in the DB. Mounted at the image's default
      # --config path (/etc/stalwart/config.json). The secret comes from the
      # STALWART_DB_PASSWORD env above, referenced via the EnvironmentVariable
      # secret type inside the file, so this stays commit-safe.
      - ./config/config.json:/etc/stalwart/config.json:ro
      # Working dir: ACME cert cache + outbound queue spool. Named volume (not
      # anonymous) so a recreate doesn't orphan it and drop queued mail/certs.
      - stalwart-data:/var/lib/stalwart
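      # What "named" buys, as a sketch (Compose names the volume
      # <project>_<volume>, i.e. tailwart_stalwart-data; --no-deps keeps the
      # sidecar, and its tailnet IP, untouched):
      #   docker compose up -d --force-recreate --no-deps stalwart  # same volume reattached
      #   docker volume inspect tailwart_stalwart-data               # host Mountpoint
      #   docker compose down                                        # keeps the volume
      #   docker compose down -v                                     # DELETES it: queue + ACME cache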
    depends_on:
      ts-stalwart:
        condition: service_healthy
    restart: unless-stopped

volumes:
  stalwart-data:
2026-06-11 18:53:28 -04:00
# The sidecar's bridge (shared by stalwart via network_mode) gets IPv6 here so
# the container can reach AAAA-only / dual-stack hosts. Without it the netns has
# no global v6 → Stalwart tries AAAA first, gets ENETUNREACH, and for a relay
# next-hop never falls back to A (see LESSONS.md #8). A ULA subnet is fine: we
# only need *egress* (inbound arrives via the edge/tailnet, never v6). Docker 29
# masquerades it out the host's global v6 via ip6tables; no routable prefix,
# NDP proxy, or host sysctl needed. Recreating this network bounces the stack
# and the ephemeral sidecar gets a new tailnet IP (MagicDNS handles it).
networks:
  default:
    enable_ipv6: true
    ipam:
      config:
        - subnet: fd00:7a17:600d::/64
          gateway: fd00:7a17:600d::1
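# Checking it (a sketch; names assume Compose defaults for project "tailwart").
# The v6 address sits on ts-stalwart's endpoint, and stalwart uses the same one
# because it shares that namespace:
#   docker network inspect tailwart_default --format '{{.EnableIPv6}} {{json .IPAM.Config}}'
#   docker inspect --format '{{range .NetworkSettings.Networks}}{{.GlobalIPv6Address}}{{end}}' tailwart-ts-stalwart-1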