CLAUDE.md — tailwart
Guidance for Claude Code in this repo. Read before editing.
What this is
A play deployment of Stalwart (all-in-one mail/JMAP/IMAP/SMTP server) wired,
gratuitously, into three shared backends — Postgres, Redis, and Garage S3 —
to see how far the federatedSocial Tailscale-sidecar pattern stretches past the
fediverse apps. Target domain: infinidim.net (may become real later).
It is self-contained and outside /opt/federatedSocial on purpose: that's
an upstream clone that git pull overwrites. tailwart owns its own .env,
compose, config, ACL snippet, and Caddy build, and only reads from the tailnet
(shared infra over MagicDNS) at runtime.
Architecture — two ends of one wire
```
 public IP host (tag:reverse-proxy)            tailnet-only mailbox
┌───────────────────────────┐                ┌────────────────────────┐
│ caddy/ (caddy-l4)         │    tailnet     │ ts-stalwart sidecar    │
│ :25 :465 :587 :143 :993 ──┼───WireGuard───▶│  stalwart (no WAN, no  │
│ PROXY protocol v2         │                │    host ports)         │
└───────────────────────────┘                └───────────┬────────────┘
  L7 JMAP vhost on the main Caddy                        │
  mail.infinidim.net → :8080                     ┌───────┴───────┐
                                                 ▼       ▼       ▼
                                             Postgres  Redis Garage S3
                                       (the-record) (slo-time) (garage)
```
- Mailbox (`docker-compose.yml`): Stalwart in a Tailscale sidecar via
  `network_mode: service:ts-stalwart`. Binds nothing on the host; all mail ports
  listen on the tailnet only.
- Edge (`caddy/`): a layer-4 TCP proxy (Caddy + `caddy-l4`, pulled prebuilt from
  caddyserver.com rather than built locally with `xcaddy`, per `~/docs/caddy.md`).
  Pure pass-through; Stalwart owns TLS. It can run on a different machine than
  the mailbox, and that is the key idea.
- Backends: data + full-text search → Postgres, blob → Garage S3,
  lookup/in-memory → Redis. One `stalwart` role/db, one Garage bucket, one Redis
  logical DB.
The .env contract
`.env` (gitignored) is the whole operator surface; `.env.example` is the
template. Both compose files read it. Secrets reach Stalwart as env vars and are
referenced from `config/config.toml` via `%{env:NAME}%`, so the toml stays
commit-safe. Never hardcode a value that belongs in `.env`, except in the two
spots where a static file forces it: the `caddy/caddy.json` dial targets and any
MagicDNS host in the toml.
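Because `.env` is the whole operator surface, drift between it and `.env.example` is the usual way a deploy breaks quietly. A minimal sketch of a key diff follows; the helper names are hypothetical (nothing like this ships in the repo), and it assumes plain `KEY=value` lines, `#` comments, and an optional `export ` prefix, ignoring quoting and multi-line values.

```python
def env_keys(text: str) -> set:
    """Collect the variable names defined in dotenv-style text."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that isn't an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_keys(env_text: str, example_text: str) -> list:
    """Keys the template defines that the operator's .env does not."""
    return sorted(env_keys(example_text) - env_keys(env_text))
```

From the repo root, `missing_keys(Path(".env").read_text(), Path(".env.example").read_text())` (with `from pathlib import Path`) lists what still needs filling in. It compares names only, never values, so it is safe to run against the real `.env`.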
Sidecar boilerplate
Identical to federatedSocial's (TS_ACCEPT_DNS true, kernel networking, 127.0.0.1
healthcheck, ephemeral OAuth auth). Don't drift it. Tag: tag:stalwart.
Prerequisites (shared tailnet infra — already running for the fediverse)
- Postgres role + db: `stalwart` / `STALWART_DB_NAME`. Create via the
  federatedSocial `bootstrap.sh` flow or a one-off
  `CREATE ROLE … LOGIN; CREATE DATABASE … OWNER …`.
- Garage bucket `stalwart-mail`, and grant the shared access key access to it.
- Redis: nothing to create; just use a dedicated logical DB index
  (`STALWART_REDIS_DB`) so we don't collide with the apps.
- Admin console: assign `tag:stalwart` to the OAuth client (Devices/Core +
  Keys/AuthKeys) and add `acl-snippet.hujson` to the policy.
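If you take the one-off route instead of `bootstrap.sh`, the two statements just need correct quoting. The sketch below is a hypothetical generator. Role, database, and password are parameters (the values in the test are placeholders), and literal quoting assumes Postgres's default `standard_conforming_strings = on`.

```python
def quote_ident(name: str) -> str:
    """Double-quote a Postgres identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a Postgres string literal, doubling embedded quotes.

    Assumes standard_conforming_strings = on (the default), so backslashes
    are not escape characters.
    """
    return "'" + value.replace("'", "''") + "'"


def bootstrap_sql(role: str, db: str, password: str) -> str:
    """One-off statements for a dedicated login role and the database it owns."""
    return (
        f"CREATE ROLE {quote_ident(role)} LOGIN PASSWORD {quote_literal(password)};\n"
        f"CREATE DATABASE {quote_ident(db)} OWNER {quote_ident(role)};\n"
    )
```

Pipe the output into `psql` as a superuser over the tailnet rather than passing it through a single `-c`. `psql -c` runs a multi-statement string in one transaction, and `CREATE DATABASE` refuses to run inside a transaction block.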
Pitfalls (some learned the hard way next door)
- Mail edge is layer 4, not layer 7. Don't try to give the L4 ports a
  normal Caddy vhost; SNI/Host routing doesn't apply to `:25`.
- PROXY protocol or your mail reputation dies. Without it Stalwart sees the
  proxy's tailnet IP as every client, so SPF/DNSBL/greylisting break. Both ends
  must agree (`caddy.json` `proxy_protocol: v2` ↔ config `[server.proxy]`
  `trusted-networks`).
- Stalwart config drifts between versions and migrates into the admin store
  after first boot. `config/config.toml` is a strawman; verify keys against the
  pinned image tag before trusting them. Pin the tag once it works.
- `POSTGRES_PASSWORD`/role passwords only apply on an empty volume. If a password
  "doesn't work," the stored credential drifted: `ALTER USER`, don't re-init. And
  never test a password over `127.0.0.1` against these Postgres containers:
  `pg_hba` trusts loopback and accepts ANY password. Test over the tailnet
  (scram) or you'll fool yourself.
- Outbound `:25` is usually blocked on a VPS. Set `STALWART_SMARTHOST`. The relay
  address must be an IPv4 literal or a tailnet IP, never a dual-stack hostname:
  the container has no IPv6, and when the AAAA attempt fails it does not fall
  back to A. Relaying over the tailnet (`100.x:587`) also bypasses all VPS SMTP
  port blocks.
- Mail forces WAN ports. `:25` must be world-reachable for inbound federation;
  this is the one place the tailnet-only model can't hold. Keep submission/IMAP
  tailnet-only if you want a tighter surface.
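Two of the pitfalls above are address rules that are easy to check before anything runs: the smarthost must be an IPv4 literal, and password tests must not target loopback. The pre-flight sketch below uses hypothetical helper names. It assumes `STALWART_SMARTHOST` is written as `host:port` (as in `100.x:587`) and treats `100.64.0.0/10` as the tailnet's IPv4 range.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT block.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def check_smarthost(value: str) -> str:
    """Validate a host:port relay value; return 'tailnet' or 'ipv4'."""
    host, sep, port = value.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    try:
        addr = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        # Hostnames (and IPv6 literals) are refused: the container has no IPv6,
        # and a dual-stack name's AAAA attempt fails without falling back to A.
        raise ValueError(f"{host!r} is not an IPv4 literal") from None
    return "tailnet" if addr in TAILNET_V4 else "ipv4"


def check_password_test_host(host: str) -> None:
    """Refuse loopback targets: pg_hba trusts loopback, so any password 'works'."""
    if host.lower() == "localhost":
        raise ValueError("localhost is loopback; test over the tailnet instead")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return  # a hostname such as a MagicDNS name: fine
    if addr.is_loopback:
        raise ValueError(f"{host} is loopback; test over the tailnet instead")
```

Returning `"tailnet"` versus `"ipv4"` lets a wrapper warn when a public IPv4 relay is configured even though the tailnet route would also sidestep the VPS port blocks.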
What not to do
- Don't put files in `/opt/federatedSocial`. Read its `.env` if you must; never
  write there.
- Don't add `ports:` to the Stalwart container; the edge proxy is the only
  public surface, and it lives in `caddy/`.
- Don't commit `.env` or a built Caddy binary (see `.gitignore`).
- Don't break the sidecar netns boundary with bridge networks or host ports.
Lessons learned — v0.16 first real run (2026-06)
The pinned image is stalwartlabs/stalwart:v0.16.7, and v0.16 changed the config
model enough that most of the toml-era notes above are obsolete. Reality:
Config model (supersedes the .env/config.toml/%{env}% notes above)
- Config is a single JSON file the image reads from
  `--config /etc/stalwart/config.json`. It describes only the datastore; the
  root object is the datastore itself:

  ```json
  {
    "@type": "PostgreSql",
    "host": "the-record-prod.tail7b1641.ts.net",
    "port": 5432,
    "database": "stalwart",
    "authUsername": "stalwart",
    "authSecret": {
      "@type": "EnvironmentVariable",
      "variableName": "STALWART_DB_PASSWORD"
    }
  }
  ```

- TOML is gone. The `%{env:NAME}%` macro is gone. Secrets use the
  `EnvironmentVariable` secret type (field `variableName`); a literal uses the
  `Value` type (field `secret`, not `value`). `config/config.toml` is dead and
  kept only as historical reference.
- Everything else lives in Postgres (domains, accounts, listeners, ACME,
  blob/redis store wiring, proxy trust, DKIM, spam) and is managed via the web
  UI or the `x:` JMAP objects: `x:DataStore`, `x:InMemoryStore`, `x:BlobStore`,
  `x:NetworkListener`, `x:SystemSettings`, `x:Account`, `x:AcmeProvider`,
  `x:Action`. All are JMAP `*/get` / `*/set` against `/jmap` with a Bearer token;
  singletons use `ids: ["singleton"]`.
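A JMAP call against these objects is an ordinary JSON POST. The sketch below builds (but does not send) a `*/get` for a singleton, assuming the JMAP vhost at `https://mail.infinidim.net` from the diagram. Two things are assumptions: `using` defaults to the JMAP core capability only, and the `x:` objects may need an extra Stalwart-specific capability URI that these notes don't name. Likewise `accountId` is optional here, since standard JMAP `/get` calls take one but the notes don't say whether the `x:` objects do.

```python
import json
import urllib.request

JMAP_CORE = "urn:ietf:params:jmap:core"


def jmap_get(object_type, ids=("singleton",), account_id=None,
             using=(JMAP_CORE,), call_id="c0"):
    """Build a JMAP request body with a single `<type>/get` method call."""
    args = {"ids": list(ids)}
    if account_id is not None:
        args["accountId"] = account_id
    return {
        "using": list(using),
        "methodCalls": [[f"{object_type}/get", args, call_id]],
    }


def jmap_request(base_url, token, body):
    """Wrap a body as a POST to <base_url>/jmap with a Bearer token (not sent)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/jmap",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

`urllib.request.urlopen(jmap_request(...))` sends it; the reply's `methodResponses` array carries the object under the same `call_id`.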
Persistence (this was the original "I keep losing settings" bug)
- Bind-mount `./config/config.json:/etc/stalwart/config.json`; make
  `/var/lib/stalwart` a named volume. The image `VOLUME`-declares `/etc/stalwart`
  and `/var/lib/stalwart`; left unmounted, they become anonymous volumes that get
  orphaned on every recreate, so config and state vanish.
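A quick way to catch the anonymous-volume trap is to read the `Mounts` array from `docker inspect`. The sketch below is a heuristic with a hypothetical helper name. It assumes Docker names anonymous volumes with a random 64-hex-character ID while named volumes keep a readable name, and by default it checks only the state directory.

```python
import json
import re

# Anonymous volumes get a random 64-hex-character name; named volumes
# (e.g. from a compose `volumes:` section) keep a readable name.
_ANON_NAME = re.compile(r"^[0-9a-f]{64}$")


def anonymous_volume_mounts(inspect_json: str,
                            destinations=("/var/lib/stalwart",)) -> list:
    """Return the given destinations that are backed by an anonymous volume.

    `inspect_json` is the raw output of `docker inspect <container>`,
    which is a JSON array of container objects.
    """
    flagged = []
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            if (mount.get("Type") == "volume"
                    and mount.get("Destination") in destinations
                    and _ANON_NAME.match(mount.get("Name", ""))):
                flagged.append(mount["Destination"])
    return flagged
```

An empty result after a recreate means the state directory survived on a named volume; any hit means the next recreate will orphan it.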
Store endpoints need a full FQDN + port
- Bare MagicDNS names silently fail. `http://garage` →
  `http://garage.tail7b1641.ts.net:3900`; `redis://slo-time-prod` →
  `redis://slo-time-prod.tail7b1641.ts.net:6379/3` (keep the `/3` logical-DB
  index). A wrong blob endpoint also blocks the web-UI install (the SPA unpacks
  to S3) and all message-body storage.
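The rewrite rule above is mechanical enough to normalize in code before values go into the store config. The sketch below is a hypothetical helper. It assumes the `tail7b1641.ts.net` suffix from the examples, assumes `http` endpoints in this stack always mean Garage's S3 API (3900) and `redis` means 6379, and does not handle IPv6 literal hosts.

```python
from urllib.parse import urlsplit, urlunsplit

TAILNET_SUFFIX = "tail7b1641.ts.net"
# Assumption: "http" endpoints here are Garage's S3 API, "redis" the shared Redis.
DEFAULT_PORTS = {"http": 3900, "redis": 6379}


def qualify_endpoint(url: str, suffix: str = TAILNET_SUFFIX) -> str:
    """Expand a bare MagicDNS host to its FQDN and make the port explicit."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError(f"no host in {url!r}")
    if "." not in host:
        host = f"{host}.{suffix}"
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port given and no default for {parts.scheme!r}")
    # Keep any user:password@ prefix; the path (e.g. the /3 Redis DB index) is untouched.
    userinfo, at, _ = parts.netloc.rpartition("@")
    return urlunsplit((parts.scheme, f"{userinfo}{at}{host}:{port}",
                       parts.path, parts.query, parts.fragment))
```

Already-qualified URLs pass through unchanged, so the helper is safe to apply to every endpoint unconditionally.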
PROXY-protocol trust is PER-LISTENER, never global
- Set `overrideProxyTrustedNetworks` (`100.64.0.0/10` + `fd7a:115c:a1e0::/48`)
  on the L4-fronted mail listeners only (25/465/587/143/993). Setting the global
  `proxyTrustedNetworks` makes the `:8080` admin/HTTP listener demand a PROXY
  header too, so direct browser hits get `ERR_CONNECTION_RESET`.
- Adding or removing listeners (e.g. 143 IMAP-STARTTLS and 587
  submission-STARTTLS, which are not created by default) needs a container
  restart; a settings reload does not rebind sockets.
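Why per-listener trust matters is easiest to see from the wire format. The sketch below is a conceptual model of PROXY protocol v2 semantics, not Stalwart's implementation. It builds the header the edge prepends and shows a listener that, only on proxied ports and only from a trusted peer, takes the client address from that header. It handles only the PROXY command over TCP4. The port set and trusted ranges come from the note above.

```python
import ipaddress
import socket
import struct

# Every PROXY protocol v2 header starts with this 12-byte signature.
PROXY_V2_SIG = b"\r\n\r\n\x00\r\nQUIT\n"

# Tailnet ranges from the note above (IPv4 CGNAT block and Tailscale's ULA prefix).
TRUSTED = [ipaddress.ip_network("100.64.0.0/10"),
           ipaddress.ip_network("fd7a:115c:a1e0::/48")]

# L4-fronted listeners; :8080 is deliberately absent.
PROXIED_PORTS = {25, 465, 587, 143, 993}


def build_v2_tcp4(src: str, dst: str, sport: int, dport: int) -> bytes:
    """What the edge prepends: a PROXY v2 header for a TCP-over-IPv4 client."""
    body = socket.inet_aton(src) + socket.inet_aton(dst) + struct.pack("!HH", sport, dport)
    # 0x21 = version 2 + PROXY command; 0x11 = AF_INET + STREAM.
    return PROXY_V2_SIG + bytes([0x21, 0x11]) + struct.pack("!H", len(body)) + body


def client_address(port: int, peer_ip: str, data: bytes):
    """Return (client address, remaining bytes) as a listener would see them."""
    if port not in PROXIED_PORTS:
        # No proxy trust on this listener: the TCP peer is the client.
        return peer_ip, data
    if not any(ipaddress.ip_address(peer_ip) in net for net in TRUSTED):
        raise PermissionError(f"PROXY header from untrusted peer {peer_ip}")
    if not data.startswith(PROXY_V2_SIG):
        # A direct browser hit on a proxied listener lands here (the reset).
        raise ValueError("expected a PROXY v2 header")
    ver_cmd, family = data[12], data[13]
    (length,) = struct.unpack("!H", data[14:16])
    if ver_cmd != 0x21 or family != 0x11:
        raise ValueError("only PROXY over TCP4 is handled in this sketch")
    body = data[16:16 + length]
    return socket.inet_ntoa(body[:4]), data[16 + length:]
```

Adding 8080 to `PROXIED_PORTS` is the analogue of setting the global `proxyTrustedNetworks`: a browser's plain `GET` would then hit the `ValueError` branch.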
One data store ⇒ exactly one Stalwart instance
- Two instances on the same Postgres/Redis (a stray `docker run`, or ghosts left
  by ephemeral-IP restarts) cause ACME orders to go `INVALID`, corrupt
  rate-limit/auto-ban state, and produce restart flapping. Ephemeral sidecar
  nodes get a new tailnet IP on every restart, leaving ghost idle Postgres
  connections from dead incarnations; the number of distinct `client_addr`
  values in `pg_stat_activity` is effectively a restart counter. A healthy
  Postgres does not mean a healthy Stalwart.
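The ghost-connection symptom can be read straight from `pg_stat_activity`. The sketch below keeps the query as text and summarizes rows you have already fetched (for example with `psql` over the tailnet). The helper names are hypothetical, the database name `stalwart` matches the config example, and `live_ip` is assumed to be the current sidecar's address (e.g. from `tailscale ip -4` inside it).

```python
from collections import Counter

# Run over the tailnet (e.g. with psql); kept here as text only.
GHOST_QUERY = """
SELECT client_addr, state
FROM pg_stat_activity
WHERE datname = 'stalwart';
"""


def distinct_clients(rows) -> int:
    """Distinct client addresses: roughly, sidecar incarnations still connected."""
    return len({addr for addr, _state in rows})


def ghost_connections(rows, live_ip: str) -> dict:
    """Connection count per address that doesn't belong to the live instance."""
    return dict(Counter(addr for addr, _state in rows if addr != live_ip))
```

A non-empty `ghost_connections` result alongside a healthy-looking Postgres is the pattern described above; terminating those backends is a separate, deliberate step.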
Accounts / recovery
- Locked out? Add `STALWART_RECOVERY_MODE=1` and
  `STALWART_RECOVERY_ADMIN=admin:<pw>`, then restart. Recovery mode serves only
  `:8080`, pauses MTA/tasks, and does not wipe a native-v0.16 DB (the "wipe"
  warning applies only when migrating a v0.15 store). Mint a token, fix the
  account, then remove both env vars and restart.
- Normal web login is OAuth/PKCE against the directory; the recovery admin
  is honoured only in recovery mode/bootstrap. Set a password via
  `x:Account/set` `credentials` (`@type: Password`) with a pre-hashed
  `$argon2id$…` secret (plaintext is stored as cleartext and rejected). Verify
  with IMAP AUTH over TLS, not the web flow.
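Since a plaintext secret is accepted and then silently fails, it is worth refusing anything that isn't an Argon2id PHC string before it reaches `x:Account/set`. The guard below is a hypothetical sketch that only checks the `$argon2id$v=…$m=…,t=…,p=…$salt$hash` shape. Producing the hash needs a library; the third-party `argon2-cffi` package's `PasswordHasher().hash()` is one option that emits this format.

```python
import re

# PHC string format for Argon2id, e.g.
# $argon2id$v=19$m=65536,t=3,p=4$<base64 salt>$<base64 hash>
_ARGON2ID_PHC = re.compile(
    r"^\$argon2id\$v=\d+\$m=\d+,t=\d+,p=\d+\$[A-Za-z0-9+/]+\$[A-Za-z0-9+/]+$"
)


def require_argon2id(secret: str) -> str:
    """Return `secret` unchanged if it looks like an Argon2id PHC hash, else refuse it."""
    if not _ARGON2ID_PHC.match(secret):
        raise ValueError("expected a pre-hashed $argon2id$ secret, not plaintext")
    return secret
```

The check is purely syntactic: it catches plaintext and other Argon2 variants, but it cannot tell whether the hash matches the password you meant, which is why the IMAP AUTH test still matters.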
ACME
- Account registration succeeds even when the challenge can't run; don't be
  fooled. `dns-01` needs a DNS-provider API token; `http-01` needs the edge to
  forward `:80` to Stalwart's HTTP listener. `INVALID` authorizations in the
  store mean challenges are failing (often the multi-instance race above). Watch
  LE's 5-failed-validations/hour limit; test against staging.
Backups
`stalwart --export <dir>` (read-only) dumps the whole store per subspace;
`--import` restores. Pair it with a `pg_dump` of the `stalwart` DB. Both land in
`_backup/` / `_validate/`, which are gitignored (real secrets + mail data).
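The two backup steps can be planned as argument lists so a run is reproducible and reviewable before anything executes. The sketch below is hypothetical in its layout: it uses a timestamped directory under `_backup/` and assumes `stalwart` is on `PATH` wherever the store config is reachable (e.g. inside the container with `_backup/` mounted). The Postgres host comes from the config example, and the `pg_dump` flags are standard (`-Fc` is the custom archive format).

```python
import shlex
from datetime import datetime, timezone
from pathlib import Path


def backup_commands(root="_backup", pg_host="the-record-prod.tail7b1641.ts.net",
                    db="stalwart", user="stalwart", now=None):
    """Return argv lists for one timestamped backup run (nothing is executed)."""
    now = now or datetime.now(timezone.utc)
    dest = Path(root) / now.strftime("%Y%m%dT%H%M%SZ")
    return [
        # Read-only export of the whole store, one directory per subspace.
        ["stalwart", "--export", str(dest / "export")],
        # Custom-format dump over the tailnet; pg_dump reads the password
        # from PGPASSWORD or ~/.pgpass.
        ["pg_dump", "-h", pg_host, "-U", user, "-d", db, "-Fc",
         "-f", str(dest / "stalwart.dump")],
    ]


def print_plan(commands) -> None:
    """Show the commands as shell-quoted lines for review."""
    for argv in commands:
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to hand each list to `subprocess.run(argv, check=True)` once reviewed, and restoring mirrors it with `stalwart --import` and `pg_restore`.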