tailwart/CLAUDE.md
Wayne Hayes e9febd037c stalwart: migrate to v0.16 config model; fix stores, listeners, persistence
v0.16 dropped TOML/%{env}% for a JSON datastore-only config, with all other
settings living in Postgres. This migrates the deployment and fixes the
fallout found during the first real run.

- config/config.json: v0.16 JSON bootstrap (root = PostgreSql datastore;
  DB password via the EnvironmentVariable secret type, so it stays
  commit-safe). Replaces the now-dead config.toml.
- docker-compose.yml: bind-mount config.json -> /etc/stalwart/config.json
  (the image's --config path) and use a named volume for /var/lib/stalwart;
  the old anonymous volumes were orphaned on every recreate ("lost settings").
  Drop the dead config.toml mount.
- .gitignore: exclude local operational artifacts that hold real secrets +
  mail data (_backup/, _validate/, *.dump, export/). config/config.json is
  intentionally tracked (secret-free).
- CLAUDE.md: "Lessons learned — v0.16 first real run" — config model, the
  anonymous-volume trap, full-FQDN store endpoints, per-listener PROXY trust,
  one-instance-per-store, recovery mode + argon2 password reset, ACME, backups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 23:36:46 -04:00


CLAUDE.md — tailwart

Guidance for Claude Code in this repo. Read before editing.

What this is

A play deployment of Stalwart (all-in-one mail/JMAP/IMAP/SMTP server) wired, gratuitously, into three shared backends — Postgres, Redis, and Garage S3 — to see how far the federatedSocial Tailscale-sidecar pattern stretches past the fediverse apps. Target domain: infinidim.net (may become real later).

It is self-contained and outside /opt/federatedSocial on purpose: that's an upstream clone that git pull overwrites. tailwart owns its own .env, compose, config, ACL snippet, and Caddy build, and only reads from the tailnet (shared infra over MagicDNS) at runtime.

Architecture — two ends of one wire

  public IP host (tag:reverse-proxy)            tailnet-only mailbox
  ┌───────────────────────────┐                ┌────────────────────────┐
  │ caddy/  (caddy-l4)         │   tailnet      │ ts-stalwart sidecar     │
  │  :25 :465 :587 :143 :993 ──┼───WireGuard───▶│  stalwart (no WAN, no   │
  │  PROXY protocol v2         │                │  host ports)            │
  └───────────────────────────┘                └───────────┬────────────┘
        L7 JMAP vhost on the main Caddy                     │
        mail.infinidim.net → :8080                  ┌───────┴───────┐
                                                    ▼       ▼       ▼
                                              Postgres   Redis   Garage S3
                                            (the-record)(slo-time)(garage)
  • Mailbox (docker-compose.yml): Stalwart in a Tailscale sidecar via network_mode: service:ts-stalwart. Binds nothing on the host. All mail ports listen on the tailnet only.
  • Edge (caddy/): a layer-4 TCP proxy (Caddy + caddy-l4, pulled prebuilt from caddyserver.com — no local xcaddy build, per ~/docs/caddy.md). Pure pass-through; Stalwart owns TLS. Can run on a different machine than the mailbox — the key idea.
  • Backends: data+fts → Postgres, blob → Garage S3, lookup/in-memory → Redis. One stalwart role/db, one Garage bucket, one Redis logical DB.

The .env contract

.env (gitignored) is the whole operator surface; .env.example is the template. Both compose files read it. Secrets reach Stalwart as env vars and are referenced from config/config.toml via %{env:NAME}% so the toml stays commit-safe. Never hardcode a value that belongs in .env — except the two spots a static file forces it: caddy/caddy.json dial targets and any MagicDNS host in the toml.

Sidecar boilerplate

Identical to federatedSocial's (TS_ACCEPT_DNS true, kernel networking, 127.0.0.1 healthcheck, ephemeral OAuth auth). Don't drift it. Tag: tag:stalwart.

Prerequisites (shared tailnet infra — already running for the fediverse)

  1. Postgres role + db: stalwart / STALWART_DB_NAME. Create via the federatedSocial bootstrap.sh flow or a one-off CREATE ROLE … LOGIN; CREATE DATABASE … OWNER ….
  2. Garage bucket stalwart-mail + grant the shared access key access to it.
  3. Redis: nothing to create — just use a dedicated logical DB index (STALWART_REDIS_DB) so we don't collide with the apps.
  4. Admin console: assign tag:stalwart to the OAuth client (Devices/Core + Keys/AuthKeys) and add acl-snippet.hujson to the policy.

Pitfalls (some learned the hard way next door)

  • Mail edge is layer 4, not layer 7. Don't try to give the L4 ports a normal Caddy vhost. SNI/Host routing doesn't apply to :25.
  • PROXY protocol or your mail reputation dies. Without it Stalwart sees the proxy's tailnet IP as every client → SPF/DNSBL/greylisting break. Both ends must agree (caddy.json proxy_protocol: v2 ↔ config [server.proxy] trusted-networks).
  • Stalwart config drifts between versions and migrates into the admin store after first boot. config/config.toml is a strawman — verify keys against the pinned image tag before trusting them. Pin the tag once it works.
  • POSTGRES_PASSWORD/role passwords only apply on an empty volume. If a password "doesn't work," the stored credential drifted — ALTER USER, don't re-init. And never test a password over 127.0.0.1 against these Postgres containers: pg_hba trusts loopback and accepts ANY password. Test over the tailnet (scram) or you'll fool yourself.
  • Outbound :25 is usually blocked on VPS. Set STALWART_SMARTHOST.
  • Mail forces WAN ports. :25 must be world-reachable for inbound federation — this is the one place the tailnet-only model can't hold. Keep submission/IMAP tailnet-only if you want a tighter surface.
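
To make the PROXY-protocol pitfall above concrete, here is a minimal sketch of the v2 header the edge prepends to each TCP stream, and of how the receiving end recovers the real client address from it. It covers only the PROXY command over TCP/IPv4; the function names and sample addresses are illustrative and are not taken from Caddy or Stalwart.

```python
import socket
import struct

# Fixed 12-byte signature that opens every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_proxy_v2_ipv4(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a PROXY v2 header for a proxied TCP-over-IPv4 connection."""
    addresses = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    # 0x21 = version 2 + PROXY command; 0x11 = AF_INET + STREAM (TCP over IPv4).
    return PP2_SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(addresses)) + addresses


def client_ip_from_proxy_v2(header: bytes) -> str:
    """Return the original client IP carried in a PROXY v2 TCP/IPv4 header."""
    if not header.startswith(PP2_SIGNATURE):
        raise ValueError("not a PROXY v2 header")
    ver_cmd, family, _length = struct.unpack("!BBH", header[12:16])
    if ver_cmd != 0x21 or family != 0x11:
        raise ValueError("this sketch only handles PROXY over TCP/IPv4")
    return socket.inet_ntoa(header[16:20])


# Example: a public client reaching the mailbox through the edge (made-up addresses).
header = build_proxy_v2_ipv4("203.0.113.7", 51234, "100.101.102.103", 25)
```

Without this header the only address the mailbox can see is the TCP peer, which is the proxy's tailnet IP. That is why both ends must agree on whether the header is present.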

What not to do

  • Don't put files in /opt/federatedSocial. Read its .env if you must; never write there.
  • Don't add ports: to the Stalwart container — the edge proxy is the only public surface, and it lives in caddy/.
  • Don't commit .env or a built Caddy binary (see .gitignore).
  • Don't break the sidecar netns boundary with bridge networks or host ports.

Lessons learned — v0.16 first real run (2026-06)

The pinned image is stalwartlabs/stalwart:v0.16.7, and v0.16 changed the config model enough that most of the toml-era notes above are obsolete. Reality:

Config model (supersedes the .env/config.toml/%{env}% notes above)

  • Config is a single JSON file the image reads from --config /etc/stalwart/config.json. It describes only the datastore. The root object is the datastore:
    { "@type": "PostgreSql", "host": "the-record-prod.tail7b1641.ts.net",
      "port": 5432, "database": "stalwart", "authUsername": "stalwart",
      "authSecret": { "@type": "EnvironmentVariable", "variableName": "STALWART_DB_PASSWORD" } }
    
  • TOML is gone. The %{env:NAME}% macro is gone. Secrets use the EnvironmentVariable secret type (field variableName); a literal uses the Value type (field secret, not value). config/config.toml is dead — kept only as historical reference.
  • Everything else lives in Postgres (domains, accounts, listeners, ACME, blob/redis store wiring, proxy trust, DKIM, spam) and is managed via the web UI or the x: JMAP objects: x:DataStore, x:InMemoryStore, x:BlobStore, x:NetworkListener, x:SystemSettings, x:Account, x:AcmeProvider, x:Action. All are JMAP */get and */set calls against /jmap with a Bearer token; singletons use ids:["singleton"].
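
As a rough illustration of the JMAP shape described above, the sketch below builds (but does not send) an x:SystemSettings/get request for the singleton object. The capability list in `using` is an assumption, as is omitting `accountId`; check both against the session object of the pinned image. The helper name and base URL are hypothetical.

```python
import json
import urllib.request


def build_singleton_get(base_url: str, token: str,
                        object_name: str = "x:SystemSettings") -> urllib.request.Request:
    """Build a JMAP <object>/get request for a singleton, without sending it."""
    body = {
        # Assumption: the real capability URNs come from the server's session object.
        "using": ["urn:ietf:params:jmap:core"],
        # One method call: [name, arguments, client-chosen call id].
        "methodCalls": [[f"{object_name}/get", {"ids": ["singleton"]}, "c0"]],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jmap",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage; pass the result to urllib.request.urlopen() to actually send it.
request = build_singleton_get("http://127.0.0.1:8080", "REPLACE_WITH_TOKEN")
```

Building the request separately from sending it makes the payload easy to inspect before it touches a live store.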

Persistence (this was the original "I keep losing settings" bug)

  • Bind-mount ./config/config.json:/etc/stalwart/config.json; make /var/lib/stalwart a named volume. The image VOLUME-declares /etc/stalwart + /var/lib/stalwart; left unmounted they become anonymous volumes that get orphaned on every recreate → config/state vanishes.
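
A small pre-flight check can catch the anonymous-volume trap before a recreate does. The sketch below inspects a service's compose `volumes:` entries in short syntax (`source:target[:mode]`) and reports which of the two required targets are missing or anonymous. The volume name `stalwart-data` is hypothetical, and parsing the compose file itself is left out.

```python
# Targets that must be explicitly mounted for config and state to survive a recreate.
REQUIRED_TARGETS = ("/etc/stalwart/config.json", "/var/lib/stalwart")


def persistence_problems(mounts: list[str]) -> list[str]:
    """Return one message per required target that is unmounted or anonymous."""
    source_by_target = {}
    for spec in mounts:
        parts = spec.split(":")
        if len(parts) == 1:
            # A bare container path in short syntax creates an anonymous volume.
            source_by_target[parts[0]] = None
        else:
            source_by_target[parts[1]] = parts[0]

    problems = []
    for target in REQUIRED_TARGETS:
        if target not in source_by_target:
            problems.append(f"{target}: not mounted")
        elif source_by_target[target] is None:
            problems.append(f"{target}: anonymous volume")
    return problems


# Hypothetical service definition matching the advice above.
good = ["./config/config.json:/etc/stalwart/config.json", "stalwart-data:/var/lib/stalwart"]
print(persistence_problems(good))
```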

Store endpoints need a full FQDN + port

  • Bare MagicDNS names silently fail. http://garage → http://garage.tail7b1641.ts.net:3900; redis://slo-time-prod → redis://slo-time-prod.tail7b1641.ts.net:6379/3 (keep the /3 logical-DB index). A wrong blob endpoint also blocks the web-UI install (the SPA unpacks to S3) and all message-body storage.
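
Because the failure is silent, it may be worth linting store endpoints before saving them. The sketch below flags a URL whose host has no dot (a bare MagicDNS name) or which has no explicit port. The "contains a dot" test is a heuristic for "full FQDN", and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def endpoint_problems(url: str) -> list[str]:
    """Return reasons a store endpoint is likely to fail; empty if it looks fine."""
    parts = urlsplit(url)
    problems = []
    host = parts.hostname or ""
    if "." not in host:
        # Heuristic: a bare MagicDNS short name has no dots.
        problems.append("host is not a full FQDN")
    if parts.port is None:
        problems.append("port is missing")
    return problems


for candidate in (
    "http://garage",
    "http://garage.tail7b1641.ts.net:3900",
    "redis://slo-time-prod.tail7b1641.ts.net:6379/3",
):
    print(candidate, endpoint_problems(candidate))
```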

PROXY-protocol trust is PER-LISTENER, never global

  • Set overrideProxyTrustedNetworks (100.64.0.0/10 + fd7a:115c:a1e0::/48) on the L4-fronted mail listeners only (25/465/587/143/993). Setting the global proxyTrustedNetworks makes the :8080 admin/HTTP listener demand a PROXY header too → direct browser hits get ERR_CONNECTION_RESET.
  • Adding/removing listeners (e.g. 143 IMAP-STARTTLS, 587 submission-STARTTLS, not created by default) needs a container restart — a settings reload does not rebind sockets.
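
The two trusted ranges above are the tailnet's IPv4 and IPv6 address spaces. A quick way to sanity-check whether a given peer address would fall inside them is the standard `ipaddress` module. This is only an illustration of the range check, not of how Stalwart evaluates it, and the helper names are hypothetical.

```python
import ipaddress

# The ranges named above for overrideProxyTrustedNetworks.
TAILNET_NETWORKS = [
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("fd7a:115c:a1e0::/48"),
]

# Listeners fronted by the L4 edge; :8080 is deliberately absent.
PROXIED_PORTS = {25, 465, 587, 143, 993}


def is_trusted_proxy(peer: str) -> bool:
    """True if the peer address lies inside one of the tailnet ranges."""
    addr = ipaddress.ip_address(peer)
    # Membership tests across IP versions simply return False.
    return any(addr in net for net in TAILNET_NETWORKS)


def needs_proxy_override(port: int) -> bool:
    """True for listeners that should carry the per-listener trust setting."""
    return port in PROXIED_PORTS
```

Keeping :8080 out of the proxied set mirrors the rule above: only listeners that actually sit behind the edge should expect a PROXY header.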

One data store ⇒ exactly one Stalwart instance

  • Two instances on the same Postgres/Redis (a stray docker run, or ephemeral-IP restart ghosts) cause ACME orders to go INVALID, corrupt rate-limit/auto-ban state, and produce restart flapping. Ephemeral sidecar nodes get a new tailnet IP per restart, leaving ghost idle Postgres connections from dead incarnations (pg_stat_activity distinct client_addr = a restart counter). Postgres being healthy ≠ Stalwart healthy.
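
The pg_stat_activity observation above can be turned into a small check. The sketch below assumes the rows have already been fetched with whatever Postgres client is at hand (the query string is shown for reference only and is not executed here). The sample rows are made up, and more than one distinct address is only a hint of a ghost or a second instance, not proof.

```python
from collections import Counter

# Reference query for the shared Postgres; the role name comes from the config above.
GHOST_QUERY = "SELECT client_addr, state FROM pg_stat_activity WHERE usename = 'stalwart'"


def connections_by_client(rows) -> Counter:
    """Count connections per client address, ignoring rows with no address."""
    return Counter(str(addr) for addr, _state in rows if addr is not None)


def looks_like_multiple_incarnations(rows) -> bool:
    """More than one distinct client_addr suggests ghosts or a second instance."""
    return len(connections_by_client(rows)) > 1


# Made-up sample: two tailnet addresses holding connections at once.
sample_rows = [
    ("100.64.0.5", "idle"),
    ("100.64.0.5", "active"),
    ("100.64.0.9", "idle"),
    (None, "active"),
]
print(connections_by_client(sample_rows), looks_like_multiple_incarnations(sample_rows))
```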

Accounts / recovery

  • Locked out? Add STALWART_RECOVERY_MODE=1 + STALWART_RECOVERY_ADMIN=admin:<pw>, restart. Serves only :8080, pauses MTA/tasks, and does not wipe a native-v0.16 DB (the "wipe" warning is only for migrating a v0.15 store). Mint a token, fix the account, then remove both env vars and restart.
  • Normal web login is OAuth/PKCE against the directory; the recovery admin is honoured only in recovery mode/bootstrap. Set a password via x:Account/set credentials @type:Password with a pre-hashed $argon2id$… secret (plaintext is stored as cleartext and rejected). Verify with IMAP AUTH over TLS, not the web flow.

ACME

  • Account registration succeeds even when the challenge can't run — don't be fooled. dns-01 needs a DNS-provider API token; http-01 needs the edge to forward :80 to Stalwart's HTTP listener. INVALID authorizations in the store = challenges failing (often the multi-instance race above). Watch Let's Encrypt's limit of 5 failed validations per hour; test against staging.

Backups

  • stalwart --export <dir> (read-only) dumps the whole store per subspace; --import restores. Also take a pg_dump of the stalwart DB. Both land in _backup/ / _validate/, which are gitignored (real secrets + mail data).