2026-06-03 22:25:38 -04:00
# CLAUDE.md — tailwart

Guidance for Claude Code in this repo. Read before editing.

## What this is

A play deployment of **Stalwart** (all-in-one mail/JMAP/IMAP/SMTP server) wired,
gratuitously, into **three** shared backends — Postgres, Redis, and Garage S3 —
to see how far the federatedSocial Tailscale-sidecar pattern stretches past the
fediverse apps. Target domain: `infinidim.net` (may become real later).

It is **self-contained and outside** `/opt/federatedSocial` on purpose: that's
an upstream clone that `git pull` overwrites. tailwart owns its own `.env`,
compose, config, ACL snippet, and Caddy build, and only *reads from the tailnet*
(shared infra over MagicDNS) at runtime.

## Architecture — two ends of one wire

```
public IP host (tag:reverse-proxy)             tailnet-only mailbox
┌───────────────────────────┐                ┌────────────────────────┐
│ caddy/ (caddy-l4)         │    tailnet     │ ts-stalwart sidecar    │
│ :25 :465 :587 :143 :993 ──┼───WireGuard───▶│ stalwart (no WAN, no   │
│ PROXY protocol v2         │                │ host ports)            │
└───────────────────────────┘                └───────────┬────────────┘
L7 JMAP vhost on the main Caddy                          │
mail.infinidim.net → :8080                       ┌───────┴───────┐
                                                 ▼       ▼       ▼
                                             Postgres  Redis Garage S3
                                          (the-record)(slo-time)(garage)
```

- **Mailbox** (`docker-compose.yml`): Stalwart in a Tailscale sidecar via
  `network_mode: service:ts-stalwart`. Binds nothing on the host. All mail
  ports listen on the tailnet only.
2026-06-03 22:39:33 -04:00
- **Edge** (`caddy/`): a layer-4 TCP proxy (Caddy + `caddy-l4`, pulled prebuilt
  from caddyserver.com — no local `xcaddy` build, per `~/docs/caddy.md`). Pure
  pass-through; Stalwart owns TLS. **Can run on a different machine** than the
  mailbox — the key idea.
2026-06-03 22:25:38 -04:00
- **Backends**: data+fts → Postgres, blob → Garage S3, lookup/in-memory →
  Redis. One stalwart role/db, one Garage bucket, one Redis logical DB.

## The `.env` contract

`.env` (gitignored) is the whole operator surface; `.env.example` is the
template. Both compose files read it. Secrets reach Stalwart as env vars and are
referenced from `config/config.toml` via `%{env:NAME}%` so the toml stays
commit-safe. Never hardcode a value that belongs in `.env` — except the two
spots a static file forces it: `caddy/caddy.json` dial targets and any
MagicDNS host in the toml.

## Sidecar boilerplate

Identical to federatedSocial's (`TS_ACCEPT_DNS` true, kernel networking, 127.0.0.1
healthcheck, ephemeral OAuth auth). Don't let it drift. Tag: `tag:stalwart`.

## Prerequisites (shared tailnet infra — already running for the fediverse)

1. Postgres role + db: `stalwart` / `STALWART_DB_NAME`. Create via the
   federatedSocial `bootstrap.sh` flow or a one-off `CREATE ROLE … LOGIN; CREATE
   DATABASE … OWNER …`.
2. Garage bucket `stalwart-mail` + grant the shared access key access to it.
3. Redis: nothing to create — just use a dedicated logical DB index
   (`STALWART_REDIS_DB`) so we don't collide with the apps.
4. Admin console: assign `tag:stalwart` to the OAuth client (Devices/Core +
   Keys/AuthKeys) and add `acl-snippet.hujson` to the policy.

## Pitfalls (some learned the hard way next door)

- **Mail edge is layer 4, not layer 7.** Don't try to give the L4 ports a
  normal Caddy vhost. SNI/Host routing doesn't apply to `:25`.
- **PROXY protocol or your mail reputation dies.** Without it Stalwart sees the
  proxy's tailnet IP as every client → SPF/DNSBL/greylisting break. Both ends
  must agree (caddy.json `proxy_protocol: v2` ↔ config `[server.proxy]
  trusted-networks`).
- **Stalwart config drifts between versions and migrates into the admin store
  after first boot.** `config/config.toml` is a strawman — verify keys against
  the pinned image tag before trusting them. Pin the tag once it works.
- **`POSTGRES_PASSWORD`/role passwords only apply on an empty volume.** If a
  password "doesn't work," the stored credential drifted — `ALTER USER`, don't
  re-init. And never test a password over `127.0.0.1` against these Postgres
  containers: pg_hba `trust`s loopback and accepts ANY password. Test over the
  tailnet (scram) or you'll fool yourself.
- **Outbound :25 is usually blocked on VPS hosts.** Set `STALWART_SMARTHOST`.
- **Mail forces WAN ports.** `:25` must be world-reachable for inbound
  federation — this is the one place the tailnet-only model can't hold. Keep
  submission/IMAP tailnet-only if you want a tighter surface.

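To make the PROXY protocol pitfall above concrete, here is a minimal sketch of the v2 header that an edge proxy prepends to a TCP-over-IPv4 connection, plus the parse a receiving server performs to recover the real client address. The function names and the example addresses are hypothetical; this is not the code of `caddy-l4` or Stalwart, only an illustration of the wire format both ends have to agree on.

```python
import ipaddress
import struct
from typing import Tuple

# 12-byte signature that opens every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_proxy_v2_header(src_ip: str, src_port: int,
                          dst_ip: str, dst_port: int) -> bytes:
    """Build a PROXY v2 header for a TCP-over-IPv4 connection (hypothetical helper)."""
    src = ipaddress.IPv4Address(src_ip).packed
    dst = ipaddress.IPv4Address(dst_ip).packed
    addresses = src + dst + struct.pack("!HH", src_port, dst_port)
    return (
        PP2_SIGNATURE
        + bytes([0x21])                      # version 2, command PROXY
        + bytes([0x11])                      # family AF_INET, transport STREAM (TCP)
        + struct.pack("!H", len(addresses))  # length of the address block (12 bytes)
        + addresses
    )


def parse_proxy_v2_source(header: bytes) -> Tuple[str, int]:
    """Recover the original client address, as the receiving end would."""
    if header[:12] != PP2_SIGNATURE or header[12] != 0x21 or header[13] != 0x11:
        raise ValueError("not a PROXY v2 TCP/IPv4 header")
    src_ip = str(ipaddress.IPv4Address(header[16:20]))
    (src_port,) = struct.unpack("!H", header[24:26])
    return src_ip, src_port
```

Without this header the only address the mailbox can see is the TCP peer, which is the proxy's tailnet IP. That is why reputation checks break when either end is misconfigured.
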
## What not to do

- Don't put files in `/opt/federatedSocial`. Read its `.env` if you must; never
  write there.
- Don't add `ports:` to the Stalwart container — the edge proxy is the only
  public surface, and it lives in `caddy/`.
- Don't commit `.env` or a built Caddy binary (see `.gitignore`).
- Don't break the sidecar netns boundary with bridge networks or host ports.

stalwart: migrate to v0.16 config model; fix stores, listeners, persistence
v0.16 dropped TOML/%{env}% for a JSON datastore-only config, with all other
settings living in Postgres. This migrates the deployment and fixes the
fallout found during the first real run.
- config/config.json: v0.16 JSON bootstrap (root = PostgreSql datastore;
DB password via the EnvironmentVariable secret type, so it stays
commit-safe). Replaces the now-dead config.toml.
- docker-compose.yml: bind-mount config.json -> /etc/stalwart/config.json
(the image's --config path) and use a named volume for /var/lib/stalwart;
the old anonymous volumes were orphaned on every recreate ("lost settings").
Drop the dead config.toml mount.
- .gitignore: exclude local operational artifacts that hold real secrets +
mail data (_backup/, _validate/, *.dump, export/). config/config.json is
intentionally tracked (secret-free).
- CLAUDE.md: "Lessons learned — v0.16 first real run" — config model, the
anonymous-volume trap, full-FQDN store endpoints, per-listener PROXY trust,
one-instance-per-store, recovery mode + argon2 password reset, ACME, backups.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 23:36:46 -04:00
## Lessons learned — v0.16 first real run (2026-06)

The pinned image is `stalwartlabs/stalwart:v0.16.7`, and v0.16 changed the config
model enough that most of the toml-era notes above are obsolete. Reality:

### Config model (supersedes the `.env`/`config.toml`/`%{env}%` notes above)

- Config is a single **JSON** file the image reads from `--config
  /etc/stalwart/config.json`. It describes **only the datastore**. The root
  object *is* the datastore:

  ```json
  { "@type": "PostgreSql", "host": "the-record-prod.tail7b1641.ts.net",
    "port": 5432, "database": "stalwart", "authUsername": "stalwart",
    "authSecret": { "@type": "EnvironmentVariable", "variableName": "STALWART_DB_PASSWORD" } }
  ```

- **TOML is gone. The `%{env:NAME}%` macro is gone.** Secrets use the
  `EnvironmentVariable` secret type (field `variableName`); a literal uses the
  `Value` type (field **`secret`**, not `value`). `config/config.toml` is dead —
  kept only as historical reference.
- **Everything else lives in Postgres** (domains, accounts, listeners, ACME,
  blob/redis store wiring, proxy trust, DKIM, spam) and is managed via the web
  UI or the `x:` JMAP objects: `x:DataStore` `x:InMemoryStore` `x:BlobStore`
  `x:NetworkListener` `x:SystemSettings` `x:Account` `x:AcmeProvider` `x:Action`.
  All are JMAP `*/get`/`*/set` against `/jmap` with a Bearer token; singletons
  use `ids:["singleton"]`.

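The two secret shapes described above are easy to get wrong, so a small pre-flight check before starting the container may help. The sketch below is a hypothetical helper, not part of Stalwart: it walks a parsed `config.json`, confirms that every `EnvironmentVariable` secret names a variable that is actually set, and flags a `Value` secret that lacks the `secret` field. It assumes that any object with one of those two `@type` values is a secret, which may not hold for every v0.16 schema.

```python
import json
import os
from typing import Any, List, Mapping


def find_secret_problems(node: Any, env: Mapping[str, str],
                         path: str = "$") -> List[str]:
    """Walk a parsed config and report secret objects that cannot be resolved."""
    problems: List[str] = []
    if isinstance(node, dict):
        kind = node.get("@type")
        if kind == "EnvironmentVariable":
            name = node.get("variableName")
            if not name:
                problems.append(f"{path}: EnvironmentVariable without variableName")
            elif not env.get(name):
                problems.append(f"{path}: environment variable {name} is unset or empty")
        elif kind == "Value" and "secret" not in node:
            problems.append(f"{path}: Value secret must use the 'secret' field")
        # Recurse into children so nested secrets are checked too.
        for key, child in node.items():
            problems.extend(find_secret_problems(child, env, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            problems.extend(find_secret_problems(child, env, f"{path}[{index}]"))
    return problems


def check_config_text(text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Parse JSON text and return a list of human-readable problems (empty if none)."""
    return find_secret_problems(json.loads(text), env)
```

One way to use it: read `config/config.json`, pass the text to `check_config_text`, and refuse to `docker compose up` if the returned list is non-empty.
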
### Persistence (this was the original "I keep losing settings" bug)

- Bind-mount `./config/config.json:/etc/stalwart/config.json`; make
  `/var/lib/stalwart` a **named** volume. The image VOLUME-declares
  `/etc/stalwart` + `/var/lib/stalwart`; left unmounted they become **anonymous
  volumes that get orphaned on every recreate** → config/state vanishes.

### Store endpoints need a full FQDN + port

- Bare MagicDNS names silently fail. `http://garage` → `http://garage.tail7b1641.ts.net:3900`;
  `redis://slo-time-prod` → `redis://slo-time-prod.tail7b1641.ts.net:6379/3`
  (keep the `/3` logical-DB index). A wrong blob endpoint also blocks the web-UI
  install (the SPA unpacks to S3) and all message-body storage.

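Because bare names fail silently, it can be worth linting store endpoints before saving them. The following is a minimal sketch using only the standard library; `endpoint_problems` is a hypothetical helper, and "contains a dot" is a rough stand-in for "is a full FQDN", not a real DNS check.

```python
from typing import List
from urllib.parse import urlsplit


def endpoint_problems(url: str) -> List[str]:
    """Flag store endpoints that lack a dotted hostname or an explicit port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    problems: List[str] = []
    if "." not in host:
        problems.append(f"{url}: host {host!r} is not a full FQDN")
    if parts.port is None:
        problems.append(f"{url}: no explicit port")
    return problems


if __name__ == "__main__":
    for candidate in (
        "http://garage",
        "http://garage.tail7b1641.ts.net:3900",
        "redis://slo-time-prod.tail7b1641.ts.net:6379/3",
    ):
        print(candidate, endpoint_problems(candidate) or "ok")
```

The check only looks at the shape of the URL. It does not prove that the name resolves or that the service answers on that port.
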
### PROXY-protocol trust is PER-LISTENER, never global

- Set `overrideProxyTrustedNetworks` (`100.64.0.0/10` + `fd7a:115c:a1e0::/48`)
  on the L4-fronted **mail** listeners only (25/465/587/143/993). Setting the
  **global** `proxyTrustedNetworks` makes the `:8080` admin/HTTP listener demand
  a PROXY header too → direct browser hits get `ERR_CONNECTION_RESET`.
- Adding/removing listeners (e.g. 143 IMAP-STARTTLS, 587 submission-STARTTLS,
  not created by default) needs a **container restart** — a settings reload does
  not rebind sockets.

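The per-listener rule above can be reasoned about with a toy model. The sketch below is a simplification of the behaviour described in this section, not Stalwart's implementation: it assumes a listener expects a PROXY header from any peer inside its trusted networks, and that a per-listener override replaces the global list. All names are hypothetical.

```python
import ipaddress
from typing import Dict, List

TAILNET = ["100.64.0.0/10", "fd7a:115c:a1e0::/48"]
MAIL_PORTS = [25, 465, 587, 143, 993]


def in_networks(peer_ip: str, cidrs: List[str]) -> bool:
    """True if peer_ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(peer_ip)
    # Membership tests across IPv4/IPv6 simply evaluate to False.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def expects_proxy_header(port: int, peer_ip: str,
                         per_listener: Dict[int, List[str]],
                         global_trusted: List[str]) -> bool:
    """Toy model: does the listener on `port` expect a PROXY header from this peer?"""
    cidrs = per_listener.get(port, global_trusted)
    return in_networks(peer_ip, cidrs)


if __name__ == "__main__":
    browser = "100.101.102.103"  # example tailnet peer hitting :8080 directly
    per_listener = {port: TAILNET for port in MAIL_PORTS}
    print(expects_proxy_header(8080, browser, per_listener, []))  # per-listener trust
    print(expects_proxy_header(8080, browser, {}, TAILNET))       # global trust
```

Under these assumptions the global setting is what makes a direct tailnet browser connection to `:8080` look like a proxy that forgot its header, which matches the reset described above.
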
### One data store ⇒ exactly one Stalwart instance

- Two instances on the same Postgres/Redis (a stray `docker run`, or
  ephemeral-IP restart ghosts) cause ACME orders to go **INVALID**, corrupt
  rate-limit/auto-ban state, and produce restart flapping. Ephemeral sidecar
  nodes get a **new tailnet IP per restart**, leaving ghost idle Postgres
  connections from dead incarnations (the number of distinct `client_addr`
  values in `pg_stat_activity` is effectively a restart counter). Postgres
  being healthy ≠ Stalwart healthy.

### Accounts / recovery

- Locked out? Add `STALWART_RECOVERY_MODE=1` + `STALWART_RECOVERY_ADMIN=admin:<pw>`,
  restart. Serves only `:8080`, pauses MTA/tasks, and **does not wipe** a
  native-v0.16 DB (the "wipe" warning is only for migrating a v0.15 store). Mint
  a token, fix the account, then remove both env vars and restart.
- Normal web login is **OAuth/PKCE against the directory**; the recovery admin
  is honoured only in recovery mode/bootstrap. Set a password via `x:Account/set`
  `credentials` `@type:Password` with a **pre-hashed `$argon2id$…`** secret
  (plaintext is stored as cleartext and rejected). Verify with **IMAP AUTH over
  TLS**, not the web flow.

### ACME

- Account registration succeeds even when the challenge can't run — don't be
  fooled. `dns-01` needs a DNS-provider API token; `http-01` needs the edge to
  forward `:80` to Stalwart's HTTP listener. `INVALID` authorizations in the
  store = challenges failing (often the multi-instance race above). Watch
  Let's Encrypt's 5-failed-validations/hour limit; test against staging.

### Backups

- `stalwart --export <dir>` (read-only) dumps the whole store per subspace;
  `--import` restores. Plus `pg_dump` of the `stalwart` DB. Both land in
  `_backup/` / `_validate/` — **gitignored** (real secrets + mail data).