stalwart: migrate to v0.16 config model; fix stores, listeners, persistence

v0.16 dropped TOML/%{env}% for a JSON datastore-only config, with all other
settings living in Postgres. This migrates the deployment and fixes the
fallout found during the first real run.

- config/config.json: v0.16 JSON bootstrap (root = PostgreSql datastore;
  DB password via the EnvironmentVariable secret type, so it stays
  commit-safe). Replaces the now-dead config.toml.
- docker-compose.yml: bind-mount config.json -> /etc/stalwart/config.json
  (the image's --config path) and use a named volume for /var/lib/stalwart;
  the old anonymous volumes were orphaned on every recreate ("lost settings").
  Drop the dead config.toml mount.
- .gitignore: exclude local operational artifacts that hold real secrets +
  mail data (_backup/, _validate/, *.dump, export/). config/config.json is
  intentionally tracked (secret-free).
- CLAUDE.md: "Lessons learned — v0.16 first real run" — config model, the
  anonymous-volume trap, full-FQDN store endpoints, per-listener PROXY trust,
  one-instance-per-store, recovery mode + argon2 password reset, ACME, backups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 1072fea410
commit e9febd037c
13
.gitignore
vendored
@@ -9,3 +9,16 @@ caddy/.env
 # Built Caddy binary (rebuild from caddy/Dockerfile instead of committing 50MB)
 caddy/caddy
 caddy/*.bin
+
+# Local operational artifacts — DB dumps, store exports, validation runs.
+# These contain REAL secrets + account/mail data. Never commit.
+_backup/
+_validate/
+*.dump
+# Stalwart store export/import dirs (stalwart --export/--import)
+export/
+*.export
+
+# NB: config/config.json IS committed on purpose — it's the v0.16 bootstrap
+# config and is secret-free (DB password comes from $STALWART_DB_PASSWORD via
+# the EnvironmentVariable secret type). Don't add it here.
77
CLAUDE.md
@@ -94,3 +94,80 @@ healthcheck, ephemeral OAuth auth). Don't drift it. Tag: `tag:stalwart`.
   public surface, and it lives in `caddy/`.
 - Don't commit `.env` or a built Caddy binary (see `.gitignore`).
 - Don't break the sidecar netns boundary with bridge networks or host ports.
+
+## Lessons learned — v0.16 first real run (2026-06)
+
+The pinned image is `stalwartlabs/stalwart:v0.16.7`, and v0.16 changed the config
+model enough that most of the toml-era notes above are obsolete. Reality:
+
+### Config model (supersedes the `.env`/`config.toml`/`%{env}%` notes above)
+- Config is a single **JSON** file the image reads from `--config
+  /etc/stalwart/config.json`. It describes **only the datastore**. The root
+  object *is* the datastore:
+  ```json
+  { "@type": "PostgreSql", "host": "the-record-prod.tail7b1641.ts.net",
+    "port": 5432, "database": "stalwart", "authUsername": "stalwart",
+    "authSecret": { "@type": "EnvironmentVariable", "variableName": "STALWART_DB_PASSWORD" } }
+  ```
+- **TOML is gone. The `%{env:NAME}%` macro is gone.** Secrets use the
+  `EnvironmentVariable` secret type (field `variableName`); a literal uses the
+  `Value` type (field **`secret`**, not `value`). `config/config.toml` is dead —
+  kept only as historical reference.
+- **Everything else lives in Postgres** (domains, accounts, listeners, ACME,
+  blob/redis store wiring, proxy trust, DKIM, spam) and is managed via the web
+  UI or the `x:` JMAP objects: `x:DataStore` `x:InMemoryStore` `x:BlobStore`
+  `x:NetworkListener` `x:SystemSettings` `x:Account` `x:AcmeProvider` `x:Action`.
+  All are JMAP `*/get`/`*/set` against `/jmap` with a Bearer token; singletons
+  use `ids:["singleton"]`.
+
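A minimal sketch, outside the committed file, of how the two secret types described in the Config model notes can be read. `resolve_secret` is a hypothetical helper written for illustration and is not Stalwart's implementation; it only mirrors the field names quoted above (`variableName` for `EnvironmentVariable`, `secret` for `Value`), and the demo password is a placeholder.

```python
import json
import os
from typing import Mapping, Optional


def resolve_secret(secret: dict, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a secret object shaped like the ones in config.json (illustrative only)."""
    env = os.environ if env is None else env
    kind = secret.get("@type")
    if kind == "EnvironmentVariable":
        name = secret["variableName"]
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    if kind == "Value":
        # The literal lives in the `secret` field, not `value`.
        return secret["secret"]
    raise ValueError(f"unsupported secret type: {kind!r}")


# The bootstrap object quoted in the notes above.
BOOTSTRAP = json.loads("""
{ "@type": "PostgreSql", "host": "the-record-prod.tail7b1641.ts.net",
  "port": 5432, "database": "stalwart", "authUsername": "stalwart",
  "authSecret": { "@type": "EnvironmentVariable", "variableName": "STALWART_DB_PASSWORD" } }
""")

if __name__ == "__main__":
    # Placeholder environment; a real run would rely on os.environ instead.
    demo_env = {"STALWART_DB_PASSWORD": "example-only"}
    print(resolve_secret(BOOTSTRAP["authSecret"], demo_env))
```

Because the file only names the variable, the JSON itself carries no credential, which is what keeps it safe to track in git.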
+### Persistence (this was the original "I keep losing settings" bug)
+- Bind-mount `./config/config.json:/etc/stalwart/config.json`; make
+  `/var/lib/stalwart` a **named** volume. The image VOLUME-declares
+  `/etc/stalwart` + `/var/lib/stalwart`; left unmounted they become **anonymous
+  volumes that get orphaned on every recreate** → config/state vanishes.
+
+### Store endpoints need a full FQDN + port
+- Bare MagicDNS names silently fail. `http://garage` → `http://garage.tail7b1641.ts.net:3900`;
+  `redis://slo-time-prod` → `redis://slo-time-prod.tail7b1641.ts.net:6379/3`
+  (keep the `/3` logical-DB index). A wrong blob endpoint also blocks the web-UI
+  install (the SPA unpacks to S3) and all message-body storage.
+
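Since a bare name fails silently, a pre-flight check on store URLs can catch the mistake before it reaches the database. The sketch below is an assumption-laden illustration, not part of the commit: `endpoint_problems` is a hypothetical helper, and "fully qualified" is approximated as "the host contains a dot".

```python
from urllib.parse import urlsplit


def endpoint_problems(url: str) -> list[str]:
    """Return the reasons a store endpoint URL would be rejected by this check."""
    parts = urlsplit(url)
    problems = []
    host = parts.hostname or ""
    # Heuristic: a bare MagicDNS name such as "garage" has no dot in it.
    if "." not in host:
        problems.append("host is not fully qualified")
    if parts.port is None:
        problems.append("port is missing")
    return problems


if __name__ == "__main__":
    for candidate in (
        "http://garage",
        "http://garage.tail7b1641.ts.net:3900",
        "redis://slo-time-prod",
        "redis://slo-time-prod.tail7b1641.ts.net:6379/3",
    ):
        print(candidate, endpoint_problems(candidate) or "ok")
```

The check deliberately ignores the path, so the `/3` logical-DB index on the Redis URL passes through untouched.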
+### PROXY-protocol trust is PER-LISTENER, never global
+- Set `overrideProxyTrustedNetworks` (`100.64.0.0/10` + `fd7a:115c:a1e0::/48`)
+  on the L4-fronted **mail** listeners only (25/465/587/143/993). Setting the
+  **global** `proxyTrustedNetworks` makes the `:8080` admin/HTTP listener demand
+  a PROXY header too → direct browser hits get `ERR_CONNECTION_RESET`.
+- Adding/removing listeners (e.g. 143 IMAP-STARTTLS, 587 submission-STARTTLS,
+  not created by default) needs a **container restart** — a settings reload does
+  not rebind sockets.
+
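To make the two trusted ranges concrete, here is a small sketch (outside the committed file) that tests whether a peer address falls inside them. `is_trusted_proxy` is a hypothetical name, and the logic only illustrates CIDR membership; how Stalwart itself evaluates the setting is not shown in the source.

```python
import ipaddress

# The two ranges listed for overrideProxyTrustedNetworks in the notes above.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("fd7a:115c:a1e0::/48"),
]


def is_trusted_proxy(peer: str) -> bool:
    """True if the peer address lies inside one of the trusted networks."""
    addr = ipaddress.ip_address(peer)
    # Compare only against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in TRUSTED_NETWORKS)


if __name__ == "__main__":
    for peer in ("100.100.1.2", "192.168.1.10", "fd7a:115c:a1e0::1", "2001:db8::1"):
        print(peer, is_trusted_proxy(peer))
```

A browser hitting `:8080` directly sends no PROXY header regardless of its source address, which is why the notes scope the trust list to the mail listeners instead of setting it globally.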
+### One data store ⇒ exactly one Stalwart instance
+- Two instances on the same Postgres/Redis (a stray `docker run`, or
+  ephemeral-IP restart ghosts) cause ACME orders to go **INVALID**, corrupt
+  rate-limit/auto-ban state, and produce restart flapping. Ephemeral sidecar
+  nodes get a **new tailnet IP per restart**, leaving ghost idle Postgres
+  connections from dead incarnations (`pg_stat_activity` distinct `client_addr`
+  = a restart counter). Postgres being healthy ≠ Stalwart healthy.
+
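The "distinct `client_addr` = a restart counter" observation can be sketched as a small tally over rows from `pg_stat_activity`. This is an illustration outside the committed file: `connections_per_client` is a hypothetical helper, the sample rows are made up, and the role name `stalwart` is taken from `authUsername` in the bootstrap config.

```python
from collections import Counter
from typing import Iterable, Mapping


def connections_per_client(rows: Iterable[Mapping[str, str]], role: str = "stalwart") -> Counter:
    """Count connections per client_addr for one database role."""
    return Counter(row["client_addr"] for row in rows if row["usename"] == role)


# Made-up rows shaped like: SELECT usename, client_addr, state FROM pg_stat_activity
SAMPLE_ROWS = [
    {"usename": "stalwart", "client_addr": "100.64.0.11", "state": "idle"},
    {"usename": "stalwart", "client_addr": "100.64.0.11", "state": "idle"},
    {"usename": "stalwart", "client_addr": "100.64.0.27", "state": "active"},
    {"usename": "postgres", "client_addr": "100.64.0.5", "state": "active"},
]

if __name__ == "__main__":
    per_client = connections_per_client(SAMPLE_ROWS)
    print(len(per_client), "distinct client addresses for role stalwart")
```

More than one distinct address for the Stalwart role suggests either ghost connections from an earlier incarnation or a second live instance, and both cases are worth ruling out before debugging ACME.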
+### Accounts / recovery
+- Locked out? Add `STALWART_RECOVERY_MODE=1` + `STALWART_RECOVERY_ADMIN=admin:<pw>`,
+  restart. Serves only `:8080`, pauses MTA/tasks, and **does not wipe** a
+  native-v0.16 DB (the "wipe" warning is only for migrating a v0.15 store). Mint
+  a token, fix the account, then remove both env vars and restart.
+- Normal web login is **OAuth/PKCE against the directory**; the recovery admin
+  is honoured only in recovery mode/bootstrap. Set a password via `x:Account/set`
+  `credentials` `@type:Password` with a **pre-hashed `$argon2id$…`** secret
+  (plaintext is stored as cleartext and rejected). Verify with **IMAP AUTH over
+  TLS**, not the web flow.
+
+### ACME
+- Account registration succeeds even when the challenge can't run — don't be
+  fooled. `dns-01` needs a DNS-provider API token; `http-01` needs the edge to
+  forward `:80` to Stalwart's HTTP listener. `INVALID` authorizations in the
+  store = challenges failing (often the multi-instance race above). Watch LE's
+  5-failed-validations/hour limit; test against staging.
+
+### Backups
+- `stalwart --export <dir>` (read-only) dumps the whole store per subspace;
+  `--import` restores. Plus `pg_dump` of the `stalwart` DB. Both land in
+  `_backup/` / `_validate/` — **gitignored** (real secrets + mail data).
8
config/config.json
Normal file
@@ -0,0 +1,8 @@
+{
+  "@type": "PostgreSql",
+  "host": "the-record-prod.tail7b1641.ts.net",
+  "port": 5432,
+  "database": "stalwart",
+  "authUsername": "stalwart",
+  "authSecret": { "@type": "EnvironmentVariable", "variableName": "STALWART_DB_PASSWORD" }
+}
docker-compose.yml
@@ -59,10 +59,15 @@ services:
       STALWART_SMARTHOST: ${STALWART_SMARTHOST}
       STALWART_FALLBACK_ADMIN_SECRET: ${STALWART_FALLBACK_ADMIN_SECRET}
     volumes:
-      - ./config/config.toml:/opt/stalwart-mail/etc/config.toml:ro
-      # Local working dir only (logs, ACME cache, queue spool). The bulk data
-      # lives in Postgres + Garage, not here — but Stalwart still wants a home.
-      - stalwart-data:/opt/stalwart-mail
+      # Bootstrap config (v0.16 JSON): tells Stalwart only where Postgres lives;
+      # all other settings live in the DB. Mounted at the image's default
+      # --config path (/etc/stalwart/config.json). Secret comes from the
+      # STALWART_DB_PASSWORD env above, referenced via the EnvironmentVariable
+      # secret type inside the file — so this stays commit-safe.
+      - ./config/config.json:/etc/stalwart/config.json:ro
+      # Working dir: ACME cert cache + outbound queue spool. Named volume (not
+      # anonymous) so a recreate doesn't orphan it and drop queued mail/certs.
+      - stalwart-data:/var/lib/stalwart
     depends_on:
       ts-stalwart:
         condition: service_healthy