Compare commits

5 Commits

Author SHA1 Message Date
Wayne Hayes
238cce506d feat: provision-stalwart.sh — configure Stalwart fully from .env
Stalwart v0.16 keeps all config in Postgres, reachable via the x: JMAP
management objects. This script writes everything the setup wizard would —
stores (Garage S3 + Redis), listeners (with per-listener PROXY trust on the
mail ports), the primary domain (+auto DKIM), admin + relay/catch-all
accounts, TLS/DNS, and optional Authelia SSO — straight into Postgres over
HTTP Basic. Idempotent (query-before-create), so re-runs are safe.

Tiers (the DNS/TLS automation boundary):
  * Tier 1 (default, trustless): manual DNS, prints the records to publish.
  * Tier 2 (STALWART_DNS_API_KEY set): Stalwart auto-publishes DNS + ACME
    DNS-01 via the provider (Spaceship wired).

Authelia SSO is opt-in (STALWART_SSO_ENABLE); admin + relay keep password
auth as break-glass so enabling SSO can never lock you out.

.env.example: documents the tiered DNS + SSO surface (core reuses existing
fields; only tier-2 needs DNS provider keys). README: quickstart step + layout.

Validated: bash -n; all JMAP payloads build valid JSON; read/idempotency
paths against a live instance. NOT yet validated on a fresh boot (fallback
admin -> create -> re-auth) or the OIDC login round-trip — verify on a
throwaway deploy before relying on those paths.

Shaped to drop into federatedSocial bootstrap.sh as cmd_provision_stalwart.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 23:35:15 -04:00
Wayne Hayes
1e5fc982eb LESSONS: shared-infra readiness (#16 boot-order) + flapping consumer (#17 atuin)
#16 Stalwart-before-Garage on reboot → S3-backed admin SPA 404'd (not a boot
loop). Gate every app on backend *liveness* (depends_on service_healthy +
probe PG/Redis/Garage over the tailnet), don't assume shared infra boots first.

#17 atuin crash-looped 6318x (exit 1) and looked like a Postgres problem;
Postgres was healthy and atuin never even connected. PG health != consumer
health — check RestartCount and pg_stat_activity client_addr churn; confirm a
consumer's creds/reachability before restart:always.

Both generalize to federatedSocial (shared PG/Redis/Garage = blast radius).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 23:17:05 -04:00
Wayne Hayes
5d162884e8 Merge authelia-integration: vendor Authelia under authelia/ with single root .env; portal admin user from $AUTHELIA_ADMIN_USER
2026-06-12 23:10:57 -04:00
Wayne Hayes
783b09f463 authelia: set portal admin user to zarniwoop (match AUTHELIA_ADMIN_USER)
The vendored user db carried the template `admin`, but the operator .env sets
AUTHELIA_ADMIN_USER=zarniwoop, so portal login failed ("user not found"). Rename
the file-backend user to `zarniwoop` with an argon2id hash of the .env
AUTHELIA_ADMIN_PASSWORD (verified via `authelia crypto hash validate`). Email
kept as admin@infinidim.net (a real Stalwart mailbox) so password-reset works.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 22:26:26 -04:00
ea7eedcb7b LESSONS: SPF append-dup gotcha (#14) and ed25519 DKIM diagnosis (#15)
#14: Spaceship PUT keys records by name+type+VALUE, so changing an
existing RRSet's value APPENDS a second record (a double v=spf1 =
RFC 7208 permerror). Correct pattern: PUT new, DELETE old; DELETE body
is a bare JSON array, not {items:[...]}.

#15: ed25519 DKIM "fail" at Gmail alongside passing RSA is the known
Stalwart dual-signing issue, not a key problem -- proved the stored
seed derives the published p= exactly. Fix is RSA-only: removed the
ed25519 DNS key (done); disabling the ed25519 signature in Stalwart is
the remaining step. Also records the smarthost identity behind the SPF
fix. Corrected #13's "PUT won't disturb siblings" claim accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 23:47:01 +01:00
5 changed files with 421 additions and 15 deletions

@@ -76,12 +76,34 @@ STALWART_S3_BUCKET=stalwart-mail
STALWART_SMARTHOST=
# ----------------------------------------------------------------------------
# TLS — Stalwart self-manages certs via ACME DNS-01 (works behind the L4 proxy)
# Provisioning — `./provision-stalwart.sh` configures Stalwart from this .env
# ----------------------------------------------------------------------------
# DNS provider + token for the DNS-01 challenge. Leave blank to instead mount
# a certbot-issued cert (see config/config.toml [certificate]).
STALWART_ACME_PROVIDER=
STALWART_ACME_TOKEN=
# Run it AFTER `docker compose up -d`. It writes stores, listeners, the primary
# domain (+DKIM), the admin + relay/catch-all accounts, TLS/DNS and (optional)
# SSO — all via the x: JMAP API. Idempotent; re-run any time.
#
# TIER 1 (default, trustless): leave the DNS keys below blank. The domain is
# created in MANUAL dns/dkim mode and the script PRINTS the records to publish.
# Certs: mount your wildcard, or front :80 at the edge for HTTP-01.
#
# TIER 2 (auto-DNS): set the DNS provider keys and Stalwart auto-publishes every
# record (MX/SPF/DKIM/DMARC/MTA-STS/SRV/CAA/TLS-RPT) and does ACME DNS-01.
# Provider tokens are fiddly and provider-specific (HE's API is flaky; Spaceship
# needs API access enabled on the key) — so this stays opt-in and user-managed.
STALWART_DNS_PROVIDER=spaceship # currently only 'spaceship' is wired
STALWART_DNS_API_KEY= # set => tier 2; blank => tier 1
STALWART_DNS_API_SECRET=
STALWART_DNS_DESC=managed # label for the x:DnsServer entry
# ACME contact email (enables Let's Encrypt DNS-01 in tier 2). Blank = skip ACME.
STALWART_ACME_CONTACT=
# --- SSO: let Authelia manage Stalwart login (optional) ---------------------
# true => provision-stalwart.sh creates an OIDC directory pointing at
# AUTHELIA_PORTAL_URL and prints the Authelia client block to paste. admin and
# the relay account KEEP password auth as break-glass, so SSO can't lock you out.
# (The login flow has NOT been validated on a throwaway deploy; test it before relying on it.)
STALWART_SSO_ENABLE=false
STALWART_OIDC_CLIENT_SECRET= # shared secret for the Stalwart<->Authelia client
# ============================================================================

@@ -237,12 +237,121 @@ curl -s "https://spaceship.dev/api/v1/dns/records/<domain>?take=100&skip=0" \
-H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" | python3 -m json.tool
```
To add a record, `PUT` the same endpoint with a single-item `items` array — it
won't disturb siblings. **Snapshot the zone (GET) before any write** and diff
after; snapshots land in `_backup/` (gitignored). Always re-check at the
authoritative NS (`dig +short AAAA <name> @launch1.spaceship.net`), not a cache.
won't disturb siblings of a *different* name/type (but see #14 — for an existing
RRSet it **appends**, it does not replace). **Snapshot the zone (GET) before any
write** and diff after; snapshots land in `_backup/` (gitignored). Always
re-check at the authoritative NS (`dig +short AAAA <name> @launch1.spaceship.net`),
not a cache.
**Caveat — don't publish `mail` AAAA before the edge listens on v6.** Inbound
mail follows `MX → mail.<domain>`; an `AAAA` there with no v6 `:25` listener on
the edge makes senders try v6 and some won't fall back → deferred/bounced mail.
An **apex** `AAAA` is safe (it doesn't affect MX routing). Do `mail` AAAA + edge
v6 listeners together.
## 14. Spaceship `PUT` is an APPEND-by-value, not a replace — it can dupe an RRSet
**Symptom:** "Updating" the SPF record (`PUT` with `force:true` and the new
value) left the zone with **two** `v=spf1` apex TXT records. Having two SPF records
is an RFC 7208 `permerror` → SPF **fails hard for everyone**, which is worse than the
typo you were fixing.
**Cause:** Spaceship keys records by (name, type, **value**). A `PUT` whose value
differs from the existing record is a *new* record, so `force:true` **adds**
rather than replacing. (The earlier AAAA/SPF adds looked like clean "upserts"
only because there was no prior record at that name+type, or the value matched.)
**Fix / correct pattern for an in-place value change:** `PUT` the new value, then
**`DELETE` the old one** — and the `DELETE` body is a **bare JSON array**, not
`{"items":[…]}` (the latter 422s with `Value is "object" but should be "array"`):
```bash
curl -s -X DELETE "https://spaceship.dev/api/v1/dns/records/<domain>" \
-H "X-Api-Key: $KEY" -H "X-Api-Secret: $SECRET" -H 'Content-Type: application/json' \
-d '[{"type":"TXT","name":"@","value":"v=spf1 mx -all"}]'
```
Always GET-diff before/after (count + REMOVED/ADDED sets) to catch a stray dupe.
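The duplicate-SPF failure mode can also be caught with a quick count after each write. A minimal sketch, assuming the TXT values arrive one per line on stdin in the quoted form `dig +short TXT` prints; `count_spf` is an illustrative helper name, not part of any tool:

```bash
# count_spf: read TXT values from stdin (one per line, optionally quoted) and
# print how many of them are SPF records. Exactly 1 is the healthy result at
# the apex; 2 or more is the RFC 7208 permerror case.
count_spf() {
  # grep -c exits non-zero when the count is 0, so mask that to keep callers
  # running under `set -e`; the "0" is still printed.
  grep -Eci '^"?v=spf1( |"|$)' || true
}
```

Typical use is to pipe the authoritative answer into it, for example the output of `dig +short TXT` for the apex queried at `@launch1.spaceship.net`, and treat any result other than `1` as a reason to re-inspect the zone.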
## 15. ed25519 DKIM "fails" at Gmail with both ed25519+RSA — it's not your key
**Symptom:** DMARC aggregate reports show, per message, `dkim=pass` for the RSA
selector but `dkim=fail` for the ed25519 selector (`v1-ed25519-…`), on the *same*
intact message. Looks like a broken/mismatched ed25519 key.
**Cause:** **Not the key.** Verified cryptographically: the stored ed25519 seed
derives exactly the published `p=` (and the PKCS#8-v2 blob even embeds that same
pubkey). seed → pubkey → DNS all agree. It's the **known Stalwart dual-signing
issue** ([discussion #2727](https://github.com/stalwartlabs/stalwart/discussions/2727)):
when Stalwart applies *both* an ed25519 and an RSA signature, Gmail/Hotmail
mishandle the ed25519 one (`fail`, or `neutral (no key)`), while RSA passes. The
maintainer's own server runs with "ed25519 ignored, RSA passes." RSA carries
DMARC, so **mail is unaffected** — it's cosmetic, just noisy in reports.
How the key was proven (the seed lives in settings table `s`, PKCS#8 v2):
```bash
# 32-byte seed from the OCTET STRING in the stored PKCS#8; wrap as clean v0 DER:
printf '302e020100300506032b657004220420%s' "$SEED_HEX" | xxd -r -p > /tmp/ed.der
openssl pkey -inform DER -in /tmp/ed.der -pubout -outform DER | tail -c 32 | base64
# == the DNS p= value → key is correct
```
**Fix (proper = RSA-only):** the recommended cure is to stop emitting the ed25519
signature, not to republish anything. Two parts:
1. **DNS (done 2026-06-12):** removed the `v1-ed25519-20260604._domainkey` TXT —
turns the report `fail` into a harmless "no key", DMARC still green via RSA.
2. **Stalwart (still TODO):** disable the ed25519 **signature** in the admin UI /
JMAP signing config so outbound stops carrying it (DB surgery on the serialized
signature object is risky — do it through the supported surface). The fallback
admin can't mint an API token non-interactively (only `authorization_code` /
`device_code` grants; no ROPC), so this needs the web UI or a device-code login.
**Aside discovered here:** outbound is a catch-all smarthost relay to
`mail.tail7b1641.ts.net` (auth `stalwart-relay@waynehayes.com`), which re-emits
as `mail.waynehayes.com` (`216.189.156.74` / `2602:ffc5:20::1:6b52`). That relay
IP is why SPF needed `include:waynehayes.com` (#14 / the SPF fix).
## 16. After a reboot, Stalwart started before Garage — admin site 404'd (NOT a boot loop)
**Symptom:** Post-reboot, the Stalwart web admin / app assets wouldn't load (404 /
blank), even though the container was `running` and **not** restart-looping.
**Cause:** the web UI (and other app assets) live in the **S3 blob store (Garage)**;
Stalwart unpacks/serves them from S3. On reboot Stalwart came up *before* Garage was
ready, so the asset fetch failed. Stalwart itself was fine (PG connected, listeners up);
only the S3-backed content was missing. Easy to misread as "Stalwart is broken."
**Fix:** once Garage is up, restart Stalwart (or it picks them up on the next fetch).
Quick confirm it's a backend-readiness issue, not Stalwart: `running`+`healthy` but assets
404 → probe the backend from the sidecar (`nc -z garage.<tailnet> 3900`).
**Rule for the whole fleet (federatedSocial):** every app must gate on its backends being
**live, not merely present**. Model it on the Stalwart sidecar's healthcheck —
`depends_on: { <backend>: service_healthy }` plus a check that actually *probes* PG/Redis/
Garage over the tailnet (see #1, the PG-startup-race healthcheck). Don't assume shared
infra boots first; make it a startup-ordering/readiness convention across all sidecars.
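One way to express "live, not merely present" in an entrypoint or healthcheck script is a bounded retry around the probe. This is a minimal sketch: `wait_until` is a hypothetical helper, and the probe command, retry count, and delay are assumptions to tune per backend.

```bash
# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds between attempts. Returns 0 on the first
# success and 1 if every attempt failed.
wait_until() {
  local tries=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0
    # no point sleeping after the final failed attempt
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

With the probe from above this becomes `wait_until 30 2 nc -z "${GARAGE_MAGIC_NAME}.${TS_TAILNET}" 3900 || exit 1`, so the app refuses to start until Garage actually answers, rather than assuming it booted first.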
## 17. A flapping shared-store consumer (atuin) looked like a Postgres problem
**Symptom:** "Postgres seems to be the cause / unstable." Actually `atuin-server` had
**RestartCount 6318, exit 1** — crash-looping for days and generating all the noise.
**Cause:** atuin couldn't reach/authenticate its DB and crash-looped under
`restart: unless-stopped`. **Postgres itself was healthy** (6 days up, 0 restarts,
17/100 conns). atuin never even established a connection — *no* atuin lines in the PG log
and *no* atuin rows in `pg_stat_activity` — i.e. it was dying **before** reaching PG.
**Diagnosis (fast):**
```bash
# which container is actually flapping (PG health != consumer health):
docker inspect <c> --format '{{.RestartCount}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}'
# is a consumer reconnect-storming the shared store? distinct/ghost client_addr = churn:
docker exec <pg> psql -U postgres -tAc \
"SELECT client_addr, state, count(*) FROM pg_stat_activity GROUP BY 1,2 ORDER BY 1"
```
Ephemeral sidecar nodes get a **new tailnet IP per restart**, so successive incarnations
leave **ghost idle connections** from dead IPs — a handy "how many times did it restart"
fingerprint (we saw this with Stalwart too: 1 live IP + 2 ghosts).
**Rule for the whole fleet:** a shared Postgres/Redis/Garage is a blast-radius surface —
one misconfigured consumer shouldn't be mistaken for a shared-infra outage. Confirm a
consumer's creds + backend reachability **before** enabling `restart: always/unless-stopped`,
and when something "looks like the DB," check the *consumers* first.
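"Check the consumers first" can be turned into a one-line sweep over restart counts. A minimal sketch, assuming input lines of the form `<name> <restart-count>`; the helper name `flapping` and the threshold are illustrative choices:

```bash
# flapping THRESHOLD: read "<name> <restart-count>" lines from stdin and print
# the names whose restart count is strictly greater than THRESHOLD. Lines
# whose second field is not a plain number are ignored.
flapping() {
  awk -v max="$1" '$2 ~ /^[0-9]+$/ && ($2 + 0) > (max + 0) { print $1 }'
}
```

It can be fed from `docker ps -aq | xargs -r docker inspect --format '{{.Name}} {{.RestartCount}}' | flapping 10` (`xargs -r` is a GNU extension; names come back with a leading `/`). Anything it prints is a better first suspect than the shared store.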

@@ -24,6 +24,7 @@ tailwart/
│ ├── caddy.json # :25/465/587/143/993 mail + :443 SNI fan-out → stalwart over the tailnet
│ ├── docker-compose.yml # deploy on any public-IP, tailnet, tag:reverse-proxy host
│ └── README.md
├── provision-stalwart.sh # one-shot: configure Stalwart entirely from .env (idempotent)
├── acl-snippet.hujson # tag:stalwart owner + grants to merge into your policy
├── .env.example # operator surface — copy to .env
└── .gitignore
@@ -44,12 +45,20 @@ cp .env.example .env && $EDITOR .env # fill secrets (see CLAUDE.md prereq
# 2. admin console: assign tag:stalwart to the OAuth client + paste acl-snippet
# 3. bring up the mailbox (tailnet-only)
docker compose up -d
# 4. bring up the edge (binds public mail ports; can be a different host)
# 4. configure Stalwart entirely from .env — stores, listeners, domain (+DKIM),
# admin + relay/catch-all accounts, TLS/DNS, optional Authelia SSO. Idempotent.
./provision-stalwart.sh # add --print-dns to also dump records to publish
# 5. bring up the edge (binds public mail ports; can be a different host)
cd caddy && docker compose up -d --build
```
Then point `infinidim.net`'s MX at the edge host, add SPF/DKIM/DMARC, and finish
configuration in Stalwart's web admin (`mail.infinidim.net`).
`provision-stalwart.sh` replaces the setup wizard: it writes config straight into
Postgres via the `x:` JMAP API. **Tier 1** (default) configures everything and
*prints* the DNS records for you to publish; **tier 2** (set `STALWART_DNS_API_KEY`)
lets Stalwart auto-publish DNS + run ACME DNS-01. See the comments in `.env.example`.
Then point `infinidim.net`'s MX at the edge host (or let tier-2 publish it) and
finish any opinionated bits (spam tuning, retention) in the web admin.
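The tier switch comes down to one test on two variables. A minimal sketch that mirrors the condition in `ensure_dns_tier`; the helper name `stalwart_tier` is illustrative and not part of the script:

```bash
# stalwart_tier: print 2 when a DNS API key is set and the provider is
# "spaceship" (the only provider currently wired), otherwise print 1.
# An unset STALWART_DNS_PROVIDER is treated as "spaceship".
stalwart_tier() {
  if [ -n "${STALWART_DNS_API_KEY:-}" ] &&
     [ "${STALWART_DNS_PROVIDER:-spaceship}" = "spaceship" ]; then
    echo 2
  else
    echo 1
  fi
}
```

Note the consequence: a key paired with any other provider value still lands in tier 1, so the records are printed for manual publishing.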
## Status

@@ -1,10 +1,11 @@
# Authelia file user backend. Regenerate a hash with:
# docker run --rm authelia/authelia:latest authelia crypto hash generate argon2 --password 'PASS'
# docker run --rm authelia/authelia:4.39.20 authelia crypto hash generate argon2 --password 'PASS'
# Username/password come from AUTHELIA_ADMIN_USER / AUTHELIA_ADMIN_PASSWORD in the root .env.
users:
admin:
zarniwoop:
disabled: false
displayname: "Admin"
password: "$argon2id$v=19$m=65536,t=3,p=4$ZVJNUh4uH7VMccpo3aRihQ$b///aUhTewPsXZ2AcqqJKPb8nLq6xVNgLNJQ7/b5lmo"
displayname: "Zarniwoop"
password: "$argon2id$v=19$m=65536,t=3,p=4$2Zh5wh3yN/kCvR26mTRo9Q$g86OC3E8Q4lH0czTOal7Gci2+U6t0ZIFhogIwtRoA5M"
email: admin@infinidim.net
groups:
- admins

provision-stalwart.sh (new file, 265 lines)

@@ -0,0 +1,265 @@
#!/usr/bin/env bash
# provision-stalwart.sh — bring a fresh Stalwart up *fully configured from .env*.
#
# Stalwart v0.16 keeps ALL of its config in Postgres (not files), reachable
# through the `x:` JMAP management objects. config/config.json only tells the
# image where Postgres lives; this script writes everything else the setup
# wizard would — stores, listeners, the primary domain (+DKIM), the admin and
# relay/catch-all accounts, TLS/DNS, and (optionally) SSO via Authelia — so the
# operator never has to touch the wizard for a working mail server.
#
# Idempotent: every object is keyed by a stable name/singleton and created only
# if missing, so re-running is safe. Run AFTER `docker compose up -d` (the
# stalwart sidecar must be reachable on the tailnet).
#
# ./provision-stalwart.sh # provision from ./.env
# ./provision-stalwart.sh --print-dns # also dump the zone records to publish
#
# Designed to drop into federatedSocial's bootstrap.sh as `cmd_provision_stalwart`.
#
# TIERS (the DNS/TLS automation boundary):
# * Default (trustless): domain in MANUAL dkim/dns mode; certs via the existing
# wildcard / HTTP-01; the script PRINTS the exact DNS records to publish.
# * Opt-in: if STALWART_DNS_API_KEY is set and STALWART_DNS_PROVIDER is
# 'spaceship', the domain flips to AUTOMATIC: Stalwart auto-publishes DNS and
# does DNS-01. (The key is not probed before the flip.)
#
# NOTE: the Authelia-SSO step (provision_oidc) writes the Stalwart side via API
# but only PRINTS the Authelia client block to paste — editing Authelia's
# hand-maintained YAML from a script is deliberately avoided. The admin and
# relay accounts ALWAYS keep password auth as break-glass, so enabling SSO can
# never lock you out of Stalwart. The SSO login flow could not be validated
# against a throwaway instance here — verify it on a test deploy before trusting.
set -euo pipefail
# ---------------------------------------------------------------------------
# Setup: load .env, derive endpoint + admin auth
# ---------------------------------------------------------------------------
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ENV_FILE="${REPO_ROOT}/.env"
[[ -f "$ENV_FILE" ]] || { echo "Error: ${ENV_FILE} not found (cp .env.example .env)" >&2; exit 1; }
set -a; source "$ENV_FILE"; set +a
PRINT_DNS=0; [[ "${1:-}" == "--print-dns" ]] && PRINT_DNS=1
for v in STALWART_MAGIC_NAME TS_TAILNET STALWART_DOMAIN STALWART_HOSTNAME \
STALWART_FALLBACK_ADMIN_SECRET REDIS_MAGIC_NAME STALWART_REDIS_DB \
GARAGE_MAGIC_NAME GARAGE_REGION GARAGE_ACCESS_KEY_ID GARAGE_SECRET_ACCESS_KEY \
STALWART_S3_BUCKET SMTP_USER SMTP_PASSWORD; do
[[ -n "${!v:-}" ]] || { echo "Error: required \$$v is empty in .env" >&2; exit 1; }
done
SW_HOST="${STALWART_MAGIC_NAME}.${TS_TAILNET}"
SW_BASE="http://${SW_HOST}:8080"
SW_JMAP="${SW_BASE}/jmap"
NET_TRUST='{"100.64.0.0/10":true,"fd7a:115c:a1e0::/48":true}' # tailnet CGNAT + ULA
RELAY_LOCAL="${SMTP_USER%@*}" # local part, e.g. zaphod
RELAY_DOMAIN="${SMTP_USER#*@}" # domain part, e.g. infinidim.net
ACCT_ID="b" # principals primary account
log(){ printf ' %s\n' "$*"; }
die(){ echo "Error: $*" >&2; exit 1; }
# ---------------------------------------------------------------------------
# JMAP helpers (HTTP Basic — no PKCE/Bearer needed for scripting)
# ---------------------------------------------------------------------------
SW_AUTH="" # set by sw_auth()
# sw_call <methodCalls-json-array> -> raw response on stdout
sw_call(){
curl -s -m 25 -u "$SW_AUTH" -H 'Content-Type: application/json' "$SW_JMAP" -X POST \
--data "$(jq -nc --argjson mc "$1" '{using:["urn:ietf:params:jmap:core","urn:stalwart:jmap"],methodCalls:$mc}')"
}
# sw_ok <response> — die if any error/notCreated/notUpdated/notDestroyed present
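sw_ok(){
local r="$1"
# an empty or non-JMAP reply (curl timeout, auth problem) must not pass as success
echo "$r" | jq -e '.methodResponses' >/dev/null 2>&1 || die "JMAP call returned no methodResponses: ${r:-<empty reply>}"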
if echo "$r" | jq -e '.methodResponses[0][0]=="error"' >/dev/null 2>&1; then
die "JMAP error: $(echo "$r" | jq -c '.methodResponses[0][1]')"
fi
if echo "$r" | jq -e '.methodResponses[0][1] | (.notCreated//{}|length>0) or (.notUpdated//{}|length>0) or (.notDestroyed//[]|length>0)' >/dev/null 2>&1; then
die "JMAP set rejected: $(echo "$r" | jq -c '.methodResponses[0][1]|{notCreated,notUpdated,notDestroyed}')"
fi
}
# find id of object of TYPE whose `name` == NAME (empty if none)
sw_find_id(){ # $1 type $2 name
local r; r=$(sw_call "$(jq -nc --arg t "$1" '[[($t+"/query"),{accountId:"'"$ACCT_ID"'"},"0"],[($t+"/get"),{accountId:"'"$ACCT_ID"'","#ids":{resultOf:"0",name:($t+"/query"),path:"/ids"},properties:["name"]},"1"]]')")
echo "$r" | jq -r --arg n "$2" '.methodResponses[1][1].list[]? | select(.name==$n) | .id' | head -1
}
# authenticate: prefer a real admin account; fall back to the first-boot virtual admin
sw_auth(){
local secret="$STALWART_FALLBACK_ADMIN_SECRET" u
for u in "admin@${STALWART_DOMAIN}" "admin"; do
if curl -s -m 8 -u "${u}:${secret}" -o /dev/null -w '%{http_code}' "${SW_JMAP}/session" 2>/dev/null | grep -q 200; then
SW_AUTH="${u}:${secret}"; log "authenticated as ${u}"; return 0
fi
done
die "cannot authenticate to ${SW_JMAP} as admin (check STALWART_FALLBACK_ADMIN_SECRET / first-boot state)"
}
# ===========================================================================
# Provisioning steps
# ===========================================================================
ensure_admin(){ # persistent superuser (survives once config is written; fallback then goes inert)
[[ -n "${DOMAIN_ID:-}" ]] || die "ensure_admin needs the domain first"
local id; id=$(sw_find_id x:Account admin)
if [[ -z "$id" ]]; then
log "creating persistent admin account"
sw_ok "$(sw_call "$(jq -nc --arg pw "$STALWART_FALLBACK_ADMIN_SECRET" --arg dom "$DOMAIN_ID" \
'[["x:Account/set",{accountId:"'"$ACCT_ID"'",create:{a:{"@type":"User",name:"admin",domainId:$dom,description:"System administrator",roles:{"@type":"Admin"},credentials:{"0":{"@type":"Password",secret:$pw}}}}},"0"]]')")"
else log "admin account present"; fi
}
re_auth(){ # switch to the persistent admin so the rest of the run is immune to the
# first-boot fallback going inert the moment a real admin exists.
local u="admin@${STALWART_DOMAIN}"
if curl -s -m 8 -u "${u}:${STALWART_FALLBACK_ADMIN_SECRET}" -o /dev/null -w '%{http_code}' "${SW_JMAP}/session" 2>/dev/null | grep -q 200; then
SW_AUTH="${u}:${STALWART_FALLBACK_ADMIN_SECRET}"; log "re-authenticated as ${u}"
fi
}
ensure_stores(){
log "blob store -> Garage S3"
sw_ok "$(sw_call "$(jq -nc \
--arg ep "http://${GARAGE_MAGIC_NAME}.${TS_TAILNET}:3900" --arg region "$GARAGE_REGION" \
--arg bucket "$STALWART_S3_BUCKET" --arg ak "$GARAGE_ACCESS_KEY_ID" --arg sk "$GARAGE_SECRET_ACCESS_KEY" \
'[["x:BlobStore/set",{accountId:"'"$ACCT_ID"'",update:{singleton:{"@type":"S3",region:{"@type":"Custom",customEndpoint:$ep,customRegion:$region},bucket:$bucket,accessKey:$ak,secretKey:{"@type":"Value",secret:$sk},verifyAfterWrite:true}}},"0"]]')")"
log "in-memory store -> Redis db ${STALWART_REDIS_DB}"
sw_ok "$(sw_call "$(jq -nc --arg url "redis://${REDIS_MAGIC_NAME}.${TS_TAILNET}:6379/${STALWART_REDIS_DB}" \
'[["x:InMemoryStore/set",{accountId:"'"$ACCT_ID"'",update:{singleton:{"@type":"Redis",url:$url}}},"0"]]')")"
}
ensure_listeners(){
# name port protocol implicitTls proxyTrust(0/1)
local rows=(
"smtp 25 smtp 0 1" "submission 587 smtp 0 1" "submissions 465 smtp 1 1"
"imap 143 imap 0 1" "imaps 993 imap 1 1"
"http 8080 http 0 0" "https 443 http 1 0"
"sieve 4190 manageSieve 0 0"
)
local row name port proto impl trust id
for row in "${rows[@]}"; do
read -r name port proto impl trust <<<"$row"
id=$(sw_find_id x:NetworkListener "$name"); [[ -n "$id" ]] && { log "listener ${name} present"; continue; }
log "creating listener ${name} (:${port})"
local pt='{}'; [[ "$trust" == 1 ]] && pt="$NET_TRUST"
sw_ok "$(sw_call "$(jq -nc --arg n "$name" --arg bind "[::]:${port}" --arg p "$proto" \
--argjson impl "$([[ $impl == 1 ]] && echo true || echo false)" --argjson pt "$pt" \
'[["x:NetworkListener/set",{accountId:"'"$ACCT_ID"'",create:{l:{name:$n,bind:{($bind):true},protocol:$p,useTls:true,tlsImplicit:$impl,overrideProxyTrustedNetworks:$pt}}},"0"]]')")"
done
}
ensure_domain(){ # primary mail domain; DKIM auto-gen; dns/cert mode set later by tier
DOMAIN_ID=$(sw_find_id x:Domain "$STALWART_DOMAIN")
if [[ -z "$DOMAIN_ID" ]]; then
log "creating domain ${STALWART_DOMAIN} (auto DKIM)"
local r; r=$(sw_call "$(jq -nc --arg d "$STALWART_DOMAIN" --arg ca "$SMTP_USER" \
'[["x:Domain/set",{accountId:"'"$ACCT_ID"'",create:{d:{name:$d,isEnabled:true,description:"Primary mail domain",catchAllAddress:$ca,subAddressing:{"@type":"Enabled"},dkimManagement:{"@type":"Automatic",algorithms:{Dkim1Ed25519Sha256:true,Dkim1RsaSha256:true},selectorTemplate:"v{version}-{algorithm}-{date-%Y%m%d}",rotateAfter:7776000000,retireAfter:604800000,deleteAfter:2592000000}}}},"0"]]')")
sw_ok "$r"; DOMAIN_ID=$(echo "$r" | jq -r '.methodResponses[0][1].created.d.id')
else log "domain ${STALWART_DOMAIN} present"; fi
}
ensure_accounts(){ # relay / catch-all account from SMTP_USER (admin handled separately)
local id; id=$(sw_find_id x:Account "$RELAY_LOCAL")
if [[ -z "$id" ]]; then
log "creating relay account ${SMTP_USER}"
sw_ok "$(sw_call "$(jq -nc --arg n "$RELAY_LOCAL" --arg dom "$DOMAIN_ID" --arg pw "$SMTP_PASSWORD" \
'[["x:Account/set",{accountId:"'"$ACCT_ID"'",create:{a:{"@type":"User",name:$n,domainId:$dom,description:"Relay / catch-all",credentials:{"0":{"@type":"Password",secret:$pw}}}}},"0"]]')")"
else log "relay account ${SMTP_USER} present"; fi
}
ensure_system(){
log "system settings: hostname ${STALWART_HOSTNAME}"
sw_ok "$(sw_call "$(jq -nc --arg h "$STALWART_HOSTNAME" --arg d "$DOMAIN_ID" \
'[["x:SystemSettings/set",{accountId:"'"$ACCT_ID"'",update:{singleton:{defaultHostname:$h,defaultDomainId:$d}}},"0"]]')")"
}
# ---- TLS / DNS tier -------------------------------------------------------
ensure_dns_tier(){
if [[ -n "${STALWART_DNS_API_KEY:-}" && "${STALWART_DNS_PROVIDER:-spaceship}" == "spaceship" ]]; then
log "DNS provider key present -> tier 2 (automatic publish via Spaceship)"
local sid; sid=$(sw_find_id x:DnsServer "${STALWART_DNS_DESC:-managed}")
if [[ -z "$sid" ]]; then
local r; r=$(sw_call "$(jq -nc --arg k "$STALWART_DNS_API_KEY" --arg s "${STALWART_DNS_API_SECRET:-}" --arg n "${STALWART_DNS_DESC:-managed}" \
'[["x:DnsServer/set",{accountId:"'"$ACCT_ID"'",create:{s:{"@type":"Spaceship",description:$n,apiKey:$k,secret:$s,ttl:300000,pollingInterval:15000,propagationTimeout:60000}}},"0"]]')")
sw_ok "$r"; sid=$(echo "$r" | jq -r '.methodResponses[0][1].created.s.id')
fi
log "domain -> Automatic DNS (origin ${STALWART_DOMAIN}) + ACME DNS-01"
sw_ok "$(sw_call "$(jq -nc --arg sid "$sid" --arg origin "$STALWART_DOMAIN" \
'[["x:Domain/set",{accountId:"'"$ACCT_ID"'",update:{"'"$DOMAIN_ID"'":{dnsManagement:{"@type":"Automatic",dnsServerId:$sid,origin:$origin,publishRecords:{autoConfig:true,autoConfigLegacy:true,autoDiscover:true,caa:true,dkim:true,dmarc:true,mtaSts:true,mx:true,spf:true,srv:true,tlsRpt:true}}}}},"0"]]')")"
ensure_acme_dns01
PRINT_DNS=0 # records publish themselves
else
log "no DNS API key (or provider '${STALWART_DNS_PROVIDER:-spaceship}' not wired) -> tier 1 (trustless): manual DNS, records printed below"
PRINT_DNS=1
fi
}
ensure_acme_dns01(){
[[ -z "${STALWART_ACME_CONTACT:-}" ]] && { log " (set STALWART_ACME_CONTACT to enable DNS-01 ACME)"; return; }
local id; id=$(sw_find_id x:AcmeProvider letsencrypt)
[[ -n "$id" ]] && { log " ACME provider present"; return; }
log " creating ACME provider (Let's Encrypt, DNS-01)"
sw_ok "$(sw_call "$(jq -nc --arg c "$STALWART_ACME_CONTACT" \
'[["x:AcmeProvider/set",{accountId:"'"$ACCT_ID"'",create:{p:{name:"letsencrypt",directory:"https://acme-v02.api.letsencrypt.org/directory",challengeType:"Dns01",contact:$c,renewBefore:"R23",maxRetries:10}}},"0"]]')")"
}
print_dns_records(){
[[ "$PRINT_DNS" == 1 ]] || return 0
echo; echo "=== Publish these DNS records for ${STALWART_DOMAIN} (tier 1 / manual) ==="
sw_call "$(jq -nc --arg id "$DOMAIN_ID" '[["x:Domain/get",{accountId:"'"$ACCT_ID"'",ids:[$id],properties:["dnsZoneFile"]},"0"]]')" \
| jq -r '.methodResponses[0][1].list[0].dnsZoneFile // "(zone file unavailable)"'
echo "=== then re-run with STALWART_DNS_API_KEY set for automatic publishing ==="
}
# ---- Authelia SSO (opt-in; admin/relay keep password auth as break-glass) --
provision_oidc(){
[[ "${STALWART_SSO_ENABLE:-false}" == "true" ]] || { log "SSO disabled (set STALWART_SSO_ENABLE=true to wire Authelia)"; return; }
[[ -n "${AUTHELIA_PORTAL_URL:-}" ]] || die "STALWART_SSO_ENABLE=true but AUTHELIA_PORTAL_URL is empty"
local issuer="$AUTHELIA_PORTAL_URL" id
id=$(sw_find_id x:Directory "authelia")
if [[ -z "$id" ]]; then
log "creating Stalwart OIDC directory -> ${issuer}"
sw_ok "$(sw_call "$(jq -nc --arg iss "$issuer" --arg ud "$STALWART_DOMAIN" \
'[["x:Directory/set",{accountId:"'"$ACCT_ID"'",create:{d:{"@type":"Oidc",description:"authelia",issuerUrl:$iss,claimUsername:"preferred_username",claimName:"name",claimGroups:"groups",usernameDomain:$ud}}},"0"]]')")"
else log "OIDC directory present"; fi
cat <<EOF
>>> Add this client to authelia/configuration.yml under
>>> identity_providers.oidc.clients (hash the secret with:
>>> docker run --rm authelia/authelia:4.39.20 authelia crypto hash generate pbkdf2 --password '<secret>')
- client_id: stalwart
client_name: Stalwart Mail
client_secret: '<PBKDF2-HASH-OF-${STALWART_OIDC_CLIENT_SECRET:-<generate one>}>'
public: false
authorization_policy: two_factor
redirect_uris:
- https://${STALWART_HOSTNAME}/auth/oauth
scopes: [openid, profile, email, groups]
>>> admin + ${SMTP_USER} keep password auth as break-glass — SSO can't lock you out.
EOF
}
# ===========================================================================
main(){
echo "Provisioning Stalwart at ${SW_BASE} from ${ENV_FILE}"
command -v jq >/dev/null || die "jq is required"
curl -s -m 8 -o /dev/null "${SW_BASE}/jmap/session" || die "cannot reach ${SW_BASE} (is the stack up?)"
sw_auth
ensure_stores
ensure_domain
ensure_admin
re_auth
ensure_accounts
ensure_listeners
ensure_system
ensure_dns_tier
provision_oidc
print_dns_records
echo "Done. If any listeners were newly created, restart the container so they bind: docker compose restart stalwart"
}
# Run only when executed directly — so the file can be sourced for testing.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi