#16 Stalwart-before-Garage on reboot → S3-backed admin SPA 404'd (not a boot
loop). Gate every app on backend *liveness* (depends_on service_healthy +
probe PG/Redis/Garage over the tailnet), don't assume shared infra boots first.
#17 atuin crash-looped 6318x (exit 1) and looked like a Postgres problem;
Postgres was healthy and atuin never even connected. PG health != consumer
health — check RestartCount and pg_stat_activity client_addr churn; confirm a
consumer's creds/reachability before restart:always.
Both generalize to federatedSocial (shared PG/Redis/Garage = blast radius).
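A minimal sketch of the liveness gate described in #16/#17, assuming a plain
TCP connect is an acceptable first-pass probe. The function names and the
backend map are hypothetical, and an open port only proves reachability: it
does not prove the consumer's credentials work.

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_backends(backends: dict[str, tuple[str, int]],
                         timeout: float = 2.0) -> list[str]:
    """Return the names of backends that did not accept a TCP connection."""
    return [
        name
        for name, (host, port) in backends.items()
        if not tcp_alive(host, port, timeout)
    ]
```

A healthcheck wrapper could call unreachable_backends() with the PG, Redis and
Garage MagicDNS names and exit non-zero when the list is non-empty, so that
depends_on: service_healthy holds the app back until all three answer.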
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The vendored user db carried the template user `admin`, but the operator .env sets
AUTHELIA_ADMIN_USER=zarniwoop, so portal login failed ("user not found"). Rename
the file-backend user to `zarniwoop` with an argon2id hash of the .env
AUTHELIA_ADMIN_PASSWORD (verified via `authelia crypto hash validate`). Email
kept as admin@infinidim.net (a real Stalwart mailbox) so password-reset works.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The portal vhost + forward-auth are now live on the main box's Caddy. Align
the template with what was actually deployed:
- upstream host -> agrajag.tail7b1641.ts.net (the Authelia node's MagicDNS
name), replacing the majikthise placeholder
- drop the explicit `tls` cert-file lines: this Caddy uses automatic HTTPS
(no /etc/caddy/certs); ACME for auth.infinidim.net rides the :443->:8443
SNI fan-out (tls-alpn-01) + :80 (http-01)
- forward-auth endpoint /api/verify?rd=... -> /api/authz/forward-auth, the
Authelia 4.39 path; portal redirect comes from authelia_url in the yml
- note the infinidim.net CAA accounturi pin: a new L7 vhost 403s until this
Caddy's LE account is allowlisted (now done alongside Stalwart's)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#14: Spaceship PUT keys records by name+type+VALUE, so changing an
existing RRSet's value APPENDS a second record (a double v=spf1 =
RFC 7208 permerror). Correct pattern: PUT new, DELETE old; DELETE body
is a bare JSON array, not {items:[...]}.
#15: ed25519 DKIM "fail" at Gmail alongside passing RSA is the known
Stalwart dual-signing issue, not a key problem -- proved the stored
seed derives the published p= exactly. Fix is RSA-only: removed the
ed25519 DNS key (done); disabling the ed25519 signature in Stalwart is
the remaining step. Also records the smarthost identity behind the SPF
fix. Corrected #13's "PUT won't disturb siblings" claim accordingly.
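A rough sketch of the #14 guard, assuming the TXT values have already been read
back from the provider. The record field names (name/type/value) are
placeholders rather than the confirmed Spaceship schema; only the "DELETE body
is a bare JSON array" shape comes from the note above.

```python
import json


def is_spf(value: str) -> bool:
    """True if a TXT value is an SPF record (v=spf1 followed by space or end)."""
    v = value.strip().strip('"').lower()
    return v == "v=spf1" or v.startswith("v=spf1 ")


def spf_permerror(txt_values: list[str]) -> bool:
    """More than one SPF record on a name yields permerror per RFC 7208."""
    return sum(is_spf(v) for v in txt_values) > 1


def replace_plan(name: str, rtype: str, old_value: str, new_value: str):
    """Return (records_to_put, delete_body) for a PUT-new, DELETE-old change.

    Field names are hypothetical. The DELETE body is serialized as a bare
    JSON array; wrapping the PUT records is left to the caller.
    """
    records_to_put = [{"name": name, "type": rtype, "value": new_value}]
    delete_body = json.dumps([{"name": name, "type": rtype, "value": old_value}])
    return records_to_put, delete_body
```

Running spf_permerror() on the read-back TXT set after the PUT and again after
the DELETE gives a cheap check that the old record was really removed.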
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the Authelia stack (compose, config, snippets, docs) out of the separate
/opt/authelia repo into authelia/, so the whole deployment shares ONE operator
.env at the repo root. The four shared infra vars (TS_OAUTH_CLIENT_SECRET,
TS_TAILNET, DB_MAGIC_NAME, REDIS_MAGIC_NAME) are defined once; authelia/.env is
a symlink to ../.env (gitignored, recreated per host). .env.example + .gitignore
folded in.
Run from the repo root: docker compose -f authelia/docker-compose.yml up -d
(or: cd authelia && docker compose up -d — the .env symlink makes it resolve).
The standalone /opt/authelia is left intact as a history archive; remove once
this is verified.
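Since authelia/.env is gitignored and recreated per host, a small idempotent
helper along these lines could recreate it. This is a sketch only; the function
name is hypothetical and it assumes a POSIX host where symlinks are permitted.

```python
import os
from pathlib import Path


def ensure_env_symlink(repo_root: Path, subdir: str = "authelia") -> Path:
    """Make <repo_root>/<subdir>/.env a relative symlink to ../.env."""
    link = repo_root / subdir / ".env"
    target = Path("..") / ".env"
    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return link  # already correct, nothing to do
        link.unlink()  # stale symlink pointing elsewhere
    elif link.exists():
        # Refuse to clobber a real file that may hold operator secrets.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target)
    return link
```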
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The sidecar gained real IPv6 egress (commit 34422ba / LESSONS.md #9), but the
outbound pitfall still asserted 'no IPv6 / no AAAA->A fallback'. Reword to
reflect the fix while keeping the tailnet-relay guidance.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
config.toml -> config.json (v0.16 datastore-only model; the toml is now a dead
historical reference); note everything else lives in Postgres. Add the
:443 SNI fan-out to the edge layout and the IPv6-egress note to the
mailbox. Link LESSONS.md. Rewrite Status from "scaffold/strawman" to
live (pinned v0.16.7, ACME wildcard, tailnet relay, container IPv6),
with the "no inbound v6 until edge v6 listeners" caveat.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add enable_ipv6 + a ULA subnet to tailwart_default so the Stalwart
container (sharing the ts-stalwart netns) gets working IPv6 egress.
Because only egress is needed (inbound arrives via the edge/tailnet),
a ULA + Docker masquerade suffices -- no routable prefix, ndppd, or
host sysctl changes (Docker 29 enables ip6tables by default; host
forwarding was already on). Verified: ping6 + TCP/443 to v6 literals
from inside the netns; zero ENETUNREACH since boot.
LESSONS: mark #8/#9 resolved with the ULA-masquerade recipe, and add
#13 -- Spaceship's DNS API is RRSet-upsert (not zone-replace), so
Stalwart/ACME did not eat custom AAAA records; a vanished AAAA is a
provider-side loss, not Stalwart's doing. Includes the safe read/verify flow
and the "don't publish mail AAAA before edge v6 listeners" caveat.
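As a sanity check for the ULA subnet choice, a sketch like the following
confirms a candidate prefix really sits inside fc00::/7 before it goes into the
compose network definition. The helper name is hypothetical and the example
prefixes in the usage note are illustrative, not the deployed subnet.

```python
import ipaddress

# Unique Local Addresses (RFC 4193) live under fc00::/7.
ULA = ipaddress.ip_network("fc00::/7")


def is_ula_subnet(cidr: str) -> bool:
    """True if cidr is a well-formed IPv6 network inside the ULA range."""
    net = ipaddress.ip_network(cidr, strict=True)  # raises on stray host bits
    return net.version == 6 and net.subnet_of(ULA)
```

For example, is_ula_subnet("fd00:dead:beef::/64") holds, while a documentation
prefix such as 2001:db8::/64 or any IPv4 network does not.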
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
LESSONS.md gains #8-#12: container has no IPv6 (AAAA fails before A, no
fallback), host IPv6 != container IPv6, VPS blocks all outbound SMTP
ports (relay over tailnet), sidecar needs a source ACL grant to
initiate, and MtaRoute changes only take effect on restart.
CLAUDE.md and .env.example warn that the smarthost address must be an
IPv4 literal or tailnet IP, never a dual-stack hostname. acl-snippet
adds the tag:stalwart -> tag:mail outbound grant.
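The smarthost rule can be expressed as a small validator. This is a sketch
under assumptions: the function name is hypothetical, and the IPv6 branch
assumes Tailscale's fd7a:115c:a1e0::/48 range is what "tailnet IP" means for a
v6 literal.

```python
import ipaddress

# Assumption: Tailscale assigns tailnet IPv6 addresses from this ULA range.
TAILNET_V6 = ipaddress.ip_network("fd7a:115c:a1e0::/48")


def smarthost_ok(addr: str) -> bool:
    """Accept an IPv4 literal or a tailnet IPv6 literal; reject hostnames."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not an IP literal, so it is a hostname that may resolve dual-stack.
        return False
    if ip.version == 4:
        return True  # covers public IPv4 and tailnet 100.x addresses alike
    return ip in TAILNET_V6
```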
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes the root cause that was silently dropping Stalwart's cert/setting
writes, completes the public HTTPS endpoints, and captures the debugging
knowledge.
- docker-compose.yml: gate the ts-stalwart healthcheck on Postgres
reachability (nc -z the-record-prod:5432) in addition to tailscaled
health. Stalwart's depends_on: service_healthy can no longer release it
into the window where the tailnet route to Postgres isn't up yet — which
was failing table init and losing in-flight cert writes (-> rcgen).
- caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts /
autoconfig / autodiscover pass through to stalwart:443 (Stalwart
terminates TLS with its wildcard cert; no proxy_protocol on :443).
All other SNIs go to the box's web Caddy on :8443 (https_port 8443).
L7 reverse_proxy is impossible here: CAA pins issuance to Stalwart's
ACME account, so Caddy can't obtain its own cert for these names.
- acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the
SNI pass-through.
- config/config.json: track the v0.16 bootstrap (commit-safe; the DB
secret is an EnvironmentVariable reference, not inline).
- LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship
dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI
pass-through, ephemeral sidecar IP, LE rate-limit checks).
- .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret
config) and editor swap files. NEVER commit those.
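The :443 fan-out decision reduces to a match on the SNI's leading label. The
sketch below models that decision only; it is not the caddy.json itself, and
the WEB_CADDY address is an assumed placeholder for "the box's web Caddy on
:8443".

```python
# Names that must reach Stalwart untouched so it can terminate TLS itself.
STALWART_PREFIXES = ("mta-sts.", "autoconfig.", "autodiscover.")

STALWART = "stalwart:443"
WEB_CADDY = "localhost:8443"  # assumption: web Caddy listens locally on 8443


def upstream_for_sni(sni: str) -> str:
    """Pick the layer-4 upstream for a TLS ClientHello server name."""
    name = sni.lower().rstrip(".")
    if name.startswith(STALWART_PREFIXES):
        return STALWART
    return WEB_CADDY
```

Matching on the prefix rather than full hostnames keeps the rule valid for any
domain served by the same Stalwart wildcard cert.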
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
v0.16 dropped TOML/%{env}% for a JSON datastore-only config, with all other
settings living in Postgres. This migrates the deployment and fixes the
fallout found during the first real run.
- config/config.json: v0.16 JSON bootstrap (root = PostgreSql datastore;
DB password via the EnvironmentVariable secret type, so it stays
commit-safe). Replaces the now-dead config.toml.
- docker-compose.yml: bind-mount config.json -> /etc/stalwart/config.json
(the image's --config path) and use a named volume for /var/lib/stalwart;
the old anonymous volumes were orphaned on every recreate ("lost settings").
Drop the dead config.toml mount.
- .gitignore: exclude local operational artifacts that hold real secrets +
mail data (_backup/, _validate/, *.dump, export/). config/config.json is
intentionally tracked (secret-free).
- CLAUDE.md: "Lessons learned — v0.16 first real run" — config model, the
anonymous-volume trap, full-FQDN store endpoints, per-listener PROXY trust,
one-instance-per-store, recovery mode + argon2 password reset, ACME, backups.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xcaddy/Go compile burns ~1GB RAM this VPS can't spare (per ~/docs/caddy.md
"Custom Binary"). Pull the prebuilt L4-enabled binary from the Caddy build
server instead and swap it over the stock binary in the official image. Built
and verified: caddy v2.11.3 with layer4.handlers.proxy + proxy_protocol.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Over-engineered play stack for infinidim.net — Stalwart wired into the shared
Postgres + Redis + Garage S3 over the tailnet, with no WAN presence. Public
mail ports are fronted by a separate caddy-l4 layer-4 proxy (caddy/) that can
run on any tailnet host tagged tag:reverse-proxy — decoupled from the mailbox.
- docker-compose.yml: ts-stalwart sidecar + stalwart, backends via MagicDNS
- config/config.toml: PG (data/fts) + Redis (lookup) + S3 (blob) strawman
- caddy/: xcaddy build with caddy-l4, JSON layer-4 mail proxy, own compose
- acl-snippet.hujson: tag:stalwart owner + backend/edge grants
- .env.example + gitignored .env (pulled from shared infra)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>