Commit Graph

18 Commits

Author SHA1 Message Date
Wayne Hayes
1e5fc982eb LESSONS: shared-infra readiness (#16 boot-order) + flapping consumer (#17 atuin)
#16 Stalwart started before Garage on reboot, so the S3-backed admin SPA
404'd (not a boot loop). Gate every app on backend *liveness* (depends_on
service_healthy + probe PG/Redis/Garage over the tailnet); don't assume
shared infra boots first.

#17 atuin crash-looped 6318x (exit 1) and looked like a Postgres problem;
Postgres was healthy and atuin never even connected. PG health != consumer
health — check RestartCount and pg_stat_activity client_addr churn; confirm a
consumer's creds/reachability before restart:always.

Both generalize to federatedSocial (shared PG/Redis/Garage = blast radius).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 23:17:05 -04:00
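The RestartCount check from lesson #17 can be sketched as follows. This is a minimal illustration, assuming its input is the JSON array printed by `docker inspect <container> ...`, where each object carries top-level `Name` and `RestartCount` fields. The function name `find_crash_loopers` and the default threshold of 5 are hypothetical, not taken from the repo.

```python
import json


def find_crash_loopers(inspect_json: str, threshold: int = 5) -> list[tuple[str, int]]:
    """Return (name, restart_count) pairs for containers restarted more than
    `threshold` times, worst offender first.

    `inspect_json` is assumed to be the raw output of `docker inspect`.
    """
    containers = json.loads(inspect_json)
    flagged = []
    for container in containers:
        count = int(container.get("RestartCount", 0))
        if count > threshold:
            # docker inspect reports names with a leading slash, e.g. "/atuin".
            name = container.get("Name", "?").lstrip("/")
            flagged.append((name, count))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A healthy Postgres next to a consumer with a four-digit restart count points at the consumer (credentials, reachability), which is the distinction the lesson draws.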
Wayne Hayes
5d162884e8 Merge authelia-integration: vendor Authelia under authelia/ with single root .env; portal admin user from $AUTHELIA_ADMIN_USER
2026-06-12 23:10:57 -04:00
Wayne Hayes
783b09f463 authelia: set portal admin user to zarniwoop (match AUTHELIA_ADMIN_USER)
The vendored user db carried the template `admin`, but the operator .env sets
AUTHELIA_ADMIN_USER=zarniwoop, so portal login failed ("user not found"). Rename
the file-backend user to `zarniwoop` with an argon2id hash of the .env
AUTHELIA_ADMIN_PASSWORD (verified via `authelia crypto hash validate`). Email
kept as admin@infinidim.net (a real Stalwart mailbox) so password-reset works.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 22:26:26 -04:00
7eefac0224 authelia: sync caddy-forward-auth snippet to deployed reality
The portal vhost + forward-auth are now live on the main box Caddy. Align
the template with what was actually deployed:

- upstream host -> agrajag.tail7b1641.ts.net (the Authelia node's MagicDNS
  name), replacing the majikthise placeholder
- drop the explicit `tls` cert-file lines: this Caddy uses automatic HTTPS
  (no /etc/caddy/certs); ACME for auth.infinidim.net rides the :443->:8443
  SNI fan-out (tls-alpn-01) + :80 (http-01)
- forward-auth endpoint /api/verify?rd=... -> /api/authz/forward-auth, the
  Authelia 4.39 path; portal redirect comes from authelia_url in the yml
- note the infinidim.net CAA accounturi pin: a new L7 vhost 403s until this
  Caddy's LE account is allowlisted (now done alongside Stalwart's)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 02:24:59 +01:00
ea7eedcb7b LESSONS: SPF append-dup gotcha (#14) and ed25519 DKIM diagnosis (#15)
#14: Spaceship PUT keys records by name+type+VALUE, so changing an
existing RRSet's value APPENDS a second record (a double v=spf1 =
RFC 7208 permerror). Correct pattern: PUT new, DELETE old; DELETE body
is a bare JSON array, not {items:[...]}.

#15: ed25519 DKIM "fail" at Gmail alongside passing RSA is the known
Stalwart dual-signing issue, not a key problem -- proved the stored
seed derives the published p= exactly. Fix is RSA-only: removed the
ed25519 DNS key (done); disabling the ed25519 signature in Stalwart is
the remaining step. Also records the smarthost identity behind the SPF
fix. Corrected #13's "PUT won't disturb siblings" claim accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 23:47:01 +01:00
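The duplicate-SPF failure in lesson #14 is easy to detect before a receiver does. The sketch below takes the TXT values already fetched for a name and classifies them the way RFC 7208 does: no SPF record is `none`, exactly one is fine, more than one is `permerror`. The function names and the `"ok"` label are hypothetical; fetching the records (via the provider API or a resolver) is left out.

```python
def spf_records(txt_values: list[str]) -> list[str]:
    """Return the TXT values that are SPF records (they start with 'v=spf1')."""
    found = []
    for value in txt_values:
        text = value.strip().strip('"')
        lowered = text.lower()
        # The version tag must be exactly "v=spf1", followed by a space or
        # the end of the record, so "v=spf10" does not count.
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            found.append(text)
    return found


def spf_status(txt_values: list[str]) -> str:
    """Classify a name's TXT set: 'none', 'ok', or 'permerror'."""
    count = len(spf_records(txt_values))
    if count == 0:
        return "none"
    if count == 1:
        return "ok"
    return "permerror"
```

Running this after the PUT and again after the DELETE would confirm that the old record is gone and only the new one remains.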
Wayne Hayes
ddf00fbf90 authelia: vendor into the tree under authelia/ with a single root .env
Move the Authelia stack (compose, config, snippets, docs) out of the separate
/opt/authelia repo into authelia/, so the whole deployment shares ONE operator
.env at the repo root. The four shared infra vars (TS_OAUTH_CLIENT_SECRET,
TS_TAILNET, DB_MAGIC_NAME, REDIS_MAGIC_NAME) are defined once; authelia/.env is
a symlink to ../.env (gitignored, recreated per host). .env.example + .gitignore
folded in.

Run from the repo root:  docker compose -f authelia/docker-compose.yml up -d
(or: cd authelia && docker compose up -d  — the .env symlink makes it resolve).

The standalone /opt/authelia is left intact as a history archive; remove once
this is verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:30:18 -04:00
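Because authelia/.env is gitignored and recreated per host, the symlink step can be scripted. A minimal sketch, assuming the layout described above (a root `.env` and an `authelia/` directory); the helper name `ensure_env_symlink` is hypothetical and not part of the repo.

```python
import os
from pathlib import Path


def ensure_env_symlink(repo_root: Path) -> Path:
    """Make authelia/.env a relative symlink to ../.env and return its path."""
    link = repo_root / "authelia" / ".env"
    target = Path("..") / ".env"  # relative, so the checkout can be moved
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return link  # already correct, nothing to do
        link.unlink()  # stale link pointing somewhere else
    elif link.exists():
        # Refuse to clobber a real file that may hold secrets.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target)
    return link
```

The function is idempotent, so it can run on every deploy without touching an already-correct link.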
Wayne Hayes
cd1cdbd110 Merge mail-edge-hardening: live v0.16 docs, IPv6 egress, outbound-relay lessons, CLAUDE IPv6 fix
2026-06-11 20:45:53 -04:00
Wayne Hayes
d292fb0307 docs(CLAUDE): drop stale 'container has no IPv6' claim; align with LESSONS 8-9
The sidecar gained real IPv6 egress (commit 34422ba / LESSONS.md 9), but the
outbound pitfall still asserted 'no IPv6 / no AAAA->A fallback'. Reword to
reflect the fix while keeping the tailnet-relay guidance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:45:53 -04:00
f8aa6c39c7 README: catch up to live v0.16 state
config.toml -> config.json (v0.16 datastore-only model; toml is dead
historical reference); note everything else lives in Postgres. Add the
:443 SNI fan-out to the edge layout and the IPv6-egress note to the
mailbox. Link LESSONS.md. Rewrite Status from "scaffold/strawman" to
live (pinned v0.16.7, ACME wildcard, tailnet relay, container IPv6),
with the "no inbound v6 until edge v6 listeners" caveat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 00:01:49 +01:00
34422ba2b1 mailbox: give sidecar netns real IPv6 egress; resolve AAAA trap; DNS notes
Add enable_ipv6 + a ULA subnet to tailwart_default so the Stalwart
container (sharing the ts-stalwart netns) gets working IPv6 egress.
Because only egress is needed (inbound arrives via the edge/tailnet),
a ULA + Docker masquerade suffices -- no routable prefix, ndppd, or
host sysctl changes (Docker 29 enables ip6tables by default; host
forwarding was already on). Verified: ping6 + TCP/443 to v6 literals
from inside the netns; zero ENETUNREACH since boot.

LESSONS: mark #8/#9 resolved with the ULA-masquerade recipe, and add
#13 -- Spaceship's DNS API is RRSet-upsert (not zone-replace), so
Stalwart/ACME did not eat custom AAAA records; a vanished AAAA is a
provider-side loss, not Stalwart. Includes the safe read/verify flow
and the "don't publish mail AAAA before edge v6 listeners" caveat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 23:53:28 +01:00
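For anyone reproducing the ULA-masquerade recipe, a subnet has to be chosen first. The sketch below picks a random RFC 4193 unique local /64 using only the standard library. It is an illustration, not the prefix this repo uses (the commit does not show it), and it draws the 40-bit global ID from `secrets` rather than the hash-based procedure the RFC suggests. The function name is hypothetical.

```python
import ipaddress
import secrets


def random_ula_subnet() -> ipaddress.IPv6Network:
    """Return a random ULA /64: 0xfd, a 40-bit global ID, subnet ID 0."""
    global_id = secrets.randbits(40)
    # Bits 127..120 hold 0xfd, bits 119..80 the global ID; the 16-bit
    # subnet ID and the 64 interface bits are left as zero.
    prefix_int = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((prefix_int, 64))


if __name__ == "__main__":
    # Prints something of the form fdxx:xxxx:xxxx::/64
    print(random_ula_subnet())
```

The result would go into the network's IPv6 subnet setting alongside enable_ipv6; Docker's masquerade then handles egress, as the commit describes.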
3a9819c3ee docs: capture outbound-relay lessons (IPv6/AAAA trap, SMTP port block, sidecar ACL)
LESSONS.md gains 8-12: container has no IPv6 (AAAA fails before A, no
fallback), host IPv6 != container IPv6, VPS blocks all outbound SMTP
ports (relay over tailnet), sidecar needs a source ACL grant to
initiate, and MtaRoute changes only take effect on restart.

CLAUDE.md and .env.example warn that the smarthost address must be an
IPv4 literal or tailnet IP, never a dual-stack hostname. acl-snippet
adds the tag:stalwart -> tag:mail outbound grant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 22:43:21 +01:00
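The smarthost rule (an IPv4 literal or a tailnet IP, never a dual-stack hostname) can be enforced with a small check. A minimal sketch under two assumptions that are not in the commit: the function name `smarthost_ok` is hypothetical, and an IPv6 literal is treated as a tailnet IP only if it falls inside Tailscale's fd7a:115c:a1e0::/48 range.

```python
import ipaddress

# Assumption: Tailscale assigns node IPv6 addresses from this ULA range.
TAILSCALE_V6 = ipaddress.ip_network("fd7a:115c:a1e0::/48")


def smarthost_ok(address: str) -> bool:
    """True if `address` is safe to use as the relay target."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not an IP literal, so it is a hostname that could resolve to an
        # AAAA record first and fail without falling back to A.
        return False
    if ip.version == 4:
        return True  # covers public IPv4 literals and 100.x tailnet IPs
    return ip in TAILSCALE_V6
```

For example, `smarthost_ok("smtp.example.net")` is false while `smarthost_ok("100.101.102.103")` is true, which matches the warning added to CLAUDE.md and .env.example.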
45e06ed524 Merge pull request 'Harden mail edge: PG-race healthcheck gate, :443 SNI fan-out, docs' (#1) from mail-edge-hardening into main
Reviewed-on: #1
2026-06-11 00:47:29 -04:00
38ba2eb83d Harden mail edge: PG-race healthcheck gate, :443 SNI fan-out, docs
Fixes the root cause that was silently dropping Stalwart's cert/setting
writes, completes the public HTTPS endpoints, and captures the debugging
knowledge.

- docker-compose.yml: gate the ts-stalwart healthcheck on Postgres
  reachability (nc -z the-record-prod:5432) in addition to tailscaled
  health. Stalwart's depends_on: service_healthy can no longer release it
  into the window where the tailnet route to Postgres isn't up yet, which
  was failing table init and losing in-flight cert writes (-> rcgen
  self-signed fallback).

- caddy/caddy.json + README: add the :443 SNI fan-out. mta-sts /
  autoconfig / autodiscover pass through to stalwart:443 (Stalwart
  terminates TLS with its wildcard cert; no proxy_protocol on :443).
  All other SNIs go to the box's web Caddy on :8443 (https_port 8443).
  L7 reverse_proxy is impossible here: CAA pins issuance to Stalwart's
  ACME account, so Caddy can't obtain its own cert for these names.

- acl-snippet.hujson: grant tcp:443 on reverse-proxy -> stalwart for the
  SNI pass-through.

- config/config.json: track the v0.16 bootstrap (commit-safe; the DB
  secret is an EnvironmentVariable reference, not inline).

- LESSONS.md: symptom -> cause -> fix notes (PG race, DNS-01/Spaceship
  dead key, auto-ban vs PROXY protocol, wildcard-requires-DNS-01, SNI
  pass-through, ephemeral sidecar IP, LE rate-limit checks).

- .gitignore: exclude _backup/ and _validate/ (DB dumps + an inline-secret
  config) and editor swap files. NEVER commit those.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 05:15:34 +01:00
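The Postgres-reachability gate amounts to a TCP connect probe, the same thing `nc -z the-record-prod:5432` does. A minimal Python sketch of that probe plus a bounded retry loop; the function names, attempt count, and delays are hypothetical, and a successful connect shows only that the route and port are up, not that Postgres accepts queries.

```python
import socket
import time


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def wait_for_backend(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Probe repeatedly; return True as soon as the backend answers."""
    for _ in range(attempts):
        if tcp_reachable(host, port):
            return True
        time.sleep(delay)
    return False
```

Wired into a healthcheck, the dependent service only starts once `wait_for_backend("the-record-prod", 5432)` would succeed, which closes the window described above.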
Wayne Hayes
e9febd037c stalwart: migrate to v0.16 config model; fix stores, listeners, persistence
v0.16 dropped TOML/%{env}% for a JSON datastore-only config, with all other
settings living in Postgres. This migrates the deployment and fixes the
fallout found during the first real run.

- config/config.json: v0.16 JSON bootstrap (root = PostgreSql datastore;
  DB password via the EnvironmentVariable secret type, so it stays
  commit-safe). Replaces the now-dead config.toml.
- docker-compose.yml: bind-mount config.json -> /etc/stalwart/config.json
  (the image's --config path) and use a named volume for /var/lib/stalwart;
  the old anonymous volumes were orphaned on every recreate ("lost settings").
  Drop the dead config.toml mount.
- .gitignore: exclude local operational artifacts that hold real secrets +
  mail data (_backup/, _validate/, *.dump, export/). config/config.json is
  intentionally tracked (secret-free).
- CLAUDE.md: "Lessons learned — v0.16 first real run" — config model, the
  anonymous-volume trap, full-FQDN store endpoints, per-listener PROXY trust,
  one-instance-per-store, recovery mode + argon2 password reset, ACME, backups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 23:36:46 -04:00
1072fea410 Pinned 0.16.7
2026-06-04 01:21:58 -04:00
Wayne Hayes
24b3b2a11b corrected image url
2026-06-03 23:28:08 -04:00
Wayne Hayes
a9e2a736fc caddy: build via caddyserver.com download URL, not local xcaddy
The xcaddy/Go compile burns ~1GB RAM this VPS can't spare (per ~/docs/caddy.md
"Custom Binary"). Pull the prebuilt L4-enabled binary from the Caddy build
server instead and swap it over the stock binary in the official image. Built
and verified: caddy v2.11.3 with layer4.handlers.proxy + proxy_protocol.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 22:39:33 -04:00
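The download URL for a prebuilt binary with plugins can be assembled programmatically. A sketch, assuming the Caddy build server's `/api/download` endpoint with `os`, `arch`, and repeated `p` (plugin module path) parameters, and assuming the L4 plugin's module path is `github.com/mholt/caddy-l4`; neither the exact URL nor the plugin path is stated in the commit, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def caddy_download_url(plugins: list[str], os_name: str = "linux", arch: str = "amd64") -> str:
    """Build a Caddy build-server URL requesting the given plugin modules."""
    query = [("os", os_name), ("arch", arch)]
    # Each plugin is passed as its own "p" parameter.
    query += [("p", plugin) for plugin in plugins]
    return "https://caddyserver.com/api/download?" + urlencode(query)


if __name__ == "__main__":
    print(caddy_download_url(["github.com/mholt/caddy-l4"]))
```

Fetching that URL in the image build and replacing the stock binary avoids running the Go toolchain on the VPS at all.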
Wayne Hayes
2eb8a0c225 Scaffold tailwart: Stalwart mailbox as a Tailscale sidecar
Over-engineered play stack for infinidim.net — Stalwart wired into the shared
Postgres + Redis + Garage S3 over the tailnet, with no WAN presence. Public
mail ports are fronted by a separate caddy-l4 layer-4 proxy (caddy/) that can
run on any tailnet host tagged tag:reverse-proxy — decoupled from the mailbox.

- docker-compose.yml: ts-stalwart sidecar + stalwart, backends via MagicDNS
- config/config.toml: PG (data/fts) + Redis (lookup) + S3 (blob) strawman
- caddy/: xcaddy build with caddy-l4, JSON layer-4 mail proxy, own compose
- acl-snippet.hujson: tag:stalwart owner + backend/edge grants
- .env.example + gitignored .env (pulled from shared infra)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 22:25:38 -04:00