ADR-052: Cloudflare edge — DNS, TLS, proxied hostnames, and WAF posture
Context
runspectral.com is on Cloudflare, which is authoritative for DNS. Origins are Cloudflare-native: the API/workers runtime is a Worker-fronted Container (ADR-109), while the customer dashboard, staff Operations app, and docs surfaces are Cloudflare Pages projects. This ADR settles the edge posture: which hostnames are proxied, TLS, the edge security rules, JWKS validation at Pages Functions, and cookie discipline.
Decision
D1 — Cloudflare is the edge substrate
DNS zone management, TLS termination on proxied hostnames, Pages hosting (app, ops, docs, codex), and the edge security layer all run on Cloudflare. Origins are Cloudflare-native (Worker-fronted Container + Pages) — there is no external origin and no origin certificate in the chain. The per-hostname record targets live in the declarative plan at infra/cloudflare/zone-records.md.
D2 — All product hostnames are Cloudflare-proxied (orange)
api. (Worker-fronted Container) and app., ops., docs., codex. (Pages) are all proxied. Each is attached as a custom domain on its fronting Worker / Pages project, which issues and renews the edge certificate automatically. api. proxying is load-bearing: it is the product’s primary, revenue-bearing, high-volume programmatic surface (/api/decide + the MCP surface), and the edge is where its L7 DDoS protection, WAF, and, most importantly, per-key rate limiting live. Staging hostnames mirror this layout (*-staging.) once a staging environment exists.
D3 — Edge security posture
- Cloudflare Managed Ruleset: ON.
- OWASP Core Rule Set: off (or Log mode) until /api/decide + MCP traffic patterns are observed. CRS is false-positive-prone on bearer-token JSON POSTs, and the edge’s value (rate limiting, L7 DDoS, Managed Ruleset) does not depend on it. Move toward Block only after path-scoped tuning.
- Rate limiting: the primary control on api.*, keyed on JWT / API-key identity, not IP (programmatic and MCP callers share egress addresses). It sits ahead of origin authentication, so an abusive or runaway-agent caller is shed before it forces origin compute; app-layer limits (ADR-035 D5, ADR-084) remain the secondary layer. A rate-limit rule also guards the app.*/auth/* + ops.*/auth/* callback paths.
- Bot Fight Mode: OFF. Free-plan BFM blocks datacenter-origin requests indiscriminately, which blocks legitimate programmatic API callers and CI, and it cannot be scoped with a WAF Skip rule. Bot protection on the human-facing portals, if wanted, is a scoped WAF rule; never blanket BFM on the api. hostname.
- Edge JWT validation: off. JWTs are validated in FastAPI middleware and the codex Pages Function, not duplicated at the edge.
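The identity-keyed rate limiting in D3 can be illustrated with a minimal sketch. This is not the Cloudflare rule configuration itself: the CallerRequest shape, rateLimitKey, and FixedWindowLimiter are hypothetical names introduced here, and the fixed-window algorithm is an assumption chosen for brevity. It only shows why keying on the presented credential, not the IP, gives callers behind a shared egress address separate budgets.

```typescript
// Hypothetical request shape for illustration; not a Cloudflare API.
interface CallerRequest {
  ip: string;
  authorization?: string; // raw Authorization header, e.g. "Bearer <token>"
}

// Bucket on the presented credential so callers that share an egress IP
// get separate budgets; fall back to IP only when no credential is sent.
function rateLimitKey(req: CallerRequest): string {
  const credential = req.authorization?.trim();
  return credential ? `cred:${credential}` : `ip:${req.ip}`;
}

interface WindowState {
  windowStart: number;
  count: number;
}

// Fixed-window counter: at most `limit` requests per key per `windowMs`.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should be shed.
  allow(key: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let state = this.windows.get(key);
    if (state === undefined || state.windowStart !== windowStart) {
      // First request in a new window for this key: start counting from zero.
      state = { windowStart, count: 0 };
      this.windows.set(key, state);
    }
    if (state.count >= this.limit) {
      return false;
    }
    state.count += 1;
    return true;
  }
}
```

Because the limiter sits ahead of origin authentication, the credential is used here only as an opaque bucket key and is not validated. A real deployment would likely key on a hash or key identifier rather than hold the raw token.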
D4 — Pages Function JWKS gates
The ops and codex Pages Functions validate Supabase-issued JWTs (Pattern A, JWKS-local per ADR-046) and check app_metadata.organization_role == "operations" before serving staff-only content. codex caches JWKS in Cloudflare KV with a 10-minute TTL; a kid miss bypasses the cache and refetches (Supabase rotates by kid, not by time, so a pure-TTL cache would return “unknown kid” until expiry). The verifier is hand-rolled / Hono jwk or jose — not the Cloudflare Access plugin, which validates CF-issued tokens, not Supabase-issued. A contract test asserts parity between the Pages Function’s JWKS validation and the FastAPI middleware’s on the same JWT inputs.
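The cache behaviour described in D4 (10-minute TTL, with a kid miss bypassing the cache) can be sketched as follows. This is a minimal illustration under stated assumptions, not the codex implementation: JwksStore and the fetchJwks parameter are hypothetical stand-ins for the Cloudflare KV binding and the Supabase JWKS fetch, and signature verification (which D4 leaves to a hand-rolled verifier, Hono jwk, or jose) is out of scope here.

```typescript
interface Jwk {
  kid: string;
  [field: string]: unknown;
}

interface CachedJwks {
  keys: Jwk[];
  fetchedAtMs: number;
}

const JWKS_TTL_MS = 10 * 60 * 1000; // 10-minute TTL from D4

type CacheDecision = "use-cache" | "refetch-expired" | "refetch-kid-miss";

// Pure decision step: a fresh cache is still bypassed when it lacks the
// requested kid, because keys rotate by kid and not by time.
function jwksCacheDecision(
  cached: CachedJwks | undefined,
  kid: string,
  nowMs: number,
): CacheDecision {
  if (cached === undefined || nowMs - cached.fetchedAtMs >= JWKS_TTL_MS) {
    return "refetch-expired";
  }
  return cached.keys.some((k) => k.kid === kid) ? "use-cache" : "refetch-kid-miss";
}

// Hypothetical storage interface standing in for a KV binding.
interface JwksStore {
  get(): Promise<CachedJwks | undefined>;
  put(value: CachedJwks): Promise<void>;
}

// Resolve the signing key for `kid`, refetching on expiry or on a kid miss.
// Returns undefined when even a fresh JWKS does not contain the kid.
async function resolveSigningKey(
  kid: string,
  store: JwksStore,
  fetchJwks: () => Promise<Jwk[]>,
  nowMs: number,
): Promise<Jwk | undefined> {
  const cached = await store.get();
  if (cached !== undefined && jwksCacheDecision(cached, kid, nowMs) === "use-cache") {
    return cached.keys.find((k) => k.kid === kid);
  }
  const fresh: CachedJwks = { keys: await fetchJwks(), fetchedAtMs: nowMs };
  await store.put(fresh);
  return fresh.keys.find((k) => k.kid === kid);
}

// Staff gate applied after the signature has been verified.
function isOperationsStaff(claims: {
  app_metadata?: { organization_role?: unknown };
}): boolean {
  return claims.app_metadata?.organization_role === "operations";
}
```

One design consideration this sketch does not address: tokens carrying unknown kids would trigger a refetch on every request, so a real verifier may want to throttle kid-miss refetches.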
D5 — Cache + cookie discipline
- No zone-level Cache Rules at alpha; Pages projects purge on deploy. Manual purge_everything only on an observed stale-asset report, scoped to the zone.
- Auth-sensitive SSR responses and auth-refresh responses set Cache-Control: private, no-store; a contract test enforces it (a cacheable auth-refresh is a cross-user session-poisoning vector).
- Cookies scope to the runspectral.com eTLD+1. PKCE session cookies split across multiple sb-* cookies when a token exceeds 4096 bytes (large OAuth claims); @supabase/supabase-js handles the split client-side and the Pages Function reassembles.
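Two of the D5 rules lend themselves to a short sketch: the Cache-Control check a contract test could apply, and the reassembly of a split session cookie. Both functions are hypothetical helpers, not the project's code. In particular, the chunk naming used below (base name followed by .0, .1, and so on) is an assumption made for illustration and should be checked against the Supabase client library version in use.

```typescript
// Returns true when a Cache-Control header value carries both directives
// that D5 requires on auth-sensitive responses.
function isPrivateNoStore(cacheControl: string | undefined): boolean {
  if (cacheControl === undefined) {
    return false;
  }
  const directives = cacheControl
    .split(",")
    .map((d) => d.trim().toLowerCase());
  return directives.includes("private") && directives.includes("no-store");
}

// Reassemble a session value from parsed request cookies.
// ASSUMPTION: chunks are named `<baseName>.0`, `<baseName>.1`, ... and an
// unsplit value is stored under `<baseName>` itself.
function reassembleChunkedCookie(
  cookies: Map<string, string>,
  baseName: string,
): string | undefined {
  const whole = cookies.get(baseName);
  if (whole !== undefined) {
    return whole; // token fit in a single cookie
  }
  const parts: string[] = [];
  for (let i = 0; ; i++) {
    const part = cookies.get(`${baseName}.${i}`);
    if (part === undefined) {
      break; // first missing index ends the sequence
    }
    parts.push(part);
  }
  return parts.length > 0 ? parts.join("") : undefined;
}
```

The contract test mentioned in D5 could call isPrivateNoStore on the header of each auth-refresh response and fail when it returns false.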
Alternatives considered
- Leave api. DNS-only (un-proxied). Rejected: it forfeits edge rate limiting and L7 DDoS on the product’s highest-volume programmatic surface. The “small attack surface because everything is authenticated” argument is weak: auth runs at the origin, after a flood has already consumed connection slots and compute; the consequential L7 vector (a compromised key or a runaway agent loop sending valid, authenticated requests) is addressed by per-key edge rate limiting, not by authentication.
- OWASP CRS in Block mode from day one. Rejected: false-positive-prone on bearer-token JSON POSTs; Log-then-tune is the correct sequencing.
- Cloudflare Access plugin for the Pages Function JWKS. Rejected: it validates CF-issued tokens, not Supabase-issued.
- A dedicated API gateway in front of the api. hostname. Rejected for alpha: the proxied edge already provides rate limiting, L7 DDoS, and WAF.
Consequences
- A single edge substrate for DNS, TLS, Pages, and edge security.
- api.’s WAF and per-key rate limiting are a property of the Cloudflare edge, independent of the origin substrate.
- Bot Fight Mode must stay off on api. because it blocks programmatic and CI callers; human-portal bot protection, if wanted, is a scoped WAF rule.
- Pages custom-domain certificate issuance can stall; cut over app./ops./docs./codex. ahead of any launch milestone.