ADR-052: Edge / CDN / DNS — Cloudflare for proxy + DNS + Pages, CNAME-flip blue/green, `api.` grey-cloud
Status: Accepted (2026-04-24)
Context
Cloudflare was already in the stack from ADR-046 D5 (blue/green routing) and ADR-048 D3 (Pages for docs, Pages Function JWKS auth on codex.). TA-22 codified the surrounding posture: registrar + DNS, TLS mode, proxied vs DNS-only per record, edge rules, cache invalidation, blue/green routing mechanism, Pages Function JWKS implementation. An adversarial pass surfaced material gotchas requiring D4 / D5 to deviate from naive defaults (notably the api. grey-cloud carve-out and the CNAME-flip simplification of ADR-046 D5’s “Cloudflare weighted routing” wording).
runspectral.com was on a third-party registrar with external DNS at the time of disposition. Manual transfer to a Cloudflare-managed account is the target (operational follow-up tracked in SPEC-330).
Decision
D1 — Cloudflare confirmed as edge substrate
DNS zone management; TLS termination on proxied hostnames; Pages hosting (docs-user, docs-codex); CNAME-flip blue/green routing for prod web services. No new substrate; this confirms ADR-046 D5.
D2 — Cloudflare Registrar as target state; sequenced transfer
- Provision the company Cloudflare account first (not personal — once a domain is in a CF account, account-to-account moves require a 60-day external-transfer cycle).
- Add `runspectral.com` zone in Cloudflare DNS.
- Update nameservers at the current registrar to Cloudflare’s. Propagate ≥48 h with all records validated end-to-end.
- Initiate registrar transfer; allow 5–7 days for ICANN handoff.
- Standard 60-day post-transfer ICANN lock; nothing to do.
Cloudflare Registrar is at-cost-renewal, no markup, supports DNSSEC, and removes a third party from the cert + DNS chain. Operational follow-up tracked in SPEC-330 (Urgent).
D3 — TLS mode: Full (not Full strict)
Chosen per Render’s own documentation. Full (strict) produces intermittent HTTP 526 errors during Render cert rotation. The tradeoff is asymmetric: Full does not validate the origin certificate, and that weaker trust is accepted as cheaper than an incident during rotation. Operational hygiene:
- No CAA record unless explicitly whitelisting Google Trust Services (Render uses `pki.goog`). A Let’s-Encrypt-only CAA silently breaks Render renewals.
- No AAAA record on Render hostnames: Render does not support IPv6 on custom domains.
- Custom-domain activation order: the record stays grey-cloud until Render reports cert issued; orange-cloud flip happens only after.
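The CAA rule above can be checked mechanically before a record set ships. The following is a minimal sketch, assuming CAA records are available as plain tag/value pairs; the `CaaRecord` type and `caaAllowsIssuer` helper are hypothetical names, not part of any existing tooling in this repo.

```typescript
// Minimal model of a CAA record; only the "issue" tag matters for this check.
interface CaaRecord {
  tag: "issue" | "issuewild" | "iodef";
  value: string; // issuer domain, optionally followed by ";"-separated parameters
}

// Returns true when the CA identified by `caDomain` may issue non-wildcard certs.
// With no "issue" records present, issuance is unrestricted.
function caaAllowsIssuer(records: CaaRecord[], caDomain: string): boolean {
  const issue = records.filter((r) => r.tag === "issue");
  if (issue.length === 0) return true;
  return issue.some((r) => {
    // The issuer domain is the part before any ";" parameters.
    const domain = (r.value.split(";")[0] ?? "").trim().toLowerCase();
    return domain === caDomain.toLowerCase();
  });
}

// Example record sets (illustrative values only).
const noCaa: CaaRecord[] = [];
const letsEncryptOnly: CaaRecord[] = [{ tag: "issue", value: "letsencrypt.org" }];
const withGoogleTrust: CaaRecord[] = [
  ...letsEncryptOnly,
  { tag: "issue", value: "pki.goog" },
];
```

Under this model, `letsEncryptOnly` is the failure case D3 warns about: `caaAllowsIssuer(letsEncryptOnly, "pki.goog")` is false, so Render renewals would be refused.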
D4 — Proxied vs DNS-only per record
| Hostname | Cloud | Reason |
|---|---|---|
| `app.runspectral.com` (Render) | Orange | SSR; cookies + DDoS shield; no SSE on dashboard |
| `ops.runspectral.com` (Render) | Orange | Same as app + JWKS-local auth gate |
| `docs.runspectral.com` (Pages) | Orange (Pages-managed) | Static; CDN value direct |
| `codex.runspectral.com` (Pages + Function) | Orange (Pages-managed) | Function lives at edge |
| `api.runspectral.com` (Render) | Grey | SSE per ADR-034 D2 (100 s idle proxy timeout); 100 MB upload ceiling on Free/Pro; OWASP CRS false-positives on JWT-bearer JSON POSTs |
Staging mirrors: app-staging., ops-staging., docs-staging., codex-staging. orange; api-staging. grey.
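The per-record posture lends itself to a lint over the declarative DNS plan. The sketch below assumes a simplified record shape; `PlannedRecord`, `checkPlan`, and the choice of `CNAME` for every record are illustrative assumptions, and the check describes steady state only (during custom-domain activation a record is temporarily grey-cloud per D3).

```typescript
type Origin = "render" | "pages";

interface PlannedRecord {
  name: string; // fully qualified hostname
  type: "CNAME" | "A" | "AAAA";
  origin: Origin;
  proxied: boolean; // true = orange-cloud, false = grey-cloud (DNS-only)
}

// Returns human-readable violations of the D3/D4 rules; an empty array means the plan conforms.
function checkPlan(records: PlannedRecord[]): string[] {
  const problems: string[] = [];
  for (const r of records) {
    const isApi = r.name.startsWith("api.") || r.name.startsWith("api-staging.");
    if (isApi && r.proxied) problems.push(`${r.name}: must be DNS-only (grey-cloud)`);
    if (!isApi && !r.proxied) problems.push(`${r.name}: expected proxied (orange-cloud)`);
    if (r.origin === "render" && r.type === "AAAA") {
      problems.push(`${r.name}: no AAAA on Render hostnames`);
    }
  }
  return problems;
}

// Production hostnames from the table above.
const prodPlan: PlannedRecord[] = [
  { name: "app.runspectral.com", type: "CNAME", origin: "render", proxied: true },
  { name: "ops.runspectral.com", type: "CNAME", origin: "render", proxied: true },
  { name: "docs.runspectral.com", type: "CNAME", origin: "pages", proxied: true },
  { name: "codex.runspectral.com", type: "CNAME", origin: "pages", proxied: true },
  { name: "api.runspectral.com", type: "CNAME", origin: "render", proxied: false },
];
```

Running `checkPlan(prodPlan)` returns an empty list; flipping `api.runspectral.com` to `proxied: true` yields one violation.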
Tradeoff for api. grey-cloud: forfeit Cloudflare L7 DDoS / WAF / Bot Management on the API; Render origin IP is exposed. Mitigated by Render baseline L3/L4 and the absence of any unauthenticated public endpoint on api. in 0.3.0 (auth lives on the Supabase Auth substrate).
Revisit triggers (re-proxy api.): first observed L7 DDoS attempt; first bot-traffic noise in request logs; first endpoint where edge caching buys a real latency win.
D5 — Blue/green cutover via CNAME flip (short TTL)
Refines ADR-046 D5 — generic “Cloudflare weighted routing” → specific CNAME flip with 60 s TTL, validated via /version pre-flip.
Mechanism:
- Deploy green; Render per-service health checks gate readiness.
- Deploy verification calls `/version` against green directly until it reports the expected post-migration schema (per ADR-048 D5).
- CI workflow updates the public CNAME (e.g., `app.runspectral.com` → `dashboard-green.onrender.com`).
- 60 s TTL bounds the cutover window.
- Blue stays warm for the legacy-drain window (ADR-048 D5 + ADR-053).
No session affinity needed: ADR-048 D5 generation-stamping holds correctness at the event-queue layer; HTTP requests landing on either origin produce identical results.
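The mechanism above can be sketched as a poll-then-flip routine. This is an illustration under stated assumptions, not the ADR-053 workflow itself: the `/version` payload field `schema`, the `CnameFlip` shape, and all function names are hypothetical. The request targets Cloudflare's v4 DNS records endpoint as I understand it; verify the body fields against the current API reference before relying on them. `fetchFn` is injected so the routine can run under CI with the platform `fetch` or under test with a stub.

```typescript
// Assumed shape of the /version payload; the field name `schema` is hypothetical.
interface VersionInfo { schema: string }

function versionMatches(body: unknown, expectedSchema: string): boolean {
  return typeof body === "object" && body !== null &&
    (body as Partial<VersionInfo>).schema === expectedSchema;
}

interface CnameFlip {
  zoneId: string; recordId: string; apiToken: string;
  name: string;     // public hostname, e.g. app.runspectral.com
  target: string;   // green origin, e.g. dashboard-green.onrender.com
  ttl: number;      // seconds; Cloudflare documents 1 as "automatic"
  proxied: boolean;
}

interface RequestLike { method: string; headers: Record<string, string>; body: string }

// Builds the HTTP request that repoints the public CNAME at the green origin.
function buildCnamePatch(f: CnameFlip): { url: string; init: RequestLike } {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${f.zoneId}/dns_records/${f.recordId}`,
    init: {
      method: "PATCH",
      headers: { Authorization: `Bearer ${f.apiToken}`, "Content-Type": "application/json" },
      body: JSON.stringify({ type: "CNAME", name: f.name, content: f.target, ttl: f.ttl, proxied: f.proxied }),
    },
  };
}

type FetchLike = (url: string, init?: RequestLike) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Polls green's /version; flips the CNAME only once the expected schema is reported.
async function cutover(
  fetchFn: FetchLike, greenBaseUrl: string, expectedSchema: string,
  flip: CnameFlip, maxAttempts = 30, delayMs = 2000,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(`${greenBaseUrl}/version`);
    if (res.ok && versionMatches(await res.json(), expectedSchema)) {
      const { url, init } = buildCnamePatch(flip);
      return (await fetchFn(url, init)).ok;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // green never reported the expected schema; blue stays live
}
```

Returning `false` without touching DNS keeps the failure mode safe: a green deploy that never converges leaves the public CNAME on blue.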
Forward triggers (upgrade to Worker-based weighted routing): sustained ~10k req/min where a 5% canary observes signal in ≤5 min; design-partner contract requires progressive-delivery posture. Cloudflare LB only if SLA-grade health-driven auto-failover is required, which Render’s per-service health checks already cover.
docs/runbooks/hosting.md rollback step 1 reflects the CNAME-flip mechanism; ADR-053 deploy workflow orchestrates a CNAME-flip step, not LB weight change.
D6 — Edge-rules posture at alpha
- Cloudflare Managed Ruleset: ON (default rules)
- OWASP Core Rule Set: OFF (or Log mode if kept — known-aggressive on JSON POSTs)
- Bot Fight Mode: OFF (Free-plan BFM ignores WAF Skip rules — incompatible with auth callbacks)
- Rate Limiting: one rule on `app.*/auth/*` and `ops.*/auth/*` callback paths, IP-based, threshold tuned at first burn-in (Free-plan: IP-counting only)
- API Shield JWT validation: OFF (JWTs validated in FastAPI middleware + Pages Function; not duplicated at edge)
Forward triggers: OWASP CRS in Log mode → Block once API patterns burn in; BFM → Super BFM (Pro+) when off Free; Rate Limiting by response code when on Business+.
D7 — Cache invalidation discipline
- Zero zone-level Cache Rules at alpha. Pages projects use built-in purge-on-deploy.
- Dashboard / operations SSR auth-sensitive paths set `Cache-Control: private, no-store`; a contract test enforces this (the auth-refresh response must never be cacheable; it is a cross-user session-poisoning vector if it is).
- Manual `purge_everything` only on observed stale-asset reports, scoped to the specific zone. Not a routine post-deploy step.
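The `Cache-Control` contract on auth-sensitive paths reduces to a small header predicate. The sketch below shows one way a contract test might assert it; `cacheDirectives` and `isNeverCacheable` are hypothetical helper names, and requiring both directives together is taken from the wording of the rule above.

```typescript
// Parses a Cache-Control header into a set of lowercase directive names.
// Directive arguments (e.g. the "60" in max-age=60) are dropped.
function cacheDirectives(header: string | null): Set<string> {
  if (!header) return new Set<string>();
  return new Set(
    header
      .split(",")
      .map((d) => (d.split("=")[0] ?? "").trim().toLowerCase())
      .filter((d) => d.length > 0),
  );
}

// True only when the response carries both `private` and `no-store`.
function isNeverCacheable(header: string | null): boolean {
  const directives = cacheDirectives(header);
  return directives.has("private") && directives.has("no-store");
}
```

A contract test would fetch the auth-refresh endpoint and fail the build when `isNeverCacheable(response.headers.get("Cache-Control"))` is false.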
D8 — Pages Function JWKS caching via KV
docs-codex Pages Function validates JWTs (Pattern A JWKS-local per ADR-046 D9):
- JWKS cache: Cloudflare KV, 10-minute TTL
- `kid` miss → bypass cache + refetch (Supabase rotates by `kid`, not by time; pure-TTL would return “unknown kid” until expiry)
- Verifier: hand-rolled or Hono `jwk` middleware. Not Cloudflare Access Plugin (that validates CF-issued tokens, not Supabase-issued)
In-memory caching across isolate invocations is unreliable on Pages Free; KV is the contracted surface. Contract test in the API + Operations Start integration suite asserts parity between the Pages Function’s JWKS validation outputs and the FastAPI middleware’s outputs on the same JWT inputs.
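The KV-backed cache with `kid`-miss bypass can be sketched as follows. This is a minimal illustration, not the docs-codex implementation: `KvLike` models only the subset of the Workers KV binding used here, the cache key name is hypothetical, and `fetchJwks` stands in for whatever call retrieves the Supabase JWKS document. Signature verification itself is out of scope.

```typescript
interface Jwk { kid?: string; kty: string; [prop: string]: unknown }
interface Jwks { keys: Jwk[] }

// Subset of the Workers KV binding surface assumed by this sketch.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const JWKS_CACHE_KEY = "jwks"; // hypothetical key name
const JWKS_TTL_SECONDS = 600;  // 10-minute TTL from D8

// Reads `kid` from a JWT's protected header without verifying anything.
function kidFromToken(token: string): string | undefined {
  const part = token.split(".")[0] ?? "";
  const b64 = part.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64.padEnd(Math.ceil(b64.length / 4) * 4, "=");
  try {
    const header = JSON.parse(atob(padded)) as { kid?: unknown };
    return typeof header.kid === "string" ? header.kid : undefined;
  } catch {
    return undefined; // malformed base64 or JSON
  }
}

function findKey(jwks: Jwks, kid: string): Jwk | undefined {
  return jwks.keys.find((k) => k.kid === kid);
}

// Returns the key for `kid`, refetching when the cache is empty or lacks that kid.
async function getSigningKey(
  kv: KvLike, fetchJwks: () => Promise<Jwks>, kid: string,
): Promise<Jwk | undefined> {
  const cached = await kv.get(JWKS_CACHE_KEY);
  if (cached !== null) {
    const hit = findKey(JSON.parse(cached) as Jwks, kid);
    if (hit) return hit;
  }
  // Cache empty or kid unknown: bypass the cache, refetch, and overwrite.
  const fresh = await fetchJwks();
  await kv.put(JWKS_CACHE_KEY, JSON.stringify(fresh), { expirationTtl: JWKS_TTL_SECONDS });
  return findKey(fresh, kid);
}
```

One design point the sketch leaves open: a stream of tokens with bogus `kid` values would trigger a refetch on every request, so a fuller version would likely bound how often the bypass path may run.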
D9 — Cookie + header discipline
- Cookie scope: `runspectral.com` eTLD+1 (confirms ADR-046 D8 + ADR-039 D6).
- Auth-refresh `Cache-Control: private, no-store`, enforced by a contract test on Start + API.
- PKCE-token cookie split handling: session cookies split across multiple `sb-...` cookies when token > 4096 bytes (Azure / Google OAuth large claims). `@supabase/supabase-js` handles the split client-side; Pages Function reassembly path validated in the ADR-045 D13 first-integration pass.
- JWT header-size watch: Supabase JWTs with custom claims can exceed Cloudflare upstream header buffer in some configurations. Surfaces as a test in the ADR-045 D13 first-integration validation.
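The reassembly path on the Pages Function side might look like the sketch below. It rests on an assumption that must be confirmed in the ADR-045 D13 pass: that chunks are named `<base>.0`, `<base>.1`, and so on, with an unsplit token stored under `<base>` alone. The base name used in the usage note is a placeholder, since the real `sb-...` cookie name is not specified here.

```typescript
// Parses a Cookie request header into a name -> value map.
function parseCookies(header: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // skip fragments without a name=value shape
    out.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return out;
}

// Joins `<base>.0`, `<base>.1`, ... in index order (assumed chunk naming).
// Falls back to the unsplit `<base>` cookie when present.
function reassemble(cookies: Map<string, string>, base: string): string | undefined {
  const whole = cookies.get(base);
  if (whole !== undefined) return whole;
  const chunks: string[] = [];
  for (let i = 0; ; i++) {
    const chunk = cookies.get(`${base}.${i}`);
    if (chunk === undefined) break; // stop at the first gap
    chunks.push(chunk);
  }
  return chunks.length > 0 ? chunks.join("") : undefined;
}
```

For example, with a placeholder base name, `reassemble(parseCookies("sb-example.0=abc; sb-example.1=def"), "sb-example")` returns `"abcdef"`. Stopping at the first gap means a missing middle chunk produces a truncated value, which the JWT verifier would then reject.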
D10 — Revisit triggers (compact)
Hard (open new spike):
- Cloudflare outage materially impacting auth or app subdomain
- Pages cert provisioning stall > 72 h on critical-path cutover
- Cloudflare acquired by entity triggering reputation concern
Soft (evaluate):
- First L7 DDoS attempt → re-proxy `api.`
- Bot-traffic noise in logs → re-proxy + Super BFM evaluation
- ≥10k req/min sustained → upgrade blue/green to Worker-based weighted routing
- Move to Pro/Business → revisit OWASP CRS, Rate Limiting by response code, BFM
- Health-driven auto-failover required → Cloudflare LB evaluation
Alternatives considered
Orange-cloud api. Rejected; SSE / 100 MB / OWASP-FP cluster too concrete.
Cloudflare LB for blue/green at alpha. Rejected; pays $5–10/month + control surface for guarantees Render health checks + ADR-048 D5 already provide.
Worker-based weighted routing at alpha. Rejected as premature; low traffic produces no canary signal.
DNS-weighted records as a free primitive. Rejected; Cloudflare native DNS does not offer weighted records on Free, so CNAME flip is the honest free-tier choice.
Full (strict) TLS. Rejected per Render guidance; 526s during cert rotation.
OWASP CRS Block mode at alpha. Rejected; default-blocks legitimate JWT-bearer JSON POSTs.
Bot Fight Mode ON at alpha. Rejected; Free-plan BFM cannot be Skip-bypassed for known-good auth callbacks.
In-memory JWKS cache in Pages Function. Rejected; isolate lifetime non-deterministic on Pages Free.
Cloudflare Access Plugin for JWKS validation. Rejected; validates CF-issued tokens, not Supabase-issued.
API Shield JWT validation at edge. Deferred; redundant with FastAPI middleware at alpha.
Consequences
- Single edge substrate for DNS, TLS, Pages, edge rules, blue/green.
- Three concrete API breakages (SSE / upload / JSON-POST) avoided by `api.` grey-cloud at zero spend.
- Blue/green cutover is one CNAME edit; no new control surface.
- Edge-rules posture at alpha is “do nothing aggressive”; opt into risk only with revisit triggers.
- `api.` grey-cloud forfeits Cloudflare L7 DDoS / WAF / Bot Management; Render baseline only; origin IP exposed; material only if specifically targeted.
- No weighted canary at alpha: CNAME flip is all-or-nothing; pre-flip `/version` validation is the safety net.
- Pages custom-domain cert provisioning has a known-stall failure mode (5+ day reports). Mitigation: cut over `docs.` and `codex.` ≥7 days ahead of any soft-launch milestone; manual support escalation if stall observed at 72 h.
References
- ADR-065: `spectral.core` admission discipline (no surface added here)
- ADR-034: SSE via sse-starlette informs `api.` grey-cloud
- ADR-039: JWKS-local validation pattern (reused at edge)
- ADR-045: D13 first-integration validation (PKCE split + JWT header size)
- ADR-046: Render PaaS; cookie scope; Pattern A
- ADR-047: operations subdomain
- ADR-048: generation stamping + Pages docs
- ADR-053: CNAME-flip step in cutover sequence
- ADR-062: CF API token Environment scoping
- TA-22 disposition: SPEC-325 comment `63258a67`
- TA-22 verification: SPEC-325 comment `769e828a`
- `docs/runbooks/edge.md`: edge runbook (commit `3fc271e`)
- `infra/cloudflare/zone-records.md`: declarative DNS plan
- Codex `system-design/edge-architecture.mdx`: close-pass new page
- SPEC-330: operational follow-up (registrar + DNS + edge configuration)