ADR-052: Edge / CDN / DNS — Cloudflare for proxy + DNS + Pages, CNAME-flip blue/green, `api.` grey-cloud

Status: Accepted (2026-04-24)

Context

Cloudflare was already in the stack from ADR-046 D5 (blue/green routing) and ADR-048 D3 (Pages for docs, Pages Function JWKS auth on codex.). TA-22 codified the surrounding posture: registrar + DNS, TLS mode, proxied vs DNS-only per record, edge rules, cache invalidation, blue/green routing mechanism, Pages Function JWKS implementation. An adversarial pass surfaced material gotchas requiring D4 / D5 to deviate from naive defaults (notably the api. grey-cloud carve-out and the CNAME-flip simplification of ADR-046 D5’s “Cloudflare weighted routing” wording).

runspectral.com was on a third-party registrar with external DNS at the time of disposition. Manual transfer to a Cloudflare-managed account is the target (operational follow-up tracked in SPEC-330).

Decision

D1 — Cloudflare confirmed as edge substrate

DNS zone management; TLS termination on proxied hostnames; Pages hosting (docs-user, docs-codex); CNAME-flip blue/green routing for prod web services. No new substrate; this confirms ADR-046 D5.

D2 — Cloudflare Registrar as target state; sequenced transfer

  1. Provision the company Cloudflare account first (not personal — once a domain is in a CF account, account-to-account moves require a 60-day external-transfer cycle).
  2. Add runspectral.com zone in Cloudflare DNS.
  3. Update nameservers at the current registrar to Cloudflare’s. Propagate ≥48 h with all records validated end-to-end.
  4. Initiate registrar transfer; allow 5–7 days for ICANN handoff.
  5. Standard 60-day post-transfer ICANN lock; nothing to do.

Cloudflare Registrar is at-cost-renewal, no markup, supports DNSSEC, and removes a third party from the cert + DNS chain. Operational follow-up tracked in SPEC-330 (Urgent).

D3 — TLS mode: Full (not Full strict)

Full is chosen per Render’s own documentation. Full (strict) produces intermittent HTTP 526 errors during Render certificate rotation, and the tradeoff is asymmetric: weaker origin-certificate validation costs less than an incident during rotation. Operational hygiene:

  • No CAA record unless explicitly whitelisting Google Trust Services (Render uses pki.goog). A Let’s-Encrypt-only CAA silently breaks Render renewals.
  • No AAAA record on Render hostnames — Render does not support IPv6 on custom domains.
  • Custom-domain activation order: the record stays grey-cloud until Render reports cert issued; orange-cloud flip happens only after.
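The three hygiene rules above lend themselves to an automated lint over the planned DNS records. The following is a minimal sketch only: the `Record` shape, the `hygiene_violations` name, and the example targets are hypothetical, and the CAA check is simplified to a zone-wide test rather than a per-name one.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape for illustration; not a real export format."""
    name: str
    type: str
    content: str
    proxied: bool = False


def hygiene_violations(
    records: Iterable[Record],
    render_hosts: Set[str],
    cert_issued: Set[str],
) -> List[str]:
    """Return human-readable violations of the three hygiene rules."""
    records = list(records)
    problems: List[str] = []

    # Rule 1: if any CAA record exists, one of them must allow pki.goog.
    caa = [r for r in records if r.type == "CAA"]
    if caa and not any("pki.goog" in r.content for r in caa):
        problems.append("CAA present without pki.goog")

    for r in records:
        if r.name not in render_hosts:
            continue
        # Rule 2: no AAAA record on a Render hostname.
        if r.type == "AAAA":
            problems.append(f"AAAA on Render hostname {r.name}")
        # Rule 3: stay grey-cloud until Render reports the cert as issued.
        if r.proxied and r.name not in cert_issued:
            problems.append(f"{r.name} proxied before cert issued")
    return problems
```

Run against a plan such as infra/cloudflare/zone-records.md (once parsed into records), an empty result would mean none of the three rules is violated.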

D4 — Proxied vs DNS-only per record

| Hostname | Cloud | Reason |
| --- | --- | --- |
| app.runspectral.com (Render) | Orange | SSR; cookies + DDoS shield; no SSE on dashboard |
| ops.runspectral.com (Render) | Orange | Same as app + JWKS-local auth gate |
| docs.runspectral.com (Pages) | Orange (Pages-managed) | Static; CDN value direct |
| codex.runspectral.com (Pages + Function) | Orange (Pages-managed) | Function lives at edge |
| api.runspectral.com (Render) | Grey | SSE per ADR-034 D2 (100 s idle proxy timeout); 100 MB upload ceiling on Free/Pro; OWASP CRS false-positives on JWT-bearer JSON POSTs |

Staging mirrors: app-staging., ops-staging., docs-staging., codex-staging. orange; api-staging. grey.
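The per-record posture, including the staging mirrors, can be expressed as a small lookup that a drift check could compare against the live zone. This is a sketch under assumptions: the function name and the idea of stripping a `-staging` suffix are illustrative, not taken from the actual tooling.

```python
ZONE = "runspectral.com"

# Posture per service label, mirroring the D4 table.
ORANGE_SERVICES = {"app", "ops", "docs", "codex"}
GREY_SERVICES = {"api"}


def expected_proxied(hostname: str) -> bool:
    """Return True if the hostname should be orange-cloud (proxied)."""
    label = hostname.removesuffix("." + ZONE)
    # Staging hostnames mirror the posture of their production service.
    service = label.removesuffix("-staging")
    if service in GREY_SERVICES:
        return False
    if service in ORANGE_SERVICES:
        return True
    raise ValueError(f"no proxy posture recorded for {hostname!r}")
```

Raising on unknown hostnames is a deliberate choice in this sketch: a new record would need an explicit posture decision rather than a silent default.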

Tradeoff for api. grey-cloud: forfeit Cloudflare L7 DDoS / WAF / Bot Management on the API; Render origin IP is exposed. Mitigated by Render baseline L3/L4 and the absence of any unauthenticated public endpoint on api. in 0.3.0 (auth lives on the Supabase Auth substrate).

Revisit triggers (re-proxy api.): first observed L7 DDoS attempt; first bot-traffic noise in request logs; first endpoint where edge caching buys a real latency win.

D5 — Blue/green cutover via CNAME flip (short TTL)

Refines ADR-046 D5 — generic “Cloudflare weighted routing” → specific CNAME flip with 60 s TTL, validated via /version pre-flip.

Mechanism:

  1. Deploy green; Render per-service health checks gate readiness.
  2. Deploy verification calls /version against green directly until it reports the expected post-migration schema (per ADR-048 D5).
  3. CI workflow updates the public CNAME (e.g., app.runspectral.com → dashboard-green.onrender.com).
  4. 60 s TTL bounds the cutover window.
  5. Blue stays warm for the legacy-drain window (ADR-048 D5 + ADR-053).
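Steps 2 to 4 can be sketched as a gate followed by a single DNS update. Everything here is illustrative: the `/version` response is assumed to be JSON with a `schema` field, `fetch_version` and `apply_dns_update` are hypothetical injected callables (the latter would wrap whatever DNS API call the CI workflow makes), and the payload field names reflect my understanding of Cloudflare's DNS record API and should be checked against it.

```python
import time
from typing import Any, Callable, Dict, Mapping


def wait_for_schema(
    fetch_version: Callable[[], Mapping[str, Any]],
    expected_schema: str,
    attempts: int = 30,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll green's /version until it reports the expected schema."""
    for attempt in range(attempts):
        try:
            body = fetch_version()
        except OSError:
            body = {}  # treat network errors as "not ready yet"
        if body.get("schema") == expected_schema:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


def cname_flip_payload(name: str, green_target: str, ttl_s: int = 60) -> Dict[str, Any]:
    """Build the record update; ttl_s=60 mirrors the TTL in step 4."""
    return {"type": "CNAME", "name": name, "content": green_target, "ttl": ttl_s}


def cutover(
    fetch_version: Callable[[], Mapping[str, Any]],
    expected_schema: str,
    name: str,
    green_target: str,
    apply_dns_update: Callable[[Dict[str, Any]], None],
    **poll_kwargs: Any,
) -> bool:
    """Flip the CNAME only after green passes the /version gate."""
    if not wait_for_schema(fetch_version, expected_schema, **poll_kwargs):
        return False  # leave the CNAME pointing at blue
    apply_dns_update(cname_flip_payload(name, green_target))
    return True
```

Keeping the HTTP and DNS calls behind injected callables lets the gating logic be tested without touching a live zone.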

No session affinity needed: ADR-048 D5 generation-stamping holds correctness at the event-queue layer; HTTP requests landing on either origin produce identical results.

Forward triggers (upgrade to Worker-based weighted routing): sustained ~10k req/min where a 5% canary observes signal in ≤5 min; design-partner contract requires progressive-delivery posture. Cloudflare LB is warranted only if SLA-grade, health-driven auto-failover is required beyond what Render’s per-service health checks already cover.

docs/runbooks/hosting.md rollback step 1 reflects the CNAME-flip mechanism; ADR-053 deploy workflow orchestrates a CNAME-flip step, not LB weight change.

D6 — Edge-rules posture at alpha

  • Cloudflare Managed Ruleset: ON (default rules)
  • OWASP Core Rule Set: OFF (or Log mode if kept — known-aggressive on JSON POSTs)
  • Bot Fight Mode: OFF (Free-plan BFM ignores WAF Skip rules — incompatible with auth callbacks)
  • Rate Limiting: one rule on app.*/auth/* and ops.*/auth/* callback paths, IP-based, threshold tuned at first burn-in (Free-plan: IP-counting only)
  • API Shield JWT validation: OFF — JWTs validated in FastAPI middleware + Pages Function; not duplicated at edge

Forward triggers: OWASP CRS in Log mode → Block once API patterns burn in; BFM → Super BFM (Pro+) when off Free; Rate Limiting by response code when on Business+.

D7 — Cache invalidation discipline

  • Zero zone-level Cache Rules at alpha. Pages projects use built-in purge-on-deploy.
  • Dashboard / operations SSR auth-sensitive paths set Cache-Control: private, no-store — contract test enforces (auth-refresh response must never be cacheable; cross-user session-poisoning vector if it is).
  • Manual purge_everything only on observed stale-asset reports, scoped to the specific zone. Not a routine post-deploy step.
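The contract test on auth-sensitive paths reduces to a header predicate. A minimal sketch, assuming the test has access to the response headers as a mapping; the function name is hypothetical.

```python
from typing import Mapping


def is_private_no_store(headers: Mapping[str, str]) -> bool:
    """True only if Cache-Control carries both `private` and `no-store`."""
    value = ""
    for key, val in headers.items():
        # Header names are case-insensitive.
        if key.lower() == "cache-control":
            value = val
    directives = {d.strip().lower() for d in value.split(",") if d.strip()}
    return {"private", "no-store"} <= directives
```

A contract test could then assert `is_private_no_store(response.headers)` for the auth-refresh response and fail the build if it ever becomes cacheable.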

D8 — Pages Function JWKS caching via KV

docs-codex Pages Function validates JWTs (Pattern A JWKS-local per ADR-046 D9):

  • JWKS cache: Cloudflare KV, 10-minute TTL
  • kid miss → bypass cache + refetch (Supabase rotates by kid, not by time; pure-TTL would return “unknown kid” until expiry)
  • Verifier: hand-rolled or Hono jwk middleware. Not Cloudflare Access Plugin (that validates CF-issued tokens, not Supabase-issued)

In-memory caching across isolate invocations is unreliable on Pages Free; KV is the contracted surface. Contract test in the API + Operations Start integration suite asserts parity between the Pages Function’s JWKS validation outputs and the FastAPI middleware’s outputs on the same JWT inputs.
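The caching policy (10-minute TTL plus refetch on kid miss) can be modeled independently of the runtime. The sketch below is in Python for illustration only; the Pages Function itself would be written for the Workers runtime, and here a plain dict stands in for KV. The class name, `fetch_jwks` callable, and injected clock are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional


class JwksCache:
    """TTL cache over a JWKS document with refetch on unknown kid."""

    def __init__(
        self,
        fetch_jwks: Callable[[], Dict[str, Any]],
        ttl_s: float = 600.0,  # 10-minute TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_jwks
        self._ttl = ttl_s
        self._clock = clock
        self._keys: Dict[str, Dict[str, Any]] = {}
        self._expires_at: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        self._expires_at = self._clock() + self._ttl

    def key_for(self, kid: str) -> Optional[Dict[str, Any]]:
        """Return the JWK for `kid`, or None if the issuer does not list it."""
        refreshed = False
        if self._expires_at is None or self._clock() >= self._expires_at:
            self._refresh()
            refreshed = True
        # kid miss: bypass the cache once, since keys rotate by kid, not by time.
        if kid not in self._keys and not refreshed:
            self._refresh()
        return self._keys.get(kid)
```

One caveat of this simple form is that every token with an unknown kid triggers a refetch; a production version would likely bound how often that can happen.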

  • Cookie scope: runspectral.com eTLD+1 (confirms ADR-046 D8 + ADR-039 D6).
  • Auth-refresh Cache-Control: private, no-store — contract test on Start + API.
  • PKCE-token cookie split handling: session cookies split across multiple sb-... cookies when token > 4096 bytes (Azure / Google OAuth large claims). @supabase/supabase-js handles the split client-side; Pages Function reassembly path validated in the ADR-045 D13 first-integration pass.
  • JWT header-size watch: Supabase JWTs with custom claims can exceed Cloudflare upstream header buffer in some configurations. Surfaces as a test in the ADR-045 D13 first-integration validation.
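The reassembly path for split session cookies can be sketched as follows. This assumes chunks are named by appending `.0`, `.1`, and so on to the base cookie name, which is my understanding of how Supabase's client libraries chunk cookies and should be confirmed in the first-integration pass; the cookie name in the usage note is hypothetical.

```python
from typing import Mapping, Optional


def combine_chunks(base_name: str, cookies: Mapping[str, str]) -> Optional[str]:
    """Rebuild a cookie value that may have been split into numbered chunks."""
    # Unsplit case: the value fits in a single cookie.
    if base_name in cookies:
        return cookies[base_name]
    parts = []
    index = 0
    # Assumed naming: "<base>.0", "<base>.1", ... with no gaps.
    while f"{base_name}.{index}" in cookies:
        parts.append(cookies[f"{base_name}.{index}"])
        index += 1
    return "".join(parts) if parts else None
```

For example, given cookies `{"sb-example-auth-token.0": "abc", "sb-example-auth-token.1": "def"}`, calling `combine_chunks("sb-example-auth-token", cookies)` joins the chunks in index order.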

D10 — Revisit triggers (compact)

Hard (open new spike):

  • Cloudflare outage materially impacting auth or app subdomain
  • Pages cert provisioning stall > 72 h on critical-path cutover
  • Cloudflare acquired by entity triggering reputation concern

Soft (evaluate):

  • First L7 DDoS attempt → re-proxy api.
  • Bot-traffic noise in logs → re-proxy + Super BFM evaluation
  • ≥10k req/min sustained → upgrade blue/green to Worker-based weighted routing
  • Move to Pro/Business → revisit OWASP CRS, Rate Limiting by response code, BFM
  • Health-driven auto-failover required → Cloudflare LB evaluation

Alternatives considered

Orange-cloud api. Rejected; SSE / 100 MB / OWASP-FP cluster too concrete.

Cloudflare LB for blue/green at alpha. Rejected; pays $5–10/month + control surface for guarantees Render health checks + ADR-048 D5 already provide.

Worker-based weighted routing at alpha. Rejected as premature; low traffic produces no canary signal.

DNS-weighted records as a free primitive. Rejected; Cloudflare native DNS does not offer weighted records on Free, so CNAME flip is the honest free-tier choice.

Full (strict) TLS. Rejected per Render guidance; 526s during cert rotation.

OWASP CRS Block mode at alpha. Rejected; default-blocks legitimate JWT-bearer JSON POSTs.

Bot Fight Mode ON at alpha. Rejected; Free-plan BFM cannot be Skip-bypassed for known-good auth callbacks.

In-memory JWKS cache in Pages Function. Rejected; isolate lifetime non-deterministic on Pages Free.

Cloudflare Access Plugin for JWKS validation. Rejected; validates CF-issued tokens, not Supabase-issued.

API Shield JWT validation at edge. Deferred; redundant with FastAPI middleware at alpha.

Consequences

  • Single edge substrate for DNS, TLS, Pages, edge rules, blue/green.
  • Three concrete API breakages (SSE / upload / JSON-POST) avoided by api. grey-cloud at zero spend.
  • Blue/green cutover is one CNAME edit; no new control surface.
  • Edge-rules posture at alpha is “do nothing aggressive”; opt into risk only with revisit triggers.
  • api. grey-cloud forfeits Cloudflare L7 DDoS / WAF / Bot Management — Render baseline only; origin IP exposed; material only if specifically targeted.
  • No weighted canary at alpha — CNAME flip is all-or-nothing; pre-flip /version validation is the safety net.
  • Pages custom-domain cert provisioning has a known-stall failure mode (5+ day reports). Mitigation: cut over docs. and codex. ≥7 days ahead of any soft-launch milestone; manual support escalation if stall observed at 72 h.

References

  • ADR-065 — spectral.core admission discipline (no surface added here)
  • ADR-034 — SSE via sse-starlette informs api. grey-cloud
  • ADR-039 — JWKS-local validation pattern (reused at edge)
  • ADR-045 — D13 first-integration validation (PKCE split + JWT header size)
  • ADR-046 — Render PaaS; cookie scope; Pattern A
  • ADR-047 — operations subdomain
  • ADR-048 — generation stamping + Pages docs
  • ADR-053 — CNAME-flip step in cutover sequence
  • ADR-062 — CF API token Environment scoping
  • TA-22 disposition — SPEC-325 comment 63258a67
  • TA-22 verification — SPEC-325 comment 769e828a
  • docs/runbooks/edge.md — edge runbook (commit 3fc271e)
  • infra/cloudflare/zone-records.md — declarative DNS plan
  • Codex system-design/edge-architecture.mdx — close-pass new page
  • SPEC-330 — operational follow-up (registrar + DNS + edge configuration)