Edge runbook
Operational runbook for Spectral’s edge / CDN / DNS posture. Covers DNS zone + registrar, TLS mode and activation order, per-hostname proxy state, edge rules, cache invalidation discipline, blue/green cutover mechanism, Pages Function JWKS architecture, and revisit triggers. See ADR-052 for the full rationale.
Edge substrate
Cloudflare provides:
- DNS zone management for `runspectral.com`
- TLS termination on proxied hostnames
- Pages hosting for the two docs sites (`docs-user`, `docs-codex`)
- Pages Function JWKS-local auth gate on `codex-staging.` / `codex.`
- Blue/green cutover via CNAME flip on the public hostname
Cloudflare DNS is authoritative. Registrar is Cloudflare. runspectral.com
sits in the company-shape Cloudflare account.
DNS zone + registrar
runspectral.com is registered through Cloudflare Registrar with DNSSEC
enabled. Cloudflare nameservers are mandatory on Free/Pro plans; that
constraint is acceptable at alpha.
Account placement matters: a domain in a Cloudflare account cannot be moved to a different Cloudflare account without a 60-day external-transfer cycle (transfer out, observe ICANN lock, transfer back in). The company-shape account is therefore the long-term home; nothing about the registrar lives on a personal Cloudflare account.
The declarative DNS record plan is at `infra/cloudflare/zone-records.md`.
TLS posture
Cloudflare TLS mode is Full, not Full (strict). Render’s own documentation recommends Full because Full (strict) returns intermittent HTTP 526 during Render’s certificate rotation window. The trust concession is bounded; the incident-during-rotation cost is not.
Cloudflare Universal SSL covers the leaf certificates on proxied hostnames. Render manages origin certificates on its custom-domain service.
Activation order on a new custom domain
Order matters because Cloudflare’s proxy interferes with Render’s ACME challenge:
- Add the custom domain on the Render service.
- Add the corresponding DNS record in Cloudflare with proxy state DNS-only (grey-cloud).
- Wait for Render to report the certificate as issued.
- Flip the proxy state to Proxied (orange-cloud) only if the per-hostname posture in Per-hostname proxy state says so.
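The ordering above can be read as a small decision function. The following is a minimal sketch, assuming the four facts about a hostname are already known to the caller; the `HostnameState` fields and the action strings are hypothetical names for illustration, not part of any Render or Cloudflare API.

```python
from dataclasses import dataclass


@dataclass
class HostnameState:
    # All fields are hypothetical; a real check would read them from
    # the Render dashboard/API and the Cloudflare zone.
    domain_added_on_render: bool
    dns_record_exists: bool
    proxied: bool
    render_cert_issued: bool
    posture_wants_proxy: bool  # from the per-hostname proxy state table


def next_action(s: HostnameState) -> str:
    """Return the next activation step for a hostname, or 'done'."""
    if not s.domain_added_on_render:
        return "add custom domain on Render"
    if not s.dns_record_exists:
        return "create DNS-only record in Cloudflare"
    if not s.render_cert_issued:
        if s.proxied:
            # The proxy interferes with Render's ACME challenge.
            return "revert record to DNS-only"
        return "wait for Render certificate"
    if s.posture_wants_proxy and not s.proxied:
        return "flip record to Proxied"
    return "done"
```

The point of the sketch is that the proxy flip is only reachable once the certificate is issued, and only for hostnames whose posture calls for proxying.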
CAA / AAAA hygiene
- No CAA record on the zone unless explicitly whitelisting Google Trust Services (Render issues via `pki.goog`). A Let’s-Encrypt-only CAA silently breaks Render renewals.
- No AAAA record on Render-hosted hostnames. Render does not support IPv6 on custom domains; an AAAA record produces intermittent failures.
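Both hygiene rules are mechanical enough to lint. The sketch below assumes zone records are available as plain dictionaries with `type`, `name`, and `content` keys; that record shape and the `lint_zone` helper are illustrative assumptions, not the format of `infra/cloudflare/zone-records.md`.

```python
def lint_zone(records, render_hosts):
    """Return a list of CAA / AAAA hygiene problems found in the zone.

    records: iterable of dicts with "type", "name", "content" (assumed shape).
    render_hosts: set of hostnames served by Render.
    """
    problems = []

    # Rule 1: if any CAA record exists, pki.goog must be allowed.
    caa_values = [r["content"] for r in records if r["type"] == "CAA"]
    if caa_values and not any("pki.goog" in v for v in caa_values):
        problems.append("CAA present but pki.goog is not allowed")

    # Rule 2: no AAAA record on a Render-hosted hostname.
    for r in records:
        if r["type"] == "AAAA" and r["name"] in render_hosts:
            problems.append(f"AAAA record on Render-hosted {r['name']}")

    return problems
```

A check like this could run in CI against the declarative record plan before any change reaches the live zone.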
Per-hostname proxy state
| Hostname | Substrate | Proxy state | Reason |
|---|---|---|---|
| app.runspectral.com | Render web service (dashboard) | Proxied | SSR; cookies + DDoS shield |
| app-staging.runspectral.com | Render web service (dashboard) | Proxied | Mirror of prod |
| ops.runspectral.com | Render web service (operations) | Proxied | SSR + JWKS-local auth gate |
| ops-staging.runspectral.com | Render web service (operations) | Proxied | Mirror of prod |
| api.runspectral.com | Render web service (FastAPI) | DNS-only | SSE per ADR-032 D8; 100 MB upload ceiling on Free/Pro; OWASP CRS false-positives on JWT-bearer JSON POSTs |
| api-staging.runspectral.com | Render web service (FastAPI) | DNS-only | Mirror of prod |
| docs.runspectral.com | Cloudflare Pages (docs-user) | Proxied (Pages-managed) | Static; CDN value direct |
| docs-staging.runspectral.com | Cloudflare Pages (docs-user) | Proxied (Pages-managed) | Mirror of prod |
| codex.runspectral.com | Cloudflare Pages + Function (docs-codex) | Proxied (Pages-managed) | Pages Function executes at edge |
| codex-staging.runspectral.com | Cloudflare Pages + Function (docs-codex) | Proxied (Pages-managed) | Mirror of prod |
api* is DNS-only for three concrete reasons:
- The 100-second proxy idle timeout closes long-running SSE connections, and ADR-032 D8 makes Realtime-via-`sse-starlette` part of the API contract.
- The 100 MB request body cap on Free/Pro plans limits future upload surfaces.
- The OWASP Core Rule Set’s default JSON-body inspection rules (942200, 942260) flag legitimate JWT-bearer JSON POSTs as SQL-injection attempts.
The trade is loss of Cloudflare L7 DDoS, WAF, and Bot Management on the
API surface, plus exposure of the Render origin IP. Render’s baseline L3/L4
filtering remains. There is no unauthenticated public endpoint on `api.`
in 0.3.0; auth lives on the Supabase Auth substrate, not on `api.`.
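Because the posture is per hostname, drift is easy to detect by comparing the live zone against the table. The following is a minimal sketch under the assumption that a caller can already obtain a hostname-to-proxied mapping from the zone; `EXPECTED_PROXIED` and `posture_drift` are hypothetical names. Pages-managed hostnames are left out because their proxy state is controlled by Pages rather than by a zone record flag.

```python
# Expected proxy state for the Render-hosted hostnames in the table above.
EXPECTED_PROXIED = {
    "app.runspectral.com": True,
    "app-staging.runspectral.com": True,
    "ops.runspectral.com": True,
    "ops-staging.runspectral.com": True,
    "api.runspectral.com": False,
    "api-staging.runspectral.com": False,
}


def posture_drift(live):
    """Return hostnames whose live proxy state differs from the table.

    live: mapping of hostname -> bool (True means Proxied). A hostname
    missing from `live` is reported as drift as well.
    """
    return sorted(
        host for host, want in EXPECTED_PROXIED.items() if live.get(host) != want
    )
```

An accidental orange-cloud on `api.` would show up here before it starts closing SSE connections.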
Edge rules
| Layer | Posture |
|---|---|
| Cloudflare Managed Ruleset | ON (default rules) |
| OWASP Core Rule Set | OFF (or Log mode) |
| Bot Fight Mode | OFF |
| Rate Limiting | One rule on app.*/auth/* and ops.*/auth/* callback paths, IP-based, threshold tuned at first burn-in |
| API Shield JWT validation | OFF |
Bot Fight Mode is OFF because the Free-plan implementation ignores WAF Skip rules — a known-good auth callback cannot be exempted, so BFM cannot coexist with PKCE callbacks on Free. Super Bot Fight Mode (Pro+) respects Skip rules; that is the upgrade path.
OWASP CRS at alpha is too aggressive for JSON-body API traffic, but
since api. is DNS-only (D4) the CRS posture is moot for the API. The
posture above applies to proxied hostnames only.
JWT validation lives in FastAPI middleware on the API and in the
docs-codex Pages Function on Codex; duplicating it at the edge buys
nothing at alpha.
Cache invalidation discipline
- Zero zone-level Cache Rules at alpha. Pages projects use built-in purge-on-deploy.
- Auth-sensitive SSR responses set `Cache-Control: private, no-store`. An auth-refresh response cached on the edge is a cross-user session-poisoning vector. Contract test on Start + API enforces this.
- Manual `purge_everything` only on observed stale-asset reports, scoped to the specific zone. Not a routine post-deploy step.
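The header requirement is the kind of thing a contract test can assert directly. The sketch below shows one possible shape for that assertion, assuming response headers are available as a plain mapping; `is_uncacheable` is a hypothetical helper, not the actual contract test.

```python
def is_uncacheable(headers):
    """True if Cache-Control carries both 'private' and 'no-store'.

    headers: mapping of header name -> value. Header names are matched
    case-insensitively; directives are compared in lowercase.
    """
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "cache-control":
            value = header_value
    directives = {d.strip().lower() for d in value.split(",") if d.strip()}
    return {"private", "no-store"} <= directives
```

A test suite could call this on every auth-sensitive route's response and fail the build if it returns `False`.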
Blue/green cutover (CNAME flip)
Public hostnames for the three Render web services (app., ops.,
api.) point at the active color via a CNAME with 60-second TTL.
Cutover after a green deploy completes:
- Render’s per-service health checks gate green’s readiness.
- Deploy verification calls `/version` against green directly until the reported schema generation matches the post-migration target (per ADR-046 D8).
- CI workflow updates the public CNAME (e.g., `app.runspectral.com` → `dashboard-green.onrender.com`).
- The 60s TTL bounds the cutover window for cached resolvers.
- Blue stays warm for the legacy-drain window (per ADR-046 D8 + ADR-053).
Generation-stamping (per ADR-046 D8) holds correctness at the event queue layer; HTTP requests that land on either origin during the cutover window produce identical results, so no session affinity is required.
For staging (single-color), the CNAME points at the single-color origin and is rewritten on color rotation in CI.
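The verification-then-flip sequence for the Render services can be sketched as a gated loop. This is an illustration only: `get_green_generation` (a wrapper around the `/version` call against green) and `flip_cname` (the DNS update) are hypothetical callables supplied by the caller, and the attempt count and delay are placeholder values rather than tuned settings.

```python
import time


def cutover(get_green_generation, flip_cname, target_generation,
            attempts=30, delay_s=2.0, sleep=time.sleep):
    """Flip the public CNAME only once green reports the target generation.

    Returns True if the flip happened, False if green never matched
    within the allowed attempts (the CNAME is left untouched).
    """
    for attempt in range(attempts):
        if get_green_generation() == target_generation:
            flip_cname()
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before polling green again
    return False
```

Passing `sleep` as a parameter keeps the gate testable without real delays. After a successful flip, resolvers that cached the old answer converge within the 60-second TTL.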
Pages Function JWKS architecture (codex)
docs-codex runs a Pages Function that validates getClaims() against
the Supabase project’s JWKS endpoint (Pattern A JWKS-local, per
ADR-046 D9). The Function:
- Caches JWKS in a Cloudflare KV namespace bound to the Pages project.
- Uses a 10-minute TTL on the cache entry.
- On `kid` miss, bypasses cache and refetches: Supabase rotates by `kid`, not by time, so a pure TTL would return “unknown kid” until expiration.
- Verifies via Hono’s `jwk` middleware (or an equivalent hand-rolled verifier). Cloudflare Access Plugin is not used; Access validates CF-issued tokens, not Supabase-issued.
In-memory caching across isolate invocations is unreliable on Pages Free; KV is the contracted surface.
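The caching rules above (10-minute TTL plus refetch on `kid` miss) can be sketched independently of the runtime. The real Function runs on Cloudflare Pages against KV; the Python below only illustrates the lookup logic, with a dictionary standing in for the KV namespace and `fetch_jwks` as a hypothetical callable that returns the parsed JWKS document.

```python
import time

JWKS_TTL_S = 600  # 10-minute TTL on the cache entry


class JwksCache:
    def __init__(self, fetch_jwks, store=None, now=time.time):
        self._fetch = fetch_jwks  # hypothetical: returns {"keys": [...]}
        self._store = store if store is not None else {}  # stand-in for KV
        self._now = now

    def _refresh(self):
        jwks = self._fetch()
        self._store["jwks"] = {"value": jwks, "expires": self._now() + JWKS_TTL_S}
        return jwks

    def key_for(self, kid):
        """Return the JWK with this kid, or None if the issuer lacks it."""
        entry = self._store.get("jwks")
        if entry is None or entry["expires"] <= self._now():
            jwks = self._refresh()  # cold or expired cache
        else:
            jwks = entry["value"]
            if not any(k.get("kid") == kid for k in jwks["keys"]):
                jwks = self._refresh()  # kid miss: bypass the cache
        for key in jwks["keys"]:
            if key.get("kid") == kid:
                return key
        return None
```

One design consequence worth noting: as written, every token carrying an unknown `kid` triggers a refetch, so a production version would likely want some bound on refetch frequency. That bound is not specified in this runbook and is left out of the sketch.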
The Pages Function gates SCOPE_*_OPERATIONS access to codex.. A
contract test in the API + Start integration suite asserts parity
between the Pages Function’s JWKS validation outputs and the FastAPI
middleware’s outputs on the same inputs.
Cookie + header discipline
- Session cookie scope is `runspectral.com` eTLD+1 (per ADR-039 D6 and ADR-046 D8). The cookie is visible to every subdomain (`app.`, `ops.`, `codex.`, `docs.`).
- Auth-refresh responses set `Cache-Control: private, no-store`.
- Session cookies split into multiple `sb-...` cookies when the access token exceeds 4096 bytes (Azure / Google OAuth large claims). `@supabase/supabase-js` handles the split client-side. The Pages Function reassembles before validation.
- Supabase JWTs with custom claims can exceed Cloudflare’s upstream header buffer in some configurations. The first integration pass (per ADR-045 D13) exercises end-to-end JWT delivery through the Cloudflare proxy to surface this early.
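Reassembly of a split session cookie can be sketched as follows. The chunk naming used here (`<base>.0`, `<base>.1`, and so on, concatenated in index order) is an assumption made for illustration and should be checked against the Supabase client version in use; `reassemble_cookie` and the cookie name in the usage example are hypothetical.

```python
def reassemble_cookie(cookies, base_name):
    """Join chunked cookies back into one value.

    cookies: mapping of cookie name -> value from the request.
    Returns the unsplit value if present, else the concatenation of
    base_name.0, base_name.1, ... in order, else None.
    """
    if base_name in cookies:
        return cookies[base_name]  # token was small enough not to split
    parts = []
    index = 0
    while f"{base_name}.{index}" in cookies:
        parts.append(cookies[f"{base_name}.{index}"])
        index += 1
    return "".join(parts) if parts else None
```

For example, with a hypothetical base name:

```python
cookies = {"sb-example-auth-token.0": "abc", "sb-example-auth-token.1": "def"}
value = reassemble_cookie(cookies, "sb-example-auth-token")  # "abcdef"
```

Stopping at the first missing index means a gap in the chunk sequence yields a truncated value, which then fails JWT validation rather than passing silently.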
Revisit triggers
Hard (open new spike)
- Cloudflare outage materially impacting auth or the app subdomain
- Pages cert provisioning stall > 72h on a critical-path cutover
- Cloudflare acquired by an entity that triggers a reputation concern
Soft (evaluate without committing to move)
- First L7 DDoS attempt observed in logs → re-proxy `api.`
- Bot-traffic noise in request logs → re-proxy + Super Bot Fight Mode evaluation (requires Pro+ plan)
- Sustained traffic ≥ 10k req/min → upgrade blue/green to Worker-based weighted routing (gradual ramp becomes statistically meaningful)
- Move to Pro / Business plan → revisit OWASP CRS, Rate Limiting by response code, BFM
- SLA-grade health-driven auto-failover required between blue/green → Cloudflare Load Balancing evaluation
Related
- ADR-052: edge / CDN / DNS doctrine.
- ADR-046: alpha hosting choice.
- ADR-048: deployment topology + generation-stamping.
- ADR-039: Supabase Auth + JWKS-local pattern.
- ADR-032: storage topology (SSE via `sse-starlette`).
- `infra/cloudflare/zone-records.md`: declarative DNS record plan.
- `docs/runbooks/hosting.md`: per-deployable hosting map and rollback procedure.
- `docs/runbooks/secrets-management.md`: Render Environment Group rotation and audit.