Edge runbook

Operational runbook for Spectral’s edge / CDN / DNS posture. Covers DNS zone + registrar, TLS mode and activation order, per-hostname proxy state, edge rules, cache invalidation discipline, blue/green cutover mechanism, Pages Function JWKS architecture, and revisit triggers. See ADR-052 for the full rationale.

Edge substrate

Cloudflare provides:

  • DNS zone management for runspectral.com
  • TLS termination on proxied hostnames
  • Pages hosting for the two docs sites (docs-user, docs-codex)
  • Pages Function JWKS-local auth gate on codex-staging. / codex.
  • Blue/green cutover via CNAME flip on the public hostname

Cloudflare DNS is authoritative. Registrar is Cloudflare. runspectral.com sits in the company-shape Cloudflare account.

DNS zone + registrar

runspectral.com is registered through Cloudflare Registrar with DNSSEC enabled. Cloudflare nameservers are mandatory on Free/Pro plans; that constraint is acceptable at alpha.

Account placement matters: a domain in a Cloudflare account cannot be moved to a different Cloudflare account without a 60-day external-transfer cycle (transfer out, observe ICANN lock, transfer back in). The company-shape account is therefore the long-term home; nothing about the registrar lives on a personal Cloudflare account.

The declarative DNS record plan is at infra/cloudflare/zone-records.md.

TLS posture

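Cloudflare TLS mode is Full, not Full (strict). Full encrypts the Cloudflare-to-origin connection but does not validate the origin certificate; Full (strict) also validates it. Render’s own documentation recommends Full because Full (strict) returns intermittent HTTP 526 during Render’s certificate rotation window. The trust concession (an unvalidated origin certificate) is bounded; the incident-during-rotation cost is not.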

Cloudflare Universal SSL covers the leaf certificates on proxied hostnames. Render manages origin certificates on its custom-domain service.
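Drift back to Full (strict) is easy to miss. A scheduled check could read the zone's mode from Cloudflare's `GET /zones/{zone_id}/settings/ssl`, whose `result.value` holds the mode (for example `full`, or `strict` for Full (strict)). This is a minimal sketch that only builds the request and evaluates a response body. The function names and the idea of a scheduled drift check are assumptions, not existing tooling.

```python
import json
import urllib.request
from typing import List

API = "https://api.cloudflare.com/client/v4"
EXPECTED_MODE = "full"  # Full; Full (strict) is reported as "strict"


def build_ssl_setting_request(zone_id: str, token: str) -> urllib.request.Request:
    """Build (not send) a GET for the zone's SSL/TLS encryption mode."""
    return urllib.request.Request(
        f"{API}/zones/{zone_id}/settings/ssl",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def tls_mode_findings(response_body: str) -> List[str]:
    """Evaluate a settings/ssl response body; an empty list means no drift."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        return ["settings/ssl call failed"]
    mode = payload["result"]["value"]
    if mode == "strict":
        return ["mode is Full (strict): expect intermittent 526 during Render cert rotation"]
    if mode != EXPECTED_MODE:
        return [f"mode is {mode!r}, expected {EXPECTED_MODE!r}"]
    return []
```

A CI job could send the request with `urllib.request.urlopen` and fail whenever the findings list is non-empty.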

Activation order on a new custom domain

Order matters because Cloudflare’s proxy interferes with Render’s ACME challenge:

  1. Add the custom domain on the Render service.
  2. Add the corresponding DNS record in Cloudflare with proxy state DNS-only (grey-cloud).
  3. Wait for Render to report the certificate as issued.
  4. Flip the proxy state to Proxied (orange-cloud) only if the Per-hostname proxy state table lists the hostname as Proxied.
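The ordering fits in a small decision function, for instance as the core of a CI helper that reports the next action for a hostname. In this sketch, `HostnameState` and `next_step` are hypothetical names, and the boolean inputs would have to be read from Render and Cloudflare.

```python
from dataclasses import dataclass


@dataclass
class HostnameState:
    hostname: str
    render_domain_added: bool  # step 1 done
    dns_record_exists: bool    # step 2 done
    cert_issued: bool          # Render reports the certificate as issued
    proxied: bool              # current Cloudflare proxy state
    posture_proxied: bool      # what the per-hostname table says


def next_step(s: HostnameState) -> str:
    """Return the next action for a hostname, following the activation order."""
    if not s.render_domain_added:
        return "add custom domain on Render"
    if not s.dns_record_exists:
        return "add Cloudflare DNS record as DNS-only"
    if not s.cert_issued:
        if s.proxied:
            # The orange-cloud proxy interferes with Render's ACME challenge.
            return "set record to DNS-only"
        return "wait for Render certificate"
    if s.posture_proxied != s.proxied:
        return "set to Proxied" if s.posture_proxied else "set to DNS-only"
    return "done"
```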

CAA / AAAA hygiene

  • No CAA record on the zone unless explicitly whitelisting Google Trust Services (Render issues via pki.goog). A Let’s-Encrypt-only CAA silently breaks Render renewals.
  • No AAAA record on Render-hosted hostnames. Render does not support IPv6 on custom domains; an AAAA record produces intermittent failures.
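Both rules are mechanical enough to lint from an exported record list. In this sketch, `Record` is a simplified, hypothetical shape that reduces a CAA record to the CA domain of its `issue` property. `RENDER_HOSTNAMES` is transcribed from the per-hostname table, and `lint_zone` is an illustrative name.

```python
from dataclasses import dataclass
from typing import List

GOOGLE_TRUST_SERVICES = "pki.goog"
RENDER_HOSTNAMES = frozenset({
    "app.runspectral.com", "app-staging.runspectral.com",
    "ops.runspectral.com", "ops-staging.runspectral.com",
    "api.runspectral.com", "api-staging.runspectral.com",
})


@dataclass(frozen=True)
class Record:
    rtype: str  # "A", "AAAA", "CNAME", "CAA", ...
    name: str   # fully qualified record name
    value: str  # for CAA: the CA domain of an "issue" property (simplified)


def lint_zone(records: List[Record]) -> List[str]:
    """Return CAA/AAAA hygiene findings; an empty list means the zone is clean."""
    findings: List[str] = []
    # Simplification: treat every CAA record in the zone as applying to all names.
    issuers = sorted({r.value for r in records if r.rtype == "CAA"})
    if issuers and GOOGLE_TRUST_SERVICES not in issuers:
        findings.append(f"CAA allows only {issuers}; Render renewals via {GOOGLE_TRUST_SERVICES} will fail")
    for r in records:
        if r.rtype == "AAAA" and r.name in RENDER_HOSTNAMES:
            findings.append(f"AAAA on Render hostname {r.name}")
    return findings
```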

Per-hostname proxy state

| Hostname | Substrate | Proxy state | Reason |
|---|---|---|---|
| app.runspectral.com | Render web service (dashboard) | Proxied | SSR; cookies + DDoS shield |
| app-staging.runspectral.com | Render web service (dashboard) | Proxied | Mirror of prod |
| ops.runspectral.com | Render web service (operations) | Proxied | SSR + JWKS-local auth gate |
| ops-staging.runspectral.com | Render web service (operations) | Proxied | Mirror of prod |
| api.runspectral.com | Render web service (FastAPI) | DNS-only | SSE per ADR-032 D8; 100 MB upload ceiling on Free/Pro; OWASP CRS false-positives on JWT-bearer JSON POSTs |
| api-staging.runspectral.com | Render web service (FastAPI) | DNS-only | Mirror of prod |
| docs.runspectral.com | Cloudflare Pages (docs-user) | Proxied (Pages-managed) | Static; CDN value direct |
| docs-staging.runspectral.com | Cloudflare Pages (docs-user) | Proxied (Pages-managed) | Mirror of prod |
| codex.runspectral.com | Cloudflare Pages + Function (docs-codex) | Proxied (Pages-managed) | Pages Function executes at edge |
| codex-staging.runspectral.com | Cloudflare Pages + Function (docs-codex) | Proxied (Pages-managed) | Mirror of prod |
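The table can also serve as data for a drift check. This sketch compares it against the `name` and `proxied` fields that Cloudflare's DNS records API returns for each record. `EXPECTED_PROXIED` and `proxy_drift` are hypothetical names. A hostname still in the grey-cloud phase of the activation order will show up as drift, which is expected until Render issues its certificate.

```python
# Expected proxy state per hostname, transcribed from the table above.
EXPECTED_PROXIED: dict[str, bool] = {
    "app.runspectral.com": True,
    "app-staging.runspectral.com": True,
    "ops.runspectral.com": True,
    "ops-staging.runspectral.com": True,
    "api.runspectral.com": False,  # DNS-only: SSE, upload ceiling, CRS false positives
    "api-staging.runspectral.com": False,
    "docs.runspectral.com": True,  # Pages-managed
    "docs-staging.runspectral.com": True,
    "codex.runspectral.com": True,
    "codex-staging.runspectral.com": True,
}


def proxy_drift(records: list[dict]) -> dict[str, str]:
    """Return {hostname: problem} for hostnames that disagree with the table."""
    actual = {r["name"]: r["proxied"] for r in records if r["name"] in EXPECTED_PROXIED}
    problems: dict[str, str] = {}
    for host, want in EXPECTED_PROXIED.items():
        if host not in actual:
            problems[host] = "missing record"
        elif actual[host] != want:
            problems[host] = "expected Proxied" if want else "expected DNS-only"
    return problems
```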

Both api hostnames (api. and api-staging.) are DNS-only for three concrete reasons:

  1. The 100-second proxy idle timeout closes long-running SSE connections, and ADR-032 D8 makes Realtime-via-sse-starlette part of the API contract.
  2. The 100 MB request body cap on Free/Pro plans limits future upload surfaces.
  3. The OWASP Core Rule Set’s default JSON-body inspection rules (942200, 942260) flag legitimate JWT-bearer JSON POSTs as SQL-injection attempts.

The trade-off is the loss of Cloudflare L7 DDoS, WAF, and Bot Management on the API surface, plus exposure of the Render origin IP. Render’s baseline L3/L4 filtering remains. There is no unauthenticated public endpoint on api. in 0.3.0; auth lives on the Supabase Auth substrate, not on api.runspectral.com.

Edge rules

| Layer | Posture |
|---|---|
| Cloudflare Managed Ruleset | ON (default rules) |
| OWASP Core Rule Set | OFF (or Log mode) |
| Bot Fight Mode | OFF |
| Rate Limiting | One rule on app.*/auth/* and ops.*/auth/* callback paths, IP-based, threshold tuned at first burn-in |
| API Shield JWT validation | OFF |

Bot Fight Mode is OFF because the Free-plan implementation ignores WAF Skip rules: a known-good auth callback cannot be exempted, so BFM cannot coexist with PKCE callbacks on Free. Super Bot Fight Mode (Pro+) respects Skip rules; that is the upgrade path.

OWASP CRS at alpha is too aggressive for JSON-body API traffic, but since api. is DNS-only (D4) the CRS posture is moot for the API. The posture above applies to proxied hostnames only.

JWT validation lives in FastAPI middleware on the API and in the docs-codex Pages Function on Codex; duplicating it at the edge buys nothing at alpha.

Cache invalidation discipline

  • Zero zone-level Cache Rules at alpha. Pages projects use built-in purge-on-deploy.
  • Auth-sensitive SSR responses set Cache-Control: private, no-store. An auth-refresh response cached on the edge is a cross-user session-poisoning vector. A contract test on Start and the API enforces this.
  • Manual purge_everything only on observed stale-asset reports, scoped to the specific zone. Not a routine post-deploy step.
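The Cache-Control rule can be reduced to a header check applied to every auth-sensitive SSR and auth-refresh response. This Python sketch parses `Cache-Control` into directive names and flags any response missing either `private` or `no-store`. `violates_auth_cache_contract` is a hypothetical name, and how the suite gathers those responses is not shown.

```python
from typing import Mapping, Optional

# Directives every auth-sensitive response must carry.
REQUIRED_DIRECTIVES = frozenset({"private", "no-store"})


def cache_directives(header_value: Optional[str]) -> set[str]:
    """Parse a Cache-Control value into lower-cased directive names, dropping arguments."""
    if not header_value:
        return set()
    return {
        part.split("=", 1)[0].strip().lower()
        for part in header_value.split(",")
        if part.strip()
    }


def violates_auth_cache_contract(headers: Mapping[str, str]) -> bool:
    """True if an auth-sensitive response lacks private or no-store."""
    # Header names are case-insensitive, so match "cache-control" in any case.
    value = next((v for k, v in headers.items() if k.lower() == "cache-control"), None)
    return not REQUIRED_DIRECTIVES <= cache_directives(value)
```

A missing header counts as a violation. That matches the intent: an auth response with no Cache-Control at all is left to default caching behavior, which the contract is meant to rule out.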

Blue/green cutover (CNAME flip)

Public hostnames for the three Render web services (app., ops., api.) point at the active color via a CNAME with 60-second TTL. Cutover after a green deploy completes:

  1. Render’s per-service health checks gate green’s readiness.
  2. Deploy verification calls /version against green directly until the reported schema generation matches the post-migration target (per ADR-046 D8).
  3. CI workflow updates the public CNAME (e.g., app.runspectral.com → dashboard-green.onrender.com).
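  4. The 60s TTL bounds the cutover window for cached resolvers. This bound applies to DNS-only records (api.); on proxied records (app., ops.) Cloudflare sets the TTL to Auto and resolvers only see Cloudflare edge addresses, so the flip takes effect at the edge.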
  5. Blue stays warm for the legacy-drain window (per ADR-046 D8 + ADR-053).
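Steps 2 and 3 can be expressed as a gate followed by a single API call. The flip uses Cloudflare's `PATCH /zones/{zone_id}/dns_records/{dns_record_id}` and sends only the new `content`, which leaves the record's existing TTL and proxy state as configured. This is a sketch under assumptions: the `/version` field name `schema_generation` and the injected `fetch_version` and `send` callables are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

API = "https://api.cloudflare.com/client/v4"


def wait_for_generation(fetch_version: Callable[[], dict], target: int,
                        attempts: int = 30, delay_s: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll green's /version until it reports the post-migration schema generation."""
    for _ in range(attempts):
        # "schema_generation" is an assumed field name for the /version payload.
        if fetch_version().get("schema_generation") == target:
            return True
        sleep(delay_s)
    return False


def build_cname_flip(zone_id: str, record_id: str, new_target: str,
                     token: str) -> urllib.request.Request:
    """PATCH only the CNAME content; TTL and proxy state stay as configured."""
    return urllib.request.Request(
        f"{API}/zones/{zone_id}/dns_records/{record_id}",
        data=json.dumps({"content": new_target}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )


def cutover(fetch_version: Callable[[], dict], target_generation: int,
            zone_id: str, record_id: str, green_host: str, token: str,
            send: Callable[[urllib.request.Request], object],
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Flip the public CNAME to green only after green reports the target generation."""
    if not wait_for_generation(fetch_version, target_generation, sleep=sleep):
        return False  # public CNAME stays on blue
    send(build_cname_flip(zone_id, record_id, green_host, token))
    return True
```

Passing `send=urllib.request.urlopen` performs the real call. When the gate times out, the function returns without touching DNS, so blue simply remains the live color.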

Generation-stamping (per ADR-046 D8) holds correctness at the event queue layer; HTTP requests that land on either origin during the cutover window produce identical results, so no session affinity is required.

For staging (single-color), the CNAME points at the single-color origin and is rewritten on color rotation in CI.

Pages Function JWKS architecture (codex)

docs-codex runs a Pages Function that validates getClaims() against the Supabase project’s JWKS endpoint (Pattern A JWKS-local, per ADR-046 D9). The Function:

  • Caches JWKS in a Cloudflare KV namespace bound to the Pages project.
  • Uses a 10-minute TTL on the cache entry.
  • On kid miss, bypasses cache and refetches — Supabase rotates by kid, not by time, so a pure TTL would return “unknown kid” until expiration.
  • Verifies via Hono’s jwk middleware (or an equivalent hand-rolled verifier). Cloudflare Access Plugin is not used; Access validates CF-issued tokens, not Supabase-issued.

In-memory caching across isolate invocations is unreliable on Pages Free; KV is the contracted surface.
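For readability, the kid-aware cache policy is sketched in Python below. The Function itself runs JavaScript on the Workers runtime, where the dict would be the KV binding (`get` and `put`, with `expirationTtl` on `put`). `JwksCache`, `CACHE_KEY`, and `fetch_jwks` are hypothetical names. The sketch tracks expiry itself so it can run against a fake clock. Signature verification (Hono's jwk middleware or equivalent) would happen after `key_for` returns a key.

```python
import json
import time
from typing import Callable, MutableMapping, Optional

JWKS_TTL_S = 600    # 10-minute TTL on the cache entry
CACHE_KEY = "jwks"  # hypothetical KV key name


def _find_key(jwks: dict, kid: str) -> Optional[dict]:
    return next((k for k in jwks.get("keys", []) if k.get("kid") == kid), None)


class JwksCache:
    """kid-aware JWKS cache: TTL for freshness, one forced refetch on an unknown kid."""

    def __init__(self, kv: MutableMapping[str, tuple],
                 fetch_jwks: Callable[[], dict],
                 now: Callable[[], float] = time.time):
        self.kv = kv                  # stands in for the KV namespace binding
        self.fetch_jwks = fetch_jwks  # GET on the Supabase project's JWKS endpoint
        self.now = now

    def _cached(self) -> Optional[dict]:
        entry = self.kv.get(CACHE_KEY)
        if entry is None or entry[0] <= self.now():
            return None  # absent or past its TTL
        return json.loads(entry[1])

    def _refresh(self) -> dict:
        jwks = self.fetch_jwks()
        self.kv[CACHE_KEY] = (self.now() + JWKS_TTL_S, json.dumps(jwks))
        return jwks

    def key_for(self, kid: str) -> Optional[dict]:
        """Return the JWK for `kid`, or None if the endpoint does not publish it."""
        jwks = self._cached()
        fetched = jwks is None
        if fetched:
            jwks = self._refresh()
        key = _find_key(jwks, kid)
        if key is None and not fetched:
            # Supabase rotates by kid, so a miss on a cached set means "refetch now",
            # not "wait for the TTL".
            key = _find_key(self._refresh(), kid)
        return key
```

One point the sketch leaves open is that every unknown kid triggers a fetch. A production Function would probably want to throttle kid-miss refetches so that tokens carrying random kids cannot drive one JWKS request per call.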

The Pages Function gates SCOPE_*_OPERATIONS access to codex.runspectral.com. A contract test in the API + Start integration suite asserts parity between the Pages Function’s JWKS validation outputs and the FastAPI middleware’s outputs on the same inputs.

  • Session cookies are scoped to runspectral.com, the eTLD+1 (per ADR-039 D6 and ADR-046 D8), so they are visible to every subdomain (app., ops., codex., docs.).
  • Auth-refresh responses set Cache-Control: private, no-store.
  • Session cookies split into multiple sb-... cookies when the access token exceeds 4096 bytes (Azure / Google OAuth large claims). @supabase/supabase-js handles the split client-side. The Pages Function reassembles before validation.
  • Supabase JWTs with custom claims can exceed Cloudflare’s upstream header buffer in some configurations. The first integration pass (per ADR-045 D13) exercises end-to-end JWT delivery through the Cloudflare proxy to surface this early.
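Reassembly in the Pages Function amounts to collecting the numbered chunks in order. The sketch makes two assumptions about naming: chunks are called `<base>.0`, `<base>.1`, …, and an unsplit token lives under `<base>` itself. Verify both against the installed Supabase client before relying on them. The base name used in testing is hypothetical, and decoding the joined value is out of scope.

```python
import re
from typing import Dict, Optional


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Split a Cookie request header into {name: value} (no unquoting)."""
    jar: Dict[str, str] = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            jar[name] = value
    return jar


def reassemble_cookie(jar: Dict[str, str], base: str) -> Optional[str]:
    """Join <base>.0, <base>.1, ... in numeric order; fall back to an unsplit <base>."""
    if base in jar:
        return jar[base]
    chunk_name = re.compile(re.escape(base) + r"\.(\d+)")
    chunks: Dict[int, str] = {}
    for name, value in jar.items():
        match = chunk_name.fullmatch(name)
        if match:
            chunks[int(match.group(1))] = value
    # Require a contiguous run .0 .. .n-1; a gap means a chunk was lost.
    if not chunks or sorted(chunks) != list(range(len(chunks))):
        return None
    # Numeric (not lexical) order, so .10 follows .9.
    return "".join(chunks[i] for i in range(len(chunks)))
```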

Revisit triggers

Hard (open new spike)

  • Cloudflare outage materially impacting auth or the app subdomain
  • Pages cert provisioning stall > 72h on a critical-path cutover
  • Cloudflare acquired by an entity that triggers a reputation concern

Soft (evaluate without committing to move)

  • First L7 DDoS attempt observed in logs → re-proxy api.
  • Bot-traffic noise in request logs → re-proxy + Super Bot Fight Mode evaluation (requires Pro+ plan)
  • Sustained traffic ≥ 10k req/min → upgrade blue/green to Worker-based weighted routing (gradual ramp becomes statistically meaningful)
  • Move to Pro / Business plan → revisit OWASP CRS, Rate Limiting by response code, BFM
  • SLA-grade health-driven auto-failover required between blue/green → Cloudflare Load Balancing evaluation