Decisions

ADR-110: Provisioning by config resolution + GitHub Actions deploy — no Terraform

Context

Spectral’s provisioning surface is Cloudflare, GitHub, and Supabase (R2 is part of Cloudflare). Cloudflare dominates, and its native tool is wrangler: wrangler.jsonc and Pages project settings manage the compute / bindings / Pages / KV / R2 surface. Deploys run in GitHub Actions: git main is the integration branch, a fast-forward push of the production branch deploys product surfaces to prod, and docs-codex keeps its existing main-triggered Pages deploy. The CI runner is where wrangler, the Supabase CLI, and gh already execute. Provisioning is greenfield (no Terraform was ever built), so the only question is which provisioning model is the simplest correct one for this surface.

Decision

D1 — No Terraform; a two-hop config flow, not a do-everything script

Provisioning is a two-hop flow:

tools/provision/provision.sh resolves an environment and publishes it. It reads the committed per-environment config, resolves secrets from 1Password, and writes the full set to that environment’s sink — the local .env for local work, or the matching GitHub Environment for staging/production. It never talks to Cloudflare or Supabase.
GitHub Actions deploys. On a deploy, the workflow reads the GitHub Environment’s variables + secrets and applies them to the live systems — wrangler for Cloudflare compute/edge/Pages, the Supabase CLI for Supabase — creating/reconciling vendor resources from committed declarative config, then ships the new container or Pages deployment.

There is no infra/tofu/ tree and no Terraform state. The CI runner already runs every vendor CLI on every deploy, so resource creation and per-vendor secret-push belong in the deploy workflow, not on a workstation; the workstation’s only job is to make an environment’s config available.

D2 — `infra/environments.toml` is the single per-environment source

Each environment is one table. Non-secret values are committed as literals. Secrets appear only as op://Spectral/<item>/<field> references, which are resolved at use via the 1Password CLI (op); the secret values themselves are never committed. The secret boundary is lintable: a secret-shaped value MUST be an op:// ref.

D3 — `provision.sh` has one operation (resolve) and three forms

provision.sh --env <name> resolves the named environment and fails loudly, naming the offending key, if a value or a 1Password item is missing. The form of the invocation decides what happens with the resolved set:

--env X -- <command…> — run the command with the resolved environment in its process (the op run pattern): provision.sh --env production -- wrangler deploy. The secrets live only for that command.
--env X --dry-run — print the publish plan (each key, its sink target, secret-vs-variable), values redacted. Always safe to run.
--env X (default) — publish: resolve and write the full set to the environment’s sink, after a y/N confirmation (--yes bypasses).

There are no init/update/rotate/verify/resources/dotenv modes.
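The three forms can be told apart from the argument list alone. The sketch below models that dispatch in Python for illustration only; provision.sh itself is a shell script, and the Invocation type, the function name, and the choice to reject --dry-run combined with a command are assumptions that the ADR does not state.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class Invocation:
    env: str
    form: str  # "run", "dry-run", or "publish"
    command: list[str] = field(default_factory=list)
    assume_yes: bool = False


def parse_invocation(argv: list[str]) -> Invocation:
    """Classify an argument list into one of the three forms."""
    # Everything after the first "--" is the command to run, passed through untouched.
    if "--" in argv:
        split_at = argv.index("--")
        options, command = argv[:split_at], argv[split_at + 1:]
        has_separator = True
    else:
        options, command = argv, []
        has_separator = False

    parser = argparse.ArgumentParser(prog="provision.sh")
    parser.add_argument("--env", required=True)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--yes", action="store_true")
    args = parser.parse_args(options)

    if has_separator and not command:
        parser.error("expected a command after --")
    if command and args.dry_run:
        # Assumption: the ADR does not say how these combine, so reject the mix.
        parser.error("--dry-run cannot be combined with a command")

    if command:
        form = "run"
    elif args.dry_run:
        form = "dry-run"
    else:
        form = "publish"
    return Invocation(env=args.env, form=form, command=command, assume_yes=args.yes)


print(parse_invocation(["--env", "production", "--", "wrangler", "deploy"]))
```

Because there is a single resolve operation, the form only selects what consumes the resolved set: a child process, a redacted plan, or the sink.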

D4 — The publish sink, and the secret/variable split

An environment’s sink is either the local .env or a GitHub Environment. It defaults from the environment name and can be overridden with a PROVISION_SINK key. On a GitHub publish, an op://-sourced value is written as a GitHub secret and a literal is written as a GitHub variable. A local .env publish writes every key as a resolved KEY=value line in the gitignored file. (gh reads a secret value from stdin only when --body is omitted. Never pass --body -, which would set the secret to a literal dash.)
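A minimal sketch of the sink default and the secret/variable split follows. The sink labels ("local-env", "github:<name>"), the rule that only an environment named local defaults to the .env sink, and the choice to leave PROVISION_SINK itself out of the plan are assumptions made for illustration. The gh argument lists are built but never executed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanEntry:
    key: str
    sink: str  # hypothetical labels: "local-env" or "github:<environment>"
    kind: str  # "secret", "variable", or "dotenv-line"


def resolve_sink(env_name: str, config: dict[str, str]) -> str:
    """Default the sink from the environment name unless PROVISION_SINK overrides it."""
    override = config.get("PROVISION_SINK")
    if override:
        return override
    # Assumption: only an environment named "local" defaults to the .env sink.
    return "local-env" if env_name == "local" else f"github:{env_name}"


def build_plan(env_name: str, config: dict[str, str]) -> list[PlanEntry]:
    """Classify each key from its unresolved value: op:// ref or literal."""
    sink = resolve_sink(env_name, config)
    plan = []
    for key, raw_value in config.items():
        if key == "PROVISION_SINK":
            continue  # assumption: the override key is not itself published
        if sink == "local-env":
            kind = "dotenv-line"
        elif raw_value.startswith("op://"):
            kind = "secret"
        else:
            kind = "variable"
        plan.append(PlanEntry(key, sink, kind))
    return plan


def gh_argv(entry: PlanEntry, environment: str, resolved: str) -> tuple[list[str], Optional[str]]:
    """Return (argv, stdin_text) for one GitHub publish step, without running it."""
    if entry.kind == "secret":
        # No --body flag: gh reads the secret value from stdin.
        return ["gh", "secret", "set", entry.key, "--env", environment], resolved
    return ["gh", "variable", "set", entry.key, "--env", environment, "--body", resolved], None


config = {
    "PUBLIC_APP_URL": "https://app.example.invalid",
    "API_TOKEN": "op://Spectral/example-item/token",
}
for entry in build_plan("production", config):
    print(f"{entry.key} -> {entry.sink} ({entry.kind}) value=<redacted>")
```

Classifying on the unresolved value keeps the split decidable from the committed file alone, which is also what lets a dry run print the plan without revealing any value.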

D5 — Drift-safety is a property of the deploy, not the workstation

Every deploy re-asserts the full environment from its GitHub Environment onto the live systems and reconciles vendor resources via idempotent check-then-set from committed declarative config (wrangler.jsonc, supabase/config.toml, the DNS-records data file, the R2-buckets list). So the live config cannot drift from environments.toml + 1Password, and the deploy is safe to re-run from an empty target. This is the drift-safety Terraform would have given, kept as a deploy property on a smaller surface.
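Idempotent check-then-set can be illustrated with the R2-buckets list. The sketch below uses an in-memory stand-in for the vendor API; the client interface, the bucket names, and the create-only behaviour (unlisted buckets are left alone) are assumptions, and a real deploy step would call the vendor CLI instead.

```python
from typing import Protocol


class BucketClient(Protocol):
    def list_buckets(self) -> set[str]: ...
    def create_bucket(self, name: str) -> None: ...


class InMemoryBuckets:
    """Stand-in for a vendor API so the sketch runs without network access."""

    def __init__(self, existing: tuple[str, ...] = ()) -> None:
        self.buckets = set(existing)
        self.create_calls: list[str] = []

    def list_buckets(self) -> set[str]:
        return set(self.buckets)

    def create_bucket(self, name: str) -> None:
        self.create_calls.append(name)
        self.buckets.add(name)


def reconcile_buckets(client: BucketClient, desired: list[str]) -> list[str]:
    """Create every declared bucket that is missing; return the names created."""
    live = client.list_buckets()  # check
    created = []
    for name in desired:
        if name not in live:
            client.create_bucket(name)  # set, only when absent
            created.append(name)
    return created


# Hypothetical declared data, standing in for the committed R2-buckets list.
DECLARED = ["example-assets", "example-uploads"]

client = InMemoryBuckets(existing=("example-assets",))
print(reconcile_buckets(client, DECLARED))  # first run creates what is missing
print(reconcile_buckets(client, DECLARED))  # second run is a no-op
```

Running the same step from an empty target creates everything, and running it against a converged target changes nothing, which is the re-run property the paragraph above relies on.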

D6 — No Terraform state; DR is Supabase-native

There is no Terraform state to store (no spectral-tfstate bucket). Disaster recovery is Supabase-native managed backups + PITR (ADR-040) — there is no self-run backup pipeline and no R2 backup bucket to provision. Any genuinely manual one-time step is a documented runbook step, not a gate inside provision.sh.

Alternatives considered

Keep OpenTofu, re-baselined Cloudflare-only. Rejected: two config systems for Cloudflare (wrangler + TF) and HCL/state overhead for a surface that is small and greenfield — it pays for an abstraction the scale no longer needs.
Reconciling bash in provision.sh — one workstation script that both creates resources and pushes secrets to each vendor. Rejected: it duplicates, on a laptop, work the CI runner already does on every deploy, and splits drift-safety between the workstation and the deploy. Folding resource creation + vendor secret-push into the deploy gives one place where the live systems are asserted, and shrinks provision.sh to the one thing only the workstation can do — resolve 1Password and stage config.
All-CLI, no reconcile discipline anywhere. Rejected: write-once imperative steps lose drift-safety on high-blast-radius resources (auth, DNS, secrets). The mitigation — declared data + idempotent check-then-set — lives in the deploy workflow (D5) rather than in provision.sh.

Consequences

No infra/tofu/ tree; no spectral-tfstate bucket.
tools/provision/provision.sh is the resolve+publish tool of D3/D4 — no provision_<resource>/push_<target>/reconcile/manual_step families, no .env.example parser, no .env.provision cache.
The per-vendor reconcile + secret-push work lands in the GitHub Actions deploy workflows reading the GitHub Environment, with committed declarative config as their input. Product Pages deploys (app, ops, docs) run from the production branch; docs-codex remains a main-triggered internal-docs Pages deploy.
infra/environments.toml is the committed single source; committed .env.* are forbidden; a real .env stays gitignored.
Drift-safety is a deploy-time property (full-environment re-assertion plus reconcile from declared data) rather than a Terraform-plan property. It is documented as load-bearing so that a later change does not remove it unnoticed.
