ADR-073: Provisioning orchestrator — OpenTofu, setup.sh, 1Password, manual_step
Status: Accepted (2026-05-06)
Supersedes: ADR-037 D5 (partial: names 1Password as the documented default operator-workstation secrets store, rather than treating tool choice as private), ADR-037 D9
Context
ADR-037 chose a deployer-operated provisioning script with a target-swap seam as the alpha-stage secrets-management substrate. D9 explicitly framed declarative IaC (Terraform / Pulumi) as post-alpha — the script bridges manual console clicks to declarative infra at alpha scale.
That framing assumed declarative IaC required a maturity threshold the alpha team had not crossed: stable target implementations, a team larger than three engineers, the operational appetite for HCL and state management. Operator practice and a focused research pass have shifted the calculus:
- Provisioning surface has multiplied. Cloudflare Pages projects (×2), KV namespaces, Access apps and policies, DNS records, WAF, rate-limit rules, zone settings; GitHub Environments, branch protection, repo settings, Actions secrets; Render web / cron / worker services, env groups, custom domains; Supabase project + auth + custom domains + edge functions; the new R2 buckets per ADR-072. Each adds idempotency, drift-detection, and re-runnable-from-empty surface. Building all of it bespoke in bash compounds engineering cost across every deployable epic.
- OpenTofu provider coverage is where it needs to be. Cloudflare and GitHub providers are mature, official, and cover the resource shapes Spectral needs end-to-end. Render’s official provider has known plan-noise and env-group bugs but is workable with lifecycle.ignore_changes discipline and single-root ownership. Supabase’s official provider has permanent gaps for OAuth provider config, RLS policies, DB migrations, and Vault, all CLI-shaped operations that should not be TF-shaped anyway.
- Hybrid is the right shape. Roughly 70% of resources land natively in OpenTofu; 20% require CLI gap-fills (Supabase OAuth via API PATCH; RLS and migrations via supabase db push); 10% require human-in-loop steps (account-level OAuth grants and similar). The script orchestrates all three.
- The secrets cache is genuine attack surface. ADR-037 D4’s .env.provision is a plaintext file at rest on the operator machine. 1Password’s op run resolves secret references and injects the values as env vars at apply time without the values touching disk. Eliminating the on-disk plaintext for the documented default path is a meaningful security improvement.
- Naming the default tool requires a partial supersession of ADR-037 D5. D5 kept specific upstream key-source tooling private to avoid cofounder-personal preferences leaking into system docs. Naming 1Password as the documented default for operator workstations changes that posture: it is now a published recommendation, not personal practice. The privacy concern is satisfied differently: by recommending a default for any operator, not by hiding which tool a particular operator uses.
- Contributor optionality must be preserved. Locking provisioning into a single password manager harms contributors who use Bitwarden, KeePass, 1Password Teams via a different account, or no password manager at all. The fallback path (.env.provision) survives as an explicit second-class option, with the security caveat made plain in operator docs.
Decision
D1 — tools/provision/setup.sh is the operator’s single entry point for environment provisioning
The script orchestrates three layers of work: declarative resource provisioning via OpenTofu (D2), CLI gap-fills for resources without OpenTofu coverage (D3), and human-in-loop manual steps with shell-command verification (D4). The provider-swap-seam discipline from ADR-037 D11 stays — push_<target>(env, name, value) for secret push — and is joined by a parallel provision_<resource>(...) family for non-secret resources, plus the manual_step helper.
This extends ADR-037 D4 in scope; the modes (init, update, rotate, verify), scope annotation pattern, and cache discipline are unchanged.
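The swap-seam naming convention above can be sketched as a small name-based dispatcher. This is a minimal illustration only: the dispatch helper, the push_example target, and the exit code are hypothetical and are not taken from the existing script.

```shell
# Hypothetical sketch of name-based dispatch for the push_<target> and
# provision_<resource> families. None of these names come from setup.sh itself.

# dispatch <family> <name> [args...]
# Calls the function "<family>_<name>" if it is defined, otherwise fails.
dispatch() {
  local family="$1" name="$2"
  shift 2
  local fn="${family}_${name}"
  if ! command -v "$fn" >/dev/null 2>&1; then
    echo "no implementation for ${fn}" >&2
    return 64
  fi
  "$fn" "$@"
}

# Illustrative target only; a real target would call its provider's API or CLI.
# It logs the value's length rather than the value, so secrets stay out of logs.
push_example() {
  local env="$1" name="$2" value="$3"
  echo "push target=example env=${env} name=${name} bytes=${#value}"
}

# Usage (hypothetical):
#   dispatch push example test-live API_KEY "$value"
```

Adding a target under this shape means defining one more function; the call sites do not change, which is the property the seam is meant to protect.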
D2 — OpenTofu is the declarative resource-provisioning layer
Per-provider stack roots under infra/tofu/: cloudflare/, github/, render/, supabase/, r2/. Each plans and applies independently. Cross-stack values flow via outputs and terraform_remote_state data sources.
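Because each stack root plans independently, the orchestrator can walk them in a loop. The sketch below assumes an ordering (state bucket first) and a TOFU_BIN override for testing; both are illustrative assumptions, not decisions recorded in this ADR.

```shell
# Hypothetical sketch: plan every stack root under infra/tofu/ in turn.
# The order is an assumption; stacks that read another stack's outputs
# would need to come after the stack they read from.
STACKS="r2 cloudflare github render supabase"

# plan_stacks [root]: run "tofu plan" per stack, stopping at the first failure.
plan_stacks() {
  local root="${1:-infra/tofu}" stack
  for stack in $STACKS; do
    echo "== plan: ${stack}"
    # TOFU_BIN is a hypothetical override so the loop can be exercised without tofu.
    "${TOFU_BIN:-tofu}" -chdir="${root}/${stack}" plan -input=false || return 1
  done
}
```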
Conventions, locked at the base-structure sub-issue of the provisioning-orchestrator epic:
- Per-stack-root variables.tf is the operator’s interface; locals.tf for derived values.
- Repetitive resources are driven by data files (YAML / JSON + for_each) where appropriate. The current infra/cloudflare/zone-records.md markdown table becomes a YAML data file consumed by a single cloudflare_dns_record for_each resource.
- Modules for genuine repeats only (e.g. a pages_site module instantiated for the two docs sites). Not module-everything.
- One workspace OR one directory per environment, chosen at the base-structure sub-issue.
- State on R2 via the s3 backend per ADR-072 D2.
Bootstrap: setup.sh creates the state bucket via wrangler r2 bucket create as a pre-step on first init, idempotent (check-then-create). No bootstrap-TF-with-local-state dance.
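The check-then-create pattern can be sketched as a generic helper. The function name and its string-command interface are hypothetical; the wrangler usage in the trailing comment additionally assumes that the bucket listing output contains the bucket name, which should be confirmed against the installed wrangler version.

```shell
# Hypothetical sketch of an idempotent check-then-create helper.

# ensure_exists <label> <check_cmd> <create_cmd>
# Runs check_cmd; only if it fails, runs create_cmd. Safe to re-run.
ensure_exists() {
  local label="$1" check_cmd="$2" create_cmd="$3"
  if sh -c "$check_cmd" >/dev/null 2>&1; then
    echo "exists: ${label}"
    return 0
  fi
  sh -c "$create_cmd" >/dev/null || return 1
  echo "created: ${label}"
}

# Usage (assumes STATE_BUCKET is set and that the list output names the bucket):
#   ensure_exists "state bucket" \
#     "wrangler r2 bucket list | grep -q \"$STATE_BUCKET\"" \
#     "wrangler r2 bucket create \"$STATE_BUCKET\""
```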
D3 — CLI gap-fills via setup.sh for resources without OpenTofu coverage
Each gap-fill is a function in setup.sh matching the provision_<resource>(env, ...) signature. Idempotent at target (check-then-create-or-update). Drains into the same cache, verification, and manual_step framework as everything else.
Permanent CLI gap-fills (provider has no first-class TF resource):
- Supabase OAuth provider config: curl -X PATCH /v1/projects/$REF/config/auth with external_google_client_id, etc.
- Supabase RLS policies + DB migrations: supabase db push against the linked project. Schema is SQL-shaped, not TF-shaped.
- Supabase Vault secrets (post-alpha BYOK per ADR-037 D6): SQL migration, select vault.create_secret(...).
Transient CLI gap-fills (provider issue expected to resolve):
- Cloudflare v5 cloudflare_pages_domain “already added” race (cloudflare/terraform-provider-cloudflare#5619). Tolerate one retry; import-then-manage if persistent. Re-evaluate at each Cloudflare provider minor.
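"Tolerate one retry" can be sketched as a wrapper that re-runs a command exactly once. The helper name and the RETRY_DELAY_SECONDS variable are hypothetical; the import-then-manage path for a persistent failure is not modelled here.

```shell
# Hypothetical sketch: run a command, and on failure retry it exactly once.

# retry_once <cmd> [args...]
retry_once() {
  if "$@"; then
    return 0
  fi
  echo "first attempt failed; retrying once: $*" >&2
  sleep "${RETRY_DELAY_SECONDS:-5}"
  # The second attempt's exit status is the wrapper's exit status.
  "$@"
}

# Usage (hypothetical):
#   retry_once tofu -chdir=infra/tofu/cloudflare apply
```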
D4 — manual_step helper for human-in-loop steps
Signature: manual_step <id> <description> <verification_command>.
- Idempotent via cache marker. Cache key manual:<id>=done@<ISO8601> parallels the existing <scope>:<NAME> namespace. Re-runs detect the marker and skip with a one-line log.
- Verification by default. The verification command must exit 0 before the cache marker is written. Examples: dig +short NS runspectral.com | grep -q cloudflare; gh api /repos/.../environments/test-live | jq -e '.protection_rules | length > 0'.
- Explicit --unverifiable escape hatch for steps with no programmatic check. Drift-prone; documented in the runbook with a recommendation to minimize the unverifiable surface.
- Each step carries a delete_when: field naming the condition under which it can be retired (e.g., “Cloudflare provider adds resource X”). Manual steps are debt; the field captures the retirement trigger.
Initial manual steps include the Cloudflare → GitHub OAuth source-connection grant (one-time per Cloudflare account; no API surface).
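A minimal sketch of the helper, under stated assumptions: the cache is a flat text file whose path comes from a hypothetical PROVISION_CACHE_FILE variable, step ids are plain slugs, --unverifiable is passed in place of the verification command, and the interactive operator prompt and the delete_when: field are left out.

```shell
# Hypothetical sketch of manual_step <id> <description> <verification_command>.
# The cache file location and format are assumptions, not the script's real layout.

manual_step() {
  local id="$1" description="$2" verify_cmd="${3:-}"
  local cache="${PROVISION_CACHE_FILE:-.provision-cache}"

  # Idempotency: a prior marker means the step is skipped with a one-line log.
  if grep -q "^manual:${id}=done@" "$cache" 2>/dev/null; then
    echo "manual:${id} already done; skipping"
    return 0
  fi

  echo "MANUAL STEP ${id}: ${description}" >&2

  if [ "$verify_cmd" = "--unverifiable" ]; then
    echo "warning: manual:${id} has no programmatic check" >&2
  elif ! sh -c "$verify_cmd" >/dev/null 2>&1; then
    # The marker is only written after verification exits 0.
    echo "verification failed for manual:${id}; marker not written" >&2
    return 1
  fi

  printf 'manual:%s=done@%s\n' "$id" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$cache"
  echo "manual:${id} recorded"
}

# Usage (verification command taken from the examples above):
#   manual_step dns-ns "Point registrar NS at Cloudflare" \
#     "dig +short NS runspectral.com | grep -q cloudflare"
```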
D5 — 1Password is the documented default operator-workstation secrets source
The default secrets path on operator workstations is 1Password Individual or higher with the op CLI. setup.sh apply invokes op run --env-file=.env.example -- tofu apply; .env.example carries committed op://VAULT/ITEM/FIELD references. Initial population: setup.sh init prompts and writes to 1Password via op item create and op item edit rather than to a local cache file. Rotation: edit the value in 1Password; re-run apply.
Personal / Individual plan is sufficient — verified. The op CLI features used (op run, op read, op item create, op item edit) are not plan-tier-gated. Service accounts (Business plan and above) are not required, because CI uses GitHub Actions Secrets (populated by TF), not 1Password.
This partially supersedes ADR-037 D5: specific upstream key-source tooling is no longer treated as private when it is the published default. The privacy intent of D5 — avoiding cofounder-personal tooling leaking into system docs — is satisfied differently, by naming a default that is recommended for any operator rather than describing one cofounder’s preference.
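Since .env.example is committed, one optional guard is a check that every entry is an op:// reference before op run is invoked. The sketch below is a hypothetical addition, not part of the decision; the function name and the variable names in the sample entry are illustrative.

```shell
# Hypothetical sketch: refuse to proceed if a committed env file carries
# anything other than op:// secret references.
#
# Expected file shape (entry name is illustrative):
#   TF_VAR_cloudflare_api_token=op://VAULT/ITEM/FIELD

# lint_env_refs <file>: exit non-zero and list offending names if any
# NAME=value line has a value that is not an op:// reference.
lint_env_refs() {
  local file="$1" bad
  bad="$(grep -v -e '^[[:space:]]*$' -e '^#' "$file" \
    | grep -v '^[A-Za-z_][A-Za-z0-9_]*=op://' || true)"
  if [ -n "$bad" ]; then
    echo "non-reference entries in ${file}:" >&2
    # Print names only, never values.
    printf '%s\n' "$bad" | cut -d= -f1 >&2
    return 1
  fi
}

# Usage:
#   lint_env_refs .env.example && op run --env-file=.env.example -- tofu apply
```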
D6 — .env.provision is the fallback secrets path
Auto-detected: if op CLI is unavailable or no active session resolves, the script falls back with a one-line banner. An explicit --no-1password flag opts out when op is available but the operator chooses not to use it.
Same dispatch interface (store_value, read_value); two implementations sharing one signature, dispatched by which path is active. Parity is required: any new feature works on both paths or it does not ship.
Operator discipline for the fallback path is documented in the operator runbook: chmod 600 (script-enforced), gitignored (already), back up to chosen password manager, delete the local file after backup. The “no plaintext on disk” property holds for the documented default; degrades gracefully on opt-out.
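The auto-detection and the shared read_value signature might look like the following sketch. Assumptions are labeled in the comments: the NO_1PASSWORD, OP_VAULT, and ENV_PROVISION variables and the op://<vault>/<name>/credential item layout are hypothetical, and store_value is omitted for brevity.

```shell
# Hypothetical sketch of backend selection and a shared read_value signature.

# secrets_backend [--no-1password]: prints "1password" or "env-file".
secrets_backend() {
  if [ "${1:-}" = "--no-1password" ]; then
    echo "env-file"
    return 0
  fi
  # "op whoami" succeeds only when the CLI has an active session.
  if command -v op >/dev/null 2>&1 && op whoami >/dev/null 2>&1; then
    echo "1password"
  else
    echo "fallback: op unavailable or not signed in; using .env.provision" >&2
    echo "env-file"
  fi
}

# read_value <name>: one signature, two implementations.
read_value() {
  local name="$1"
  case "$(secrets_backend ${NO_1PASSWORD:+--no-1password})" in
    1password)
      # Item layout is an assumption, not the documented vault structure.
      op read "op://${OP_VAULT}/${name}/credential"
      ;;
    env-file)
      sed -n "s/^${name}=//p" "${ENV_PROVISION:-.env.provision}" | head -n 1
      ;;
  esac
}
```

Keeping the branch inside read_value means callers never learn which path is active, which is what makes the parity requirement enforceable.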
D7 — Pattern C secrets architecture: TF_VAR_* injection at apply time
Secret values flow: 1Password (or .env.provision fallback) → TF_VAR_* env vars (via op run or shell-injection from the cache) → tofu apply. TF state contains values for non-ephemeral resources; Cloudflare, GitHub, Render, and Supabase providers do not yet ship ephemeral resources (Terraform 1.10 ephemeral-value support is currently provider-side AWS / Azure / Kubernetes / Google).
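For the fallback path, the shell-injection step can be sketched as a loop that exports each cache entry as a TF_VAR_* variable. The function name is hypothetical, and lowercasing the name to match the OpenTofu variable is an assumed convention; the real cache also carries scope annotations that this sketch ignores.

```shell
# Hypothetical sketch: export NAME=value lines from a cache file as
# TF_VAR_<name> so that tofu picks them up as input variables.

# export_tf_vars <env_file>
export_tf_vars() {
  local file="$1" line name value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
      *=*) ;;                # NAME=value: handled below
      *) continue ;;         # anything else is ignored
    esac
    name="${line%%=*}"       # text before the first "="
    value="${line#*=}"       # text after the first "=", which may itself contain "="
    name="$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
    export "TF_VAR_${name}=${value}"
  done < "$file"
}

# Usage:
#   export_tf_vars .env.provision && tofu apply
```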
Mitigations for state-stored values:
- R2 state bucket per ADR-072 D2 with server-side encryption and bucket versioning.
- Object R/W API token scoped to the state bucket only, held by the operator (not by CI).
- State bucket separate from backups bucket per ADR-072 D3 — different blast radii.
- Ephemeral values used for the google provider where supported; the surface is small given ADR-072 reduces GCP usage.
D8 — Roll-out is iterative across the alpha milestone
Each deployable epic ships its own resources first (manually if needed), then a parallel sub-issue under the provisioning-orchestrator epic brings those resources under script control. Sub-issues stack up as we work through alpha; the epic closes when every alpha-required deployable can be provisioned end-to-end from setup.sh against an empty target.
Codex pages and runbooks describing “manually create X” prerequisites get swept and rewritten as the corresponding sub-issues land.
Alternatives considered
Bespoke multi-provider orchestrator entirely in bash. Reject — every capability OpenTofu provides natively (idempotent diff, drift detection, dependency graph, plan/apply pattern, audit trail) becomes engineering we own. Bespoke orchestrators require sustained engineering as targets evolve and as new providers join the stack. The break-even for IaC is roughly ≥3 providers + secrets + repeatability + multi-environment; we cleared that bar long before this ADR.
Pulumi. Reject — comparable feature set to OpenTofu, but Pulumi’s default state backend (Pulumi Cloud) requires an account and trends paid for sustained use; self-hosted state needs setup that R2 already provides. Smaller community than Terraform / OpenTofu for our specific providers.
All-CLI / no-IaC. Reject — drift detection and re-runnable-from-empty are real value; lost without a declarative layer. This is what we have today, and the friction will compound as the provisioning surface grows.
Pattern A (bash-only secrets, TF avoids secret-containing resources). Reject — loses TF drift detection on GitHub Actions secrets, Render env groups, Supabase auth. A secret rotated via dashboard becomes invisible to tofu plan.
Pattern B (secrets via external secret manager + TF data sources). Reject — adds GCP Secret Manager (or equivalent) as a new vendor at alpha; does not actually keep secret values out of state for many providers’ resources; more moving parts than the gain.
1Password mandatory; no fallback. Reject — locks contributors into a single password manager; the fallback path is small surface area to maintain and preserves contributor optionality. Per the parity requirement (D6), maintenance burden stays bounded.
Wait-and-see — defer this decision until post-alpha as ADR-037 D9 originally framed. Reject — the cost of waiting is sustained bespoke engineering across every deployable epic. The alpha-stage maturity bar that D9 cited was empirical, not principled; the empirical evidence has shifted (research confirmed scale fit; the OpenTofu skill ramp is short with current LLM tooling).
Consequences
- tools/provision/setup.sh scope expands from secrets-only to a multi-layer orchestrator. The provider-swap-seam discipline from ADR-037 D11 is preserved; new families (provision_<resource>, manual_step) join push_<target>.
- New infra/tofu/ directory tree with per-provider stack roots; state on R2 via ADR-072 D2.
- .env.provision cache survives as the fallback secrets path; the documented primary path is 1Password.
- ADR-037 D5 partially superseded. Specific upstream key-source tooling (1Password) is named as the documented default. The privacy intent is satisfied differently.
- ADR-037 D9 superseded. Declarative IaC adopted at alpha rather than deferred to post-alpha. The script-as-IaC framing is replaced by script-as-thin-orchestrator-around-IaC.
- Hard dependency on the op CLI for the documented default secrets path. Operator workstations need op installed and authenticated. The fallback path has no op dependency.
- HCL becomes part of the operator skill set: a small ramp, well within agent-LLM and operator capability.
- Codex sweep required for content describing the provisioning model: any “manually create X” prerequisite, .env.provision references, secrets rotation flow, runbook narratives for setup.
- Per-provider version pins required in TF root configurations. Locked at the base-structure sub-issue.
- The provisioning-orchestrator epic captures the work; sub-issues land iteratively across alpha.
- Trade accepted: state-at-rest contains secret values for non-ephemeral resources; mitigated by R2 SSE + IAM-scoped operator-only token + bucket versioning + separate state bucket from backups bucket per ADR-072.