ADR-012: Dev tooling — Biome, git-cliff, tiered commit hooks, ruff TD rules
Status: Accepted (2026-04-20; mypy portion superseded by ADR-051 — ty is the primary Python type checker; ruff ANN family backfills missing-annotation coverage; mypy retained as informational warning in tools/dev/precheck.sh. Biome, git-cliff, tiered commit hooks, and ruff TD rules stand)
Context
The 0.3.0 rewrite is greenfield on the new repo (~/Source/OMG/spectral/). Before scaffolding lands (SPEC-296), the dev-tooling stack needs to be decided: TypeScript lint/format, release-notes automation, commit-hook strategy, and the fate of the custom check_spec_refs.py script from the legacy repo.
Four related questions sit together because they all shape what the new repo’s scaffolding ships with on day one:
- TypeScript lint + format toolchain
- CHANGELOG automation
- What runs at which commit-hook tier
- How to keep SPEC-refs out of code (prior enforcement was a custom Python script)
Decisions are captured below; the alternatives considered are collected in their own section after them.
Decision
1. Biome for TypeScript lint + format
Use Biome for TypeScript/JavaScript lint and format across the monorepo. Do not use ESLint + Prettier.
Rationale:
- Speed. Biome is Rust-based and runs roughly 10–100× faster than ESLint + Prettier on the same tree. Tooling speed matters because the other hooks on this list compound; a slow formatter pushes the whole pre-commit tier past its latency target.
- Single tool. Lint and format in one binary with one config file (biome.json). Fewer toolchain pieces, fewer version-drift failure modes, less config scattered across package.json, .eslintrc, and .prettierrc.
- Fits the “fast tooling at key gates” operating principle. The dev loop depends on hook latency; Biome is picked for that constraint first, ecosystem breadth second.
Known trade-off: Biome’s plugin ecosystem is smaller than ESLint’s. For a solo-controlled codebase where style rules are decided centrally and applied uniformly, the ecosystem gap is acceptable. If the project later adopts a plugin that only exists in ESLint, revisit via a follow-up ADR.
2. git-cliff for CHANGELOG.md
Use git-cliff to generate CHANGELOG.md from Conventional Commits. Run it in CI on tag pushes via GitHub Actions (.github/workflows/release.yml). Config in cliff.toml at repo root.
This is distinct from WorldModelCard release notes (SPEC-243). CHANGELOG describes engineering / codebase changes for developers; WorldModelCard describes world-model rule changes for operators. Different artifacts, different audiences, different cadences — do not conflate them.
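To make the input contract concrete, here is a minimal Python sketch of how Conventional Commit subjects could be grouped into changelog sections. It is not git-cliff's implementation or its configuration format; the regex, the SECTION_TITLES mapping, and both function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Simplified Conventional Commits subject: type(scope)!: description
# Assumption: this regex is illustrative and looser than the full specification.
SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)

# Hypothetical mapping from commit type to changelog heading.
SECTION_TITLES = {"feat": "Features", "fix": "Bug Fixes", "docs": "Documentation"}


def group_commits(subjects):
    """Group commit subjects by changelog section, skipping non-conforming ones."""
    sections = defaultdict(list)
    for subject in subjects:
        match = SUBJECT_RE.match(subject)
        if match is None:
            continue  # not a Conventional Commit; left out of the changelog
        title = SECTION_TITLES.get(match["type"], "Other")
        scope = f"{match['scope']}: " if match["scope"] else ""
        flag = " (breaking)" if match["breaking"] else ""
        sections[title].append(f"{scope}{match['desc']}{flag}")
    return dict(sections)


def render_changelog(version, subjects):
    """Render one release entry as Markdown text."""
    lines = [f"## {version}"]
    for title, entries in sorted(group_commits(subjects).items()):
        lines.append(f"### {title}")
        lines.extend(f"- {entry}" for entry in entries)
    return "\n".join(lines)
```

The sketch also shows why the commit-msg tier matters: a subject that does not parse is simply absent from the generated entry.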
3. Tiered commit-hook strategy
Four tiers, each with a latency target. A tier only contains checks it can finish within its budget; anything slower gets pushed down to the next tier.
| Tier | Latency target | Contents |
|---|---|---|
| Pre-commit | < 3s | biome format / biome check, ruff format / ruff check, hygiene hooks (trailing whitespace, EOF fixer, large-file detection, private-key detection) |
| Commit-msg | negligible | commitizen — enforces Conventional Commits message format |
| Pre-push | < 30s | ty check, tsc --noEmit, tools/quality/validate_architecture.py, fast unit tests |
| CI | best-effort | everything above + integration tests + coverage thresholds + the full test matrix |
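The placement rule can be sketched in Python as a small budget check. The 3s and 30s budgets come from the table; leaving Commit-msg out of the pipeline, modelling CI as unbounded, summing check durations per tier, and the function name are all assumptions made for illustration.

```python
import math

# Latency budgets in seconds, taken from the table. Commit-msg is omitted
# because it validates the message rather than the tree. CI is "best-effort",
# modelled here as unbounded (an assumption).
TIER_BUDGETS = [("pre-commit", 3.0), ("pre-push", 30.0), ("ci", math.inf)]


def assign_tiers(checks):
    """Place each (name, seconds) check in the first tier with budget left.

    Assumption: a tier's cost is the sum of its checks' durations.
    """
    remaining = dict(TIER_BUDGETS)
    placement = {}
    for name, seconds in checks:
        for tier, _ in TIER_BUDGETS:
            if seconds <= remaining[tier]:
                remaining[tier] -= seconds
                placement[name] = tier
                break
    return placement
```

Called with hypothetical timings such as `[("biome check", 0.4), ("ty check", 8.0), ("integration tests", 240.0)]`, the first check lands in pre-commit, the second is pushed down to pre-push, and the third falls through to CI.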
Hook behavior is fail-fast rather than auto-fix-and-stage. When a formatter would change files, the hook errors with a clear “run X to fix” message rather than silently re-staging the tree. Rationale: explicit beats magic. Auto-staging hides diffs from commit-review and creates agent-visible surprises where the committed tree differs from what the agent thought it was committing. The few seconds saved aren’t worth the confusion.
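A minimal Python sketch of the fail-fast contract: run a formatter in check-only mode and, on failure, print the fix command and return a non-zero exit code without touching the index. The wrapper, its name, and the injectable `run` parameter are hypothetical; how the real hooks are wired is left to .pre-commit-config.yaml.

```python
import subprocess
import sys


def fail_fast(check_cmd, fix_hint, run=subprocess.run):
    """Run a check-only command; report how to fix instead of fixing.

    Returns the exit code the hook should use. Never calls `git add`,
    so the staged tree stays exactly as the committer left it.
    """
    result = run(check_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the checker's own report, then the explicit next step.
        print(result.stdout, file=sys.stderr, end="")
        print(
            f"Formatting check failed. Run `{fix_hint}` to fix, then re-stage.",
            file=sys.stderr,
        )
        return 1
    return 0
```

A hook entry point could call, for example, `fail_fast(["ruff", "format", "--check", "."], "ruff format .")` and pass the result to `sys.exit`. The `run` parameter exists only so the behavior can be exercised without the real tool installed.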
Type-checker portion superseded by ADR-051. The original disposition selected mypy --strict for the pre-push tier; ADR-051 replaced mypy with ty (Astral’s type checker) as the primary, with the ruff ANN family backfilling missing-annotation coverage; mypy is retained as an informational warning in tools/dev/precheck.sh. Tier latency target + fail-fast behavior stand.
4. Drop check_spec_refs.py; rely on ruff TD rules
The legacy repo ships a custom tools/quality/check_spec_refs.py script that blocks SPEC-refs in code. Do not port it. Instead, rely on ruff’s TD rules already enforced via pyproject.toml:
- TD003: missing issue link in TODO
- TD004: missing colon in TODO
- TD005: missing description in TODO
These rules mean any TODO comment must link to a tracked issue (e.g. a SPEC-NNN reference inside a TODO). SPEC-refs outside a TODO — in live code, identifiers, strings, or structured comments — are caught by convention and code review rather than by a script; they are rare enough that a custom enforcement tool is not worth the maintenance cost.
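As an illustration of the TODO shape this relies on, the Python sketch below checks a simplified version of the convention: a TODO with a colon, a description, and a SPEC-NNN reference on the same line or the one after it. It is a stand-in, not ruff's implementation; the regexes, the problem labels, and the function name are assumptions, and ruff's own documentation remains the authority on what each TD rule accepts.

```python
import re

# Simplified stand-ins for the convention; not ruff's actual patterns.
TODO_RE = re.compile(
    r"#\s*TODO(?:\((?P<author>[^)]+)\))?(?P<colon>:)?\s*(?P<desc>.*)$"
)
ISSUE_RE = re.compile(r"\bSPEC-\d+\b")


def todo_problems(lines):
    """Return (line_number, problem) pairs for TODO comments in source lines."""
    problems = []
    for index, line in enumerate(lines):
        match = TODO_RE.search(line)
        if match is None:
            continue
        if not match["colon"]:
            problems.append((index + 1, "missing colon"))
        if not match["desc"].strip():
            problems.append((index + 1, "missing description"))
        # Accept the issue reference on the TODO line or the line after it.
        following = lines[index + 1] if index + 1 < len(lines) else ""
        if not (ISSUE_RE.search(line) or ISSUE_RE.search(following)):
            problems.append((index + 1, "missing issue reference"))
    return problems
```

Under this sketch, a comment such as `# TODO(ada): wire up release workflow` followed by `# SPEC-296` passes (the author name is hypothetical), while a bare `# TODO fix this later` is flagged for both the missing colon and the missing reference.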
Memory doctrine: “no SPEC-refs in code” is a default; the narrow warning-state exception for in-flight work (resolved within 1–2 pushes) is what the TD rules actually enforce.
Alternatives considered
ESLint + Prettier (rejected). Broadest ecosystem and maximum rule flexibility. Rejected on speed — the pre-commit hook tier cannot hit its 3s latency target with an ESLint + Prettier run on a monorepo of meaningful size, and the config sprawl across .eslintrc, .prettierrc, package.json, and plugin packages adds maintenance drag.
oxlint (rejected for now). Rust-based like Biome, slightly faster on lint, but as of this decision it does not ship a formatter, so adopting it would require pairing with Prettier and re-introducing the two-tool-two-config problem Biome solves. Revisit if oxlint ships a formatter.
Keep check_spec_refs.py (rejected). The custom script duplicates what ruff TD rules already express, requires ongoing Python maintenance, and encourages agents to treat “has a CI check” as a substitute for “follows the convention.” Ruff TD rules are the right blast radius.
Auto-fix-and-re-stage on pre-commit (rejected). Saves a few seconds per commit when formatting drifts. Rejected because the silent re-stage creates a gap between what the agent (human or otherwise) thought was committed and what actually landed. Explicit failures with “run X to fix” preserve the mental model.
Release-Please / semantic-release (rejected for CHANGELOG). Both tools couple CHANGELOG generation with automated semver bumping and GitHub-release creation. The project does not want tool-driven releases at this stage — git-cliff does CHANGELOG only, runs on a tag the operator has already decided to push, and stays out of the release-decision path.
Consequences
- biome.json lives at repo root and is the single source for TS lint/format.
- cliff.toml lives at repo root; release.yml in .github/workflows/ runs git-cliff on tag push and commits the updated CHANGELOG.
- .pre-commit-config.yaml defines the three local hook tiers (pre-commit / commit-msg / pre-push); CI, the fourth tier, re-runs the full battery plus integration tests.
- pyproject.toml keeps TD003/TD004/TD005 in the selected ruleset; no separate SPEC-ref enforcement script is added.
- Scaffolding (SPEC-296) lands these files from commit one. Re-visiting any of these four decisions later requires a follow-up ADR.
- A developer who prefers auto-fixing hooks can run formatters locally; the hook contract says the repo will not fix files on your behalf.