Decisions

ADR-012: Dev tooling — Biome, git-cliff, tiered commit hooks, ruff TD rules

Status: Accepted (2026-04-20; mypy portion superseded by ADR-051 — ty is the primary Python type checker; ruff ANN family backfills missing-annotation coverage; mypy retained as informational warning in tools/dev/precheck.sh. Biome, git-cliff, tiered commit hooks, and ruff TD rules stand)

Context

The 0.3.0 rewrite is greenfield on the new repo (~/Source/OMG/spectral/). Before scaffolding lands (SPEC-296), the dev-tooling stack needs to be decided: TypeScript lint/format, release-notes automation, commit-hook strategy, and the fate of the custom check_spec_refs.py script from the legacy repo.

Four related questions sit together because they all shape what the new repo’s scaffolding ships with on day one:

TypeScript lint + format toolchain
CHANGELOG automation
What runs at which commit-hook tier
How to keep SPEC-refs out of code (prior enforcement was a custom Python script)

Decisions are captured below; the alternatives considered follow in their own section.

Decision

1. Biome for TypeScript lint + format

Use Biome for TypeScript/JavaScript lint and format across the monorepo. Do not use ESLint + Prettier.

Rationale:

Speed. Biome is Rust-based and runs roughly 10–100× faster than ESLint + Prettier on the same tree. Tooling speed matters because the other hooks on this list compound; a slow formatter pushes the whole pre-commit tier past its latency target.
Single tool. Lint and format in one binary with one config file (biome.json). Fewer toolchain pieces, fewer version-drift failure modes, less config scattered across package.json, .eslintrc, .prettierrc.
Fits the “fast tooling at key gates” operating principle. The dev loop depends on hook latency; Biome is picked for that constraint first, ecosystem breadth second.

Known trade-off: Biome’s plugin ecosystem is smaller than ESLint’s. For a solo-controlled codebase where style rules are decided centrally and applied uniformly, the ecosystem gap is acceptable. If the project later adopts a plugin that only exists in ESLint, revisit via a follow-up ADR.

2. git-cliff for CHANGELOG.md

Use git-cliff to generate CHANGELOG.md from Conventional Commits. Run it in CI on tag pushes via GitHub Actions (.github/workflows/release.yml). Config in cliff.toml at repo root.
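
To illustrate what generating a changelog from Conventional Commits involves, here is a minimal Python sketch of the grouping step. It is not git-cliff's implementation or configuration; the `SECTIONS` mapping and the `group_commits` helper are hypothetical names chosen for illustration, and the real grouping rules would live in cliff.toml.

```python
import re
from collections import defaultdict

# Conventional Commits header: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)

# Hypothetical mapping from commit type to changelog heading (an assumption,
# not taken from the project's cliff.toml).
SECTIONS = {"feat": "Features", "fix": "Bug Fixes", "docs": "Documentation"}


def group_commits(subjects):
    """Group commit subject lines under changelog headings; skip non-conforming ones."""
    grouped = defaultdict(list)
    for subject in subjects:
        match = HEADER.match(subject)
        if match is None:
            continue  # not a Conventional Commit, so it is left out of the changelog
        heading = SECTIONS.get(match["type"], "Other")
        prefix = f"{match['scope']}: " if match["scope"] else ""
        grouped[heading].append(prefix + match["desc"])
    return dict(grouped)
```

Because the commit-msg tier rejects non-conforming messages before they land, the skip branch should rarely trigger in practice.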

This is distinct from WorldModelCard release notes (SPEC-243). CHANGELOG describes engineering / codebase changes for developers; WorldModelCard describes world-model rule changes for operators. Different artifacts, different audiences, different cadences — do not conflate them.

3. Tiered commit-hook strategy

Four tiers, each with a latency target. A tier only contains checks it can finish within its budget; anything slower gets pushed down to the next tier.

Tier	Latency target	Contents
Pre-commit	< 3s	`biome format` / `biome check`, `ruff format` / `ruff check`, hygiene hooks (trailing whitespace, EOF fixer, large-file detection, private-key detection)
Commit-msg	negligible	commitizen — enforces Conventional Commits message format
Pre-push	< 30s	`ty check`, `tsc --noEmit`, `tools/quality/validate_architecture.py`, fast unit tests
CI	best-effort	everything above + integration tests + coverage thresholds + the full test matrix
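
The push-down rule for the tiers above can be sketched as a small placement function. This is an illustration under stated assumptions: the budget is treated as cumulative per tier, the check durations are invented numbers rather than measurements, and `assign_tiers` is a hypothetical helper, not part of the repo. The commit-msg tier is left out because it validates the message rather than the tree.

```python
# Tier budgets in seconds, in push-down order; CI is best-effort, so it has no hard budget.
TIER_BUDGETS = [("pre-commit", 3.0), ("pre-push", 30.0), ("ci", float("inf"))]


def assign_tiers(check_durations):
    """Place each check in the earliest tier whose remaining budget can absorb it."""
    remaining = dict(TIER_BUDGETS)
    placement = {}
    for name, seconds in check_durations.items():
        for tier, _budget in TIER_BUDGETS:
            if seconds < remaining[tier]:
                remaining[tier] -= seconds
                placement[name] = tier
                break  # placed; anything slower falls through to a later tier
    return placement


# Hypothetical durations in seconds, for illustration only.
example = assign_tiers(
    {"ruff check": 0.4, "biome check": 0.6, "ty check": 8.0, "integration tests": 240.0}
)
```

With these made-up durations, the two linters fit in pre-commit, the type check drops to pre-push, and the integration tests land in CI.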

Hook behavior is fail-fast rather than auto-fix-and-stage. When a formatter would change files, the hook errors with a clear “run X to fix” message rather than silently re-staging the tree. Rationale: explicit beats magic. Auto-staging hides diffs from commit-review and creates agent-visible surprises where the committed tree differs from what the agent thought it was committing. The few seconds saved aren’t worth the confusion.
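
A minimal sketch of the fail-fast contract, assuming the hook is a thin Python wrapper around a check-only command. `fail_fast` is a hypothetical helper, not the project's actual hook; the point is that it reports and exits without rewriting or staging anything.

```python
import subprocess
import sys


def fail_fast(check_cmd, fix_hint):
    """Run a check-only command; on failure, print the fix hint and return its exit code.

    Nothing is rewritten or re-staged here, so the staged tree stays exactly
    as the committer left it.
    """
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's own output first, then the explicit fix instruction.
        sys.stderr.write(result.stdout + result.stderr)
        sys.stderr.write(f"\nCheck failed. Run `{fix_hint}` to fix, then re-stage.\n")
    return result.returncode
```

A hook entry point might call `sys.exit(fail_fast(["ruff", "format", "--check", "."], "ruff format ."))`, so that a non-zero exit blocks the commit and the message names the exact command to run.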

Type-checker portion superseded by ADR-051. The original disposition selected mypy --strict for the pre-push tier; ADR-051 replaced mypy with ty (Astral’s type checker) as the primary, with the ruff ANN family backfilling missing-annotation coverage; mypy is retained as an informational warning in tools/dev/precheck.sh. The tier latency targets and fail-fast behavior stand.

4. Drop `check_spec_refs.py`; rely on ruff TD rules

The legacy repo ships a custom tools/quality/check_spec_refs.py script that blocks SPEC-refs in code. Do not port it. Instead, rely on ruff’s TD rules already enforced via pyproject.toml:

TD003: missing issue link in TODO
TD004: missing colon in TODO
TD005: missing issue description in TODO

These rules mean any TODO comment must link to a tracked issue (e.g. a SPEC-NNN reference inside a TODO). SPEC-refs outside a TODO — in live code, identifiers, strings, or structured comments — are caught by convention and code review rather than by a script; they are rare enough that a custom enforcement tool is not worth the maintenance cost.
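
As a rough illustration of what these rules look for, the sketch below approximates the three checks with regular expressions. It is an assumption-laden stand-in, not ruff's implementation: ruff's real matching is more thorough, and `todo_problems` is a hypothetical helper that only looks at the TODO line and the line after it.

```python
import re

# Loose approximations only; ruff's own TODO parsing handles more cases.
TODO_LINE = re.compile(r"^#\s*TODO(?:\([^)]*\))?(?P<colon>:)?\s*(?P<desc>.*)$")
# An issue reference is taken to be a URL or an uppercase code such as SPEC-296.
ISSUE_REF = re.compile(r"https?://\S+|[A-Z]+-?\d+")


def todo_problems(comment_lines):
    """Return the rule codes a TODO would likely trip, given it and the line after it."""
    first = TODO_LINE.match(comment_lines[0])
    if first is None:
        return []  # not a TODO comment
    problems = []
    if not any(ISSUE_REF.search(line) for line in comment_lines[:2]):
        problems.append("TD003")  # no issue link or issue code nearby
    if first["colon"] is None:
        problems.append("TD004")  # no colon after the TODO tag
    if not first["desc"].strip():
        problems.append("TD005")  # no description text after the tag
    return problems
```

Under this approximation, a TODO such as `# TODO(alice): wire up scaffolding` followed by `# SPEC-296` returns an empty list, while a bare `# TODO fix this later` is flagged for the missing link and the missing colon.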

Memory doctrine: “no SPEC-refs in code” is a default; the narrow warning-state exception for in-flight work (resolved within 1–2 pushes) is what the TD rules actually enforce.

Alternatives considered

ESLint + Prettier (rejected). Broadest ecosystem and maximum rule flexibility. Rejected on speed — the pre-commit hook tier cannot hit its 3s latency target with an ESLint + Prettier run on a monorepo of meaningful size, and the config sprawl across .eslintrc, .prettierrc, package.json, and plugin packages adds maintenance drag.

oxlint (rejected for now). Rust-based like Biome, slightly faster on lint, but as of this decision it does not ship a formatter, so adopting it would require pairing with Prettier and re-introducing the two-tool-two-config problem Biome solves. Revisit if oxlint ships a formatter.

Keep check_spec_refs.py (rejected). The custom script duplicates what ruff TD rules already express, requires ongoing Python maintenance, and encourages agents to treat “has a CI check” as a substitute for “follows the convention.” Ruff TD rules give the right scope of enforcement: TODO comments are checked by tooling, and everything else is left to convention and review.

Auto-fix-and-re-stage on pre-commit (rejected). Saves a few seconds per commit when formatting drifts. Rejected because the silent re-stage creates a gap between what the agent (human or otherwise) thought was committed and what actually landed. Explicit failures with “run X to fix” preserve the mental model.

Release-Please / semantic-release (rejected for CHANGELOG). Both tools couple CHANGELOG generation with automated semver bumping and GitHub-release creation. The project does not want tool-driven releases at this stage — git-cliff does CHANGELOG only, runs on a tag the operator has already decided to push, and stays out of the release-decision path.

Consequences

biome.json lives at repo root and is the single source for TS lint/format.
cliff.toml lives at repo root; release.yml in .github/workflows/ runs git-cliff on tag push and commits the updated CHANGELOG.
.pre-commit-config.yaml defines the three local hook tiers (pre-commit / commit-msg / pre-push); CI, the fourth tier, re-runs the full battery plus integration tests.
pyproject.toml keeps TD003 / TD004 / TD005 in the selected ruleset; no separate SPEC-ref enforcement script is added.
Scaffolding (SPEC-296) lands these files from commit one. Re-visiting any of these four decisions later requires a follow-up ADR.
A developer who prefers auto-fixing can run the formatters locally before committing; the hook contract says the repo will not fix files on your behalf.
