
ADR-084: Multi-tenant noisy-neighbor handling — rate limiting and per-tenant observability

Context

The api pod replicas host decision modules for many customers concurrently (per ADR-076 D1, as clarified in Session 3d.2). Without per-tenant constraints, a single high-volume tenant could saturate the fleet’s request capacity and degrade other tenants’ latency and availability. This ADR settles runtime noisy-neighbor handling: per-tenant rate limits, CPU-time budgets, and an optional dedicated-pod escape hatch.

The v0 cost profile of a decision call (per the Codex content captured for Phase 4 in system-design/foundations/decision-execution.mdx) makes the noisy-neighbor problem tractable:

Authorization + request validation — light, typical web/API.
Action mapping + module load — light steady-state; cold-start variance bounded to first load per (org, domain, action, version) per pod.
Context establishment — light to medium; rises with computed-attribute derivations and lookup-based context.
Predicate execution per rule — lightest phase; pure stateless deterministic code with no I/O; per-predicate timeout from ADR-083 Layer 2 caps the per-call worst case.
Aggregation + response prep — light.

Every phase has a bounded cost, so the total cost of a single call is bounded. A tenant’s only lever for consuming disproportionate resources is therefore call volume, and rate limiting bounds call volume. Aggregate per-tenant CPU budgets and dedicated-pod escape hatches address scenarios the v0 cost profile doesn’t generate.
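The argument can be written as a bound. This is a sketch under stated assumptions, and all symbols are hypothetical. Each c term is the worst-case CPU cost of one phase, and c_load includes a first-load cold start. Predicates are assumed to run one after another, up to n_rules per call, each capped at the ADR-083 Layer 2 timeout t_pred and each consuming at most t_pred of CPU. The per-org bucket is assumed to have burst capacity B_org and refill rate R_org.

```latex
\begin{aligned}
C_{\max} &:= c_{\text{auth}} + c_{\text{load}} + c_{\text{ctx}} + n_{\text{rules}}\,t_{\text{pred}} + c_{\text{agg}},
  \qquad C_{\text{call}} \le C_{\max} \\
N_{\text{org}}(\Delta t) &\le B_{\text{org}} + R_{\text{org}}\,\Delta t \\
\mathrm{CPU}_{\text{org}}(\Delta t) &\le N_{\text{org}}(\Delta t)\,C_{\max}
  \le \bigl(B_{\text{org}} + R_{\text{org}}\,\Delta t\bigr)\,C_{\max}
\end{aligned}
```

In this reading, the only term a tenant controls is the number of admitted calls N_org, and that is the quantity the rate limits bound.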

Decision

D1 — Multi-dimensional rate limiting at the API surface

Rate limits are enforced along two hierarchical dimensions:

Per-org (top level) — a token bucket scoped to org_id. Aggregate request rate across all of an org’s API keys and all (org, domain) deployments. The org-level bucket is the customer-visible “total budget” surface.
Per-API-key (second level) — a token bucket scoped to each API key. API keys are minted against an (org, domain) pair and are contextually limited to calling decisions within that domain (the per-key bucket applies on top of the per-org aggregate).

A request must clear both buckets to proceed. The per-key bucket prevents one key’s traffic from monopolizing the org’s total budget; the per-org bucket caps the aggregate.
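Below is a minimal sketch of the two-level check. It assumes classic token buckets with lazy refill and a single in-process store. The names TokenBucket and HierarchicalLimiter, and the (capacity, refill-per-second) configuration tuples, are hypothetical. A real deployment across api pod replicas would need shared bucket state, which this sketch does not attempt.

```python
from typing import Dict, Tuple


class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity: float, refill_per_s: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = float(capacity)  # buckets start full
        self.updated = now

    def _refill(self, now: float) -> None:
        # Lazy refill: credit tokens for the time elapsed since the last update.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now

    def has(self, now: float, cost: float = 1.0) -> bool:
        self._refill(now)
        return self.tokens >= cost

    def take(self, cost: float = 1.0) -> None:
        self.tokens -= cost


class HierarchicalLimiter:
    """Per-org bucket on top, per-API-key bucket underneath; both must admit."""

    def __init__(self, org_cfg: Tuple[float, float], key_cfg: Tuple[float, float]) -> None:
        self.org_cfg = org_cfg  # (capacity, refill per second): hypothetical tuning
        self.key_cfg = key_cfg
        self.orgs: Dict[str, TokenBucket] = {}
        self.keys: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, org_id: str, api_key_id: str, now: float) -> bool:
        if org_id not in self.orgs:
            self.orgs[org_id] = TokenBucket(*self.org_cfg, now=now)
        if (org_id, api_key_id) not in self.keys:
            self.keys[(org_id, api_key_id)] = TokenBucket(*self.key_cfg, now=now)
        org = self.orgs[org_id]
        key = self.keys[(org_id, api_key_id)]
        # Check both levels before consuming from either, so a rejection at one
        # level never drains tokens from the other.
        if org.has(now) and key.has(now):
            org.take()
            key.take()
            return True
        return False


limiter = HierarchicalLimiter(org_cfg=(3, 0.0), key_cfg=(2, 0.0))
print([limiter.allow("org-1", "key-a", now=0.0) for _ in range(3)])  # [True, True, False]
print([limiter.allow("org-1", "key-b", now=0.0) for _ in range(2)])  # [True, False]
```

In the usage lines, key-a is stopped by its own bucket while the org still has budget. Key-b is then stopped by the exhausted org bucket. These are the two failure modes the hierarchy exists to separate.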

Per-action rate limits are an optional third dimension, available when domain operators need to tune per-action capacity (e.g., a high-volume wire_transfer.release action may carry different limits than a low-volume vendor.onboard). Per-action limits are operational tuning rather than an architectural commitment; if and when activated, they ride on the same enforcement mechanism.
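One way to keep per-action limits on the same enforcement path is to derive, for each request, the list of bucket scopes it must clear. The action scope is appended only when an operator has configured a limit for that action. The sketch assumes hypothetical request fields and a hypothetical action_limits set keyed by (org, domain, action). It computes scopes only and leaves the bucket checks to whatever limiter enforces D1.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class DecisionRequest(NamedTuple):
    org_id: str
    domain_id: str
    api_key_id: str
    action: str


def bucket_scopes(
    req: DecisionRequest,
    action_limits: FrozenSet[Tuple[str, str, str]],
) -> List[tuple]:
    """Return the rate-limit scopes a request must clear, outermost first."""
    scopes: List[tuple] = [
        ("org", req.org_id),
        ("key", req.org_id, req.api_key_id),
    ]
    # Per-action limits are opt-in tuning: add the scope only if configured.
    if (req.org_id, req.domain_id, req.action) in action_limits:
        scopes.append(("action", req.org_id, req.domain_id, req.action))
    return scopes
```

With only wire_transfer.release configured, a request for that action gets three scopes. A vendor.onboard request still gets the default two.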

Rate-limit responses follow standard HTTP convention (429 + Retry-After header + RFC 9457 Problem Details per ADR-006 D3). Bucket sizes, refill rates, and burst capacities are operational tuning, not ADR-level commitments.
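Below is a sketch of the response shape. It assumes the rejecting bucket can report its current token count and refill rate. Retry-After is given in delay-seconds: the time until the bucket has refilled enough for one request, rounded up. `about:blank` is the RFC 9457 default problem type. The function name and the `scope` argument are hypothetical.

```python
import json
import math
from typing import Dict, Tuple


def rate_limited_response(
    tokens_available: float,
    cost: float,
    refill_per_s: float,
    scope: str,
) -> Tuple[int, Dict[str, str], str]:
    """Build a 429 with Retry-After and an RFC 9457 Problem Details body."""
    if refill_per_s <= 0:
        raise ValueError("refill rate must be positive to compute Retry-After")
    deficit = max(0.0, cost - tokens_available)
    # Retry-After takes whole seconds; round up and never advertise 0.
    retry_after = max(1, math.ceil(deficit / refill_per_s))
    problem = {
        "type": "about:blank",  # RFC 9457 default: semantics are those of the status code
        "title": "Too Many Requests",
        "status": 429,
        "detail": f"Rate limit exceeded at the {scope} level.",
    }
    headers = {
        "Content-Type": "application/problem+json",
        "Retry-After": str(retry_after),
    }
    return 429, headers, json.dumps(problem)
```

Rounding up means a client that honors Retry-After arrives after at least one token is available, not just before.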

D2 — Per-tenant operational observability

Per-tenant observability dimensions ride the existing observability substrate (ADR-036). The required dimensions:

Call rate per (org_id), per (org_id, domain), and per (org_id, api_key_id) over time windows.
Latency distribution (p50 / p95 / p99) per the same dimensions.
Status distribution (GREEN / GREEN-SKIP / YELLOW / RED) per the same dimensions.
Errored-predicate rate and timeout-exceedance rate per the same dimensions.
Rate-limit-rejection rate (429s) per the same dimensions — operational signal for tenants approaching their limits.
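The following in-memory sketch shows how one call fans out to the three required dimensions. It assumes a hypothetical outcome label per call: the four decision statuses, plus RATE_LIMITED for 429s. Percentiles are computed by nearest rank. Time windowing and the ADR-036 export path are omitted. A real implementation would emit labeled metrics to the observability substrate rather than hold raw samples.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Dim = Tuple[str, ...]


def nearest_rank(sorted_vals: List[float], p: int) -> float:
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample."""
    n = len(sorted_vals)
    k = max(1, -(-p * n // 100))  # integer ceiling division
    return sorted_vals[k - 1]


class TenantMetrics:
    def __init__(self) -> None:
        self.outcomes: Dict[Dim, Counter] = defaultdict(Counter)
        self.flags: Dict[Dim, Counter] = defaultdict(Counter)
        self.latencies: Dict[Dim, List[float]] = defaultdict(list)

    @staticmethod
    def dims(org_id: str, domain_id: str, api_key_id: str) -> List[Dim]:
        # The three required dimensions: org, (org, domain), (org, api key).
        return [
            ("org", org_id),
            ("org_domain", org_id, domain_id),
            ("org_key", org_id, api_key_id),
        ]

    def record(self, org_id: str, domain_id: str, api_key_id: str, outcome: str,
               latency_ms: float = 0.0, timed_out: bool = False,
               errored: bool = False) -> None:
        for dim in self.dims(org_id, domain_id, api_key_id):
            self.outcomes[dim][outcome] += 1
            self.flags[dim]["timed_out"] += int(timed_out)
            self.flags[dim]["errored"] += int(errored)
            if outcome != "RATE_LIMITED":  # 429s never reach decision execution
                self.latencies[dim].append(latency_ms)

    def summary(self, dim: Dim) -> Dict[str, object]:
        counts = self.outcomes[dim]
        total = sum(counts.values())
        executed = total - counts["RATE_LIMITED"]
        lat = sorted(self.latencies[dim])
        out: Dict[str, object] = {"calls": total, "status": dict(counts)}
        if total:
            out["rejection_rate"] = counts["RATE_LIMITED"] / total
        if executed:
            # Timeout and error rates are taken over calls that actually executed.
            out["timeout_rate"] = self.flags[dim]["timed_out"] / executed
            out["errored_rate"] = self.flags[dim]["errored"] / executed
        if lat:
            for p in (50, 95, 99):
                out[f"p{p}"] = nearest_rank(lat, p)
        return out
```

Because each call is recorded under all three keys, an org-level view and a per-key view always agree on what a "call" is. They differ only in how traffic is grouped.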

The customer-facing view of these dimensions lands on the System Card (per ADR-082 D3) at (org, domain) granularity. The operator-facing view exposes cross-tenant aggregation for noisy-tenant identification, capacity planning, and tier-tuning decisions. The cost-observation surface (when commercial billing lands) uses the same (org, domain) segmentation as rate limiting, ensuring customers see the same unit for limits, decisions, and costs.
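The two views can be sketched over the same per-(org, domain) call counts. The customer view filters to one org’s own domains. The operator view rolls counts up to org level and ranks tenants by their share of fleet traffic. The function names and the share threshold are hypothetical operator tuning; this ADR does not fix them.

```python
from collections import Counter
from typing import Dict, List, Tuple

OrgDomain = Tuple[str, str]


def customer_view(calls: Dict[OrgDomain, int], org_id: str) -> Dict[str, int]:
    """System Card style view: one org sees only its own (org, domain) counts."""
    return {domain: n for (org, domain), n in calls.items() if org == org_id}


def noisy_tenants(calls: Dict[OrgDomain, int],
                  share_threshold: float) -> List[Tuple[str, float]]:
    """Operator view: orgs whose share of fleet calls meets the threshold."""
    per_org: Counter = Counter()
    for (org, _domain), n in calls.items():
        per_org[org] += n
    fleet = sum(per_org.values())
    if fleet == 0:
        return []
    flagged = [(org, n / fleet) for org, n in per_org.items()
               if n / fleet >= share_threshold]
    # Largest consumers first, for capacity-planning and tier-tuning review.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Both functions read the same segmentation. This is what lets the customer-visible and operator-visible numbers reconcile.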

D3 — Per-call CPU bound

Per-call CPU is bounded by the predicate-timeout mechanism established in ADR-083 Layer 2. This ADR introduces no new per-call CPU mechanism: the Layer 2 timeout is the canonical bound, and timeout exceedances are captured in decision metadata and in the D2 observability dimensions.
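ADR-083’s Layer 2 mechanism is not specified here, so the following is only an illustrative sketch. It assumes predicates are plain Python callables run on a worker pool with a wall-clock timeout. It shows how a timeout exceedance or a predicate error becomes decision metadata rather than an unbounded wait. Python cannot forcibly stop a running thread, so timed-out work keeps running in the background. A real sandbox would need to enforce the bound at the execution layer.

```python
import concurrent.futures as cf
import time
from typing import Any, Callable, Dict

# Shared worker pool; max_workers is a hypothetical sizing choice.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def run_predicate(pred: Callable[[Dict[str, Any]], bool],
                  ctx: Dict[str, Any], timeout_s: float) -> Dict[str, Any]:
    """Evaluate one predicate under a wall-clock timeout and return metadata."""
    start = time.perf_counter()
    future = _POOL.submit(pred, ctx)
    meta: Dict[str, Any] = {"value": None, "timed_out": False, "errored": False}
    try:
        meta["value"] = future.result(timeout=timeout_s)
    except cf.TimeoutError:
        # The caller stops waiting; cancel() only helps if the task has not started.
        future.cancel()
        meta["timed_out"] = True
    except Exception as exc:  # predicate raised: record it instead of failing the call
        meta["errored"] = True
        meta["error"] = repr(exc)
    meta["elapsed_s"] = time.perf_counter() - start
    return meta
```

The returned dictionary corresponds to what D2 counts: the timed_out and errored flags feed the timeout-exceedance and errored-predicate rates.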

D4 — No aggregate per-tenant CPU budget at v0

An aggregate per-tenant CPU budget (a cap on the total CPU time a tenant consumes across all of its requests) is not implemented at v0. The v0 cost profile (per the phase analysis in Context) bounds the cost of every call, so it exposes no unbounded-per-call attack surface and the only multi-tenant lever is call volume (D1). Introducing per-tenant CPU accounting, eviction policies, and fairness algorithms would be over-engineering for a scenario v0 does not produce.

If operator telemetry or specific customer profiles later surface an aggregate-CPU concern that rate limiting cannot address, the concern is revisited with a follow-on ADR. The v0 omission is deliberate, not deferred.

D5 — No dedicated-pod escape hatch in v0 architecture

The dedicated-pod escape hatch (giving strict-tenancy customers their own decision-server pods) is not part of the v0 architecture. Together, the v0 cost profile, multi-dimensional rate limiting, and per-tenant observability cover the realistic noisy-neighbor cases without requiring tenant-level deployment isolation.

The escape hatch was sketched as a possible mitigation alongside ADR-083 D6’s OS-level-sandbox reservation; this ADR retires that reservation. If a specific customer requirement (regulated industry, contractual isolation, compliance audit) later forces tenant-level deployment isolation, that becomes a follow-on architectural decision made against concrete requirements, not a v0 reservation held open without a driver.

Alternatives considered

Single-dimensional per-API-key rate limiting (no per-org aggregate). Rejected. A customer with many API keys could exceed their intended total budget by spreading traffic across keys. The per-org aggregate is the customer-visible “total” surface; the per-key bucket is the fairness control under that aggregate.
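Here is a steady-state illustration with hypothetical numbers. If each key is capped but nothing caps the org, the admitted rate grows with the number of keys. The function ignores burst capacity and assumes every key offers the same load.

```python
from typing import Optional


def admitted_rate(offered_per_key: float, key_limit: float, n_keys: int,
                  org_limit: Optional[float] = None) -> float:
    """Steady-state admitted requests/s for an org spreading load over n_keys."""
    per_key = min(offered_per_key, key_limit)  # each key is clipped by its own bucket
    total = per_key * n_keys
    # Without an org-level bucket, nothing bounds the sum across keys.
    return total if org_limit is None else min(total, org_limit)
```

Take a 100 req/s per-key limit and 10 keys each offering 200 req/s. Per-key-only limiting admits 1,000 req/s. Adding a 300 req/s org bucket holds the org to its intended total.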

Per-IP rate limiting in addition to per-org / per-key. Rejected for v0. Per-IP limiting is an abuse-defense pattern; customer auth (per ADR-039) is the trusted identity surface for normal traffic. Per-IP limits can be layered on at the CDN/edge (per ADR-052, Cloudflare) if abuse becomes a concern; they are not architecturally committed here.

Per-tenant CPU budgets at v0. Rejected per D4 — the cost profile doesn’t justify the accounting machinery.

Dedicated-pod escape hatch at v0 (or reserved). Rejected per D5: it is over-engineering against scenarios v0 does not produce. The choice is an explicit deferral rather than a reserved option, to avoid “we held this open without a driver” drift.

Adaptive rate limits that respond to system load (rather than static buckets). Rejected for v0. Adaptive systems add control-loop complexity (load measurement, feedback timing, oscillation prevention). Static buckets per D1 are the standard PaaS pattern and operationally legible. Adaptive limits can layer on later if needed.

Defer all noisy-neighbor handling to operational tuning post-launch. Rejected. Rate limiting at the API surface is a v0 launch requirement (without it, a single chatty tenant can degrade the fleet); per-tenant observability is required to operate the system. Both are architectural commitments, not optional tuning.

Consequences

The API layer at api enforces D1 rate limits in front of /api/decide and the MCP equivalent. Bucket configuration (rates, burst, refill) is operational tuning; the architectural commitment is that the two-dimensional enforcement exists.
API key minting (per ADR-039 + ADR-006 sections 4–5) generates keys against an (org, domain) pair; the API key model’s specifics relative to the scope model are settled by ADR-086 (the org_id / domain_id tenancy hierarchy).
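Below is a minimal sketch of a key that is usable only for its own (org, domain) pair. It assumes a hypothetical in-memory key registry and hypothetical field names; ADR-086 settles the actual key model. It also assumes the org used for rate limiting is taken from the stored key record, not from anything the caller asserts, so traffic always lands in the owning org’s bucket.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ApiKey:
    key_id: str
    org_id: str
    domain_id: str  # keys are minted against one (org, domain) pair


def resolve_scopes(registry: Dict[str, ApiKey], presented_key_id: str,
                   target_domain_id: str) -> Tuple[tuple, tuple]:
    """Authorize a decision call and return the (org, key) rate-limit scopes."""
    key = registry.get(presented_key_id)
    if key is None:
        raise PermissionError("unknown API key")
    if key.domain_id != target_domain_id:
        raise PermissionError("API key is not scoped to this domain")
    # Scopes come from the stored key, not from request parameters.
    return ("org", key.org_id), ("key", key.org_id, key.key_id)
```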
Per-tenant observability dimensions extend ADR-036’s existing observability substrate; the dimension set is a Phase 4 implementation epic concern.
System Card surfaces (per ADR-082 D3) include per-(org, domain) operational metrics; the customer-visible view inherits the same segmentation as rate limiting.
Cost-observation surfaces (when commercial billing lands per ADR-076 D5) use the same (org, domain) segmentation; customers see one unit for limits, decisions, and costs.
The retirement of the dedicated-pod escape hatch from architectural reservations (D5) tightens ADR-083 D6: that addendum still stands but should be read with this ADR’s explicit posture that the escape hatch is not reserved infrastructure.
429 + Retry-After is the rate-limit response shape, carried in the RFC 9457 Problem Details envelope per ADR-006 D3.
Phase 4 Codex work polishes the new system-design/foundations/decision-execution.mdx page (landed here as a Phase 3d-captured draft) alongside the broader T2 rewrites; the architectural commitments (the five-phase decomposition, cost-profile analysis underpinning D1–D5) are settled.
