TL;DR
- Contract-first + verification gates eliminate gaps (missing fields, regressions).
- But LLMs still create rule drift (they quote rules, then violate them).
- We fixed drift with guardrails, not more prose: Hard-Fail rule IDs, STOP gates, mandatory evidence audits, and negative examples.
- Entry point stays tiny and stable; the enforcement lives in versioned docs.
- We keep two profiles: Frozen (v1.0) for speed, Strict (v1.1) for enforcement.
Why this follow-up exists
In Zero-Gap API Development, the goal was “ship APIs with zero regressions” by turning implementation into a contract + verification problem: TypeScript interfaces as canonical truth, safe defaults, systematic verification gates, and discrepancy classification ([IMPL] vs [DOC]).
That framework works. The problem we hit next was different:
AI doesn’t drift because rules are unclear. It drifts because rules have no consequences.
Even when an agent reads AGENTS.md and repeats rules back, it can still “helpfully” shortcut:
- “Use Ransack for filtering/searching.” → agent writes custom where(...) anyway.
- “Avoid DB-specific SQL.” → agent sneaks in ILIKE, IFNULL, REGEXP, etc.
- “Controllers only wrap envelopes.” → agent hand-builds JSON in the controller.
So this session was about adding an enforcement layer on top of Zero-Gap.
The goal of this session
We wanted a system where a reviewer can say:
“This violates two rules: HF-1 and HF-2”
…and the agent itself is forced to detect and correct that before shipping code.
To do that, we restructured the workflow into a toolchain:
- PROMPT.md → execution runtime (phases, STOP gates, verification receipts)
- AGENTS.md → hard authority (Hard-Fail rules, failure semantics)
- GUIDE.md → patterns & examples (non-authoritative, includes anti-patterns)
- bin/verify → canonical verifier
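For orientation, a repository wired this way might look roughly like the sketch below. Only the four toolchain files and the doc/ paths referenced in this post come from the setup; everything else is illustrative:
PROMPT.md                 # execution runtime
AGENTS.md                 # hard authority (HF-* rules)
GUIDE.md                  # patterns, examples, anti-patterns
bin/verify                # canonical verifier
doc/
  requirements/users/USER_NOTIFICATIONS.md
  flow/WEB-485_user_notifications.md
  prd/WEB-485_user_notifications_PRD.md
app/                      # the Rails code the rules govern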
The single entry point (stable forever)
Instead of embedding giant prompts in README/issues, every task uses the same tiny entry point:
Task: WEB-485 – User Notifications
Refs:
- Requirements: doc/requirements/users/USER_NOTIFICATIONS.md
- Flow: doc/flow/WEB-485_user_notifications.md
- PRD: doc/prd/WEB-485_user_notifications_PRD.md
Execute per PROMPT.md.
Important property: the task block never changes.
Strictness evolves only inside PROMPT.md, AGENTS.md, and GUIDE.md.
This prevents the most common drift source: copying outdated prompt blobs across tickets.
Frozen vs Strict: why we keep both
We ended up with two prompt profiles:
Frozen (v1.0)
- Optimized for: brevity, human readability, speed
- Best for: trusted contributors, small refactors, quick spikes
- Weakness: rules are descriptive (“MUST”), so AI can rationalize shortcuts
Strict (v1.1)
- Optimized for: compliance, auditability, “no surprises” AI execution
- Best for: contract-sensitive endpoints, anything touching search/filtering/auth/pagination
- Weakness: longer text (but it’s enforcement scaffolding, not noise)
The key point: both profiles share the same entry point. You swap the runtime, not the call site.
What actually stops drift (the 4 mechanisms)
1) Hard-Fail rule IDs (HF-*)
Instead of “Rule #5,” we introduced stable IDs:
- HF-1 — Ransack-only search/filtering
- HF-2 — DB-agnostic queries only (no raw SQL / DB-specific funcs)
- HF-3 — Blueprinter-only JSON
- HF-4 — snake_case only
- HF-5 — Required fields never null (safe defaults)
Why IDs matter: they turn “standards” into enforceable references:
- easy review comments (“HF-2 violation”)
- easy self-audit checkboxes
- easy future evolution (revise the rule text, keep the ID stable)
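To make this concrete, a single entry in AGENTS.md might read like the sketch below. The rule ID and its meaning come from the list above; the exact wording and the Requirement / Hard-Fail / On violation fields are illustrative:
HF-1: Ransack-only search/filtering
Requirement: all search and filtering goes through Model.ransack(params[:q]).result.
Hard-Fail: any hand-written where(...) or SQL string used for filtering.
On violation: STOP, emit a RULE VIOLATION block (rule ID, location, required fix); do not ship code.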
2) STOP gates (control flow, not suggestions)
Strict mode adds explicit control flow:
- Phase 0–2: NO CODE
- Stop after planning
- If any HF-* would be violated → STOP and report
LLMs follow control flow better than prose.
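As an illustration, the gate can be written as literal control flow in PROMPT.md rather than as advice. The phrasing below is a sketch, not the exact text:
Phase 0–2 (read spec, plan, map contracts): produce the plan. NO CODE. STOP and wait for review.
Phase 3+ (implement): allowed only after the plan is accepted.
At any point: if a change would violate any HF-* rule, STOP and output a RULE VIOLATION block instead of code.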
3) Mandatory Rule Compliance Audit (proof, not vibes)
Before final output, the agent must produce:
RULE COMPLIANCE AUDIT
HF-1: COMPLIANT — evidence: UsersController#index uses User.ransack(params[:q])
HF-2: COMPLIANT — evidence: no raw SQL; only ActiveRecord/Arel
...
This forces the agent to prove it complied. If it can’t produce evidence, it typically self-corrects before shipping.
4) Negative examples (anti-pattern firewall)
Guides that only show “good” patterns still allow AI to invent “bad” ones. So we added explicit “DO NOT COPY” snippets:
# ❌ HF-1 violation (manual SQL filtering; ILIKE is also an HF-2 violation)
User.where("email ILIKE ?", "%#{params[:email]}%")
# ✅ Correct (Ransack owns filtering)
User.ransack(params[:q]).result
LLMs imitate examples aggressively. Negative examples prevent “helpful improvisation.”
Example: the exact drift we’re preventing
Bad (HF-1 + HF-2 violation)
# ❌ don't do this
users = User.where("email ILIKE ?", "%#{params[:email]}%")
Good (HF-1 compliant)
search = User.ransack(params[:q])
users = search.result.page(params[:page]).per(params[:per_page] || 20)
In strict mode, if an agent proposes the bad version, it must output:
❌ RULE VIOLATION
Rule: HF-1
Location: app/controllers/...:12
Reason: manual filtering used instead of Ransack
Required Fix: replace with Model.ransack(params[:q]).result
No debate. Fix it or stop.
Verification: one command, one receipt
Zero-Gap already required verification gates.
The strict upgrade makes verification harder to “forget” by standardizing:
- Preferred: bin/verify
- Legacy commands remain in the docs as a commented fallback
And the final output requires a CHECKS section with exit codes.
That turns “I think I ran tests” into a receipt.
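bin/verify itself can stay small. Below is a minimal Ruby sketch, assuming RSpec and RuboCop are the project's existing gates (the concrete commands are assumptions, not part of the framework); it prints the per-command exit codes that the final CHECKS section echoes back:
#!/usr/bin/env ruby
# bin/verify: canonical verifier. Run every gate, print a receipt, fail if any gate fails.
checks = {
  "rubocop" => "bundle exec rubocop",  # lint gate (assumed)
  "rspec"   => "bundle exec rspec"     # test gate (assumed)
}

puts "CHECKS"
failed = false
checks.each do |name, cmd|
  system(cmd)                          # stream the tool's own output
  code = $?.exitstatus
  failed ||= !code.zero?
  puts "#{name}: exit #{code}"
end

exit(failed ? 1 : 0)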
Operational guidance: when to use which mode
Use Frozen (v1.0) when:
- small refactor
- low contract risk
- you want short prompts and fast iteration
Use Strict (v1.1) when:
- building/upgrading endpoints
- touching filtering/search/pagination/auth
- upgrading versioned requirement docs
- AI is doing most of the work
If in doubt: default to Strict.
Closing: Zero-Gap is the architecture; Zero-Drift is governance
Zero-Gap prevents missing fields and regressions by making contracts and verification non-negotiable.
Zero-Drift makes AI reliably follow your engineering rules by adding:
- failure semantics
- proof requirements
- anti-pattern training
- a stable entry point
- a canonical verifier
In practice, this reduces review churn dramatically: you spend less time catching “obvious violations” and more time on product decisions.
Appendix: quick mental model
- REQUIREMENT_DOC = spec
- PROMPT.md = runtime
- AGENTS.md = language law + compiler errors (HF-*)
- GUIDE.md = standard library + examples
- bin/verify = test runner