“The hard part is no longer getting AI to generate code. The hard part is getting AI to generate code that belongs in your system.”
TL;DR
AI coding tools are getting better, but most teams still lose time because generated code is only almost right.
The useful target is not:
AI writes code.
The useful target is:
AI writes code that is mergeable by default.
That requires a context engine: a reasoning layer that knows your codebase, conventions, previous decisions, permissions, docs, tests, ownership, and current task — then gives the agent only the context it actually needs.
This is the practical point from the talk:
- bigger context windows are not enough
- naive RAG is not enough
- random MCP servers are not enough
- code search alone is not enough
- AI without context creates review debt
- AI with the right context can reduce token cost, review cycles, and rework
For my own Rails / API / AI-assisted workflow, this maps almost directly to:
- `AGENTS.md`
- `PROMPT.md`
- `GUIDE.md`
- `doc/requirements/**`
- Flow docs
- PRDs
- `bin/verify`
- `bin/contract_audit`
- RSpec / rswag / Pundit / Blueprinter conventions
In other words: the best AI coding system is not just a model.
It is an engineering operating system around the model.
The YouTube talk worth knowing
The video is:
Mergeable by default: Building the context engine to save time and tokens — Peter Werry, Unblocked
The core description is simple:
Agents can generate code. The hard part is generating code that’s right for your system, team conventions, and past decisions.
That line is the whole problem.
Modern coding agents can produce code fast. But fast code is not automatically useful code. If the agent does not know your real system, it creates a second job for the team:
- pay for generation in tokens
- pay again in review, correction, and cleanup
That is why the talk focuses on a context engine: the layer that decides what the agent should know before it acts.
Why this matters
Most AI coding demos optimize for the wrong thing.
They show:
- a prompt
- a generated file
- a green-looking result
- a fast demo loop
But production engineering is not a demo.
Production engineering has:
- old decisions
- team conventions
- naming rules
- API contracts
- permission boundaries
- security requirements
- test expectations
- serializer rules
- migration history
- review standards
- product requirements
- half-finished work
- legacy constraints
If AI ignores those, the generated code may look good and still be wrong.
That is the dangerous part: AI often fails in ways that are plausible.
It creates code that looks like something a developer would write, but it may not match the system.
The real AI coding problem
The problem is not:
Can the model write Ruby, Go, TypeScript, or Python?
It can.
The problem is:
Does the model know what correct means inside this repo?
Correct is not only syntax.
Correct means:
- the endpoint follows the existing API shape
- the response uses the right envelope
- the serializer follows the project convention
- authorization is enforced in the expected way
- search uses the expected search library
- pagination follows the standard project pattern
- tests cover the contract
- docs are updated in the right place
- generated OpenAPI is produced from specs, not edited by hand
- no new representational drift is introduced
This is where AI coding becomes an engineering systems problem.
Bigger context windows do not solve this alone
It is tempting to think the solution is:
Just give the model more context.
But bigger context is not the same as better context.
A huge context window can still contain:
- irrelevant files
- outdated docs
- conflicting instructions
- stale decisions
- too many examples
- private information the agent should not see
- similar-but-wrong patterns
- code that should not be copied
More context can even make the output worse if the agent cannot tell which source has authority.
The real need is not maximum context.
The real need is selected, ranked, permission-aware, task-specific context.
That is what a context engine should do.
Naive RAG is also not enough
Basic RAG usually means:
- embed documents
- search for similar chunks
- paste the top results into the prompt
That helps, but it is not enough for serious engineering work.
Why?
Because code work often needs reasoning across sources:
- requirement says one thing
- Flow doc says another
- current controller does a third thing
- specs define the real contract
- old PR explains why the strange behavior exists
- security policy limits what can be touched
Simple similarity search does not know which source should win.
A useful context engine needs to answer:
- which document is authoritative?
- which code path is actually active?
- which convention is current?
- which files changed recently?
- who owns this area?
- what should be excluded?
- what is relevant to this exact task?
That is different from “search and paste.”
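As a rough illustration of the difference, authority-aware retrieval can be sketched as ordering candidate sources by an explicit trust hierarchy before spending the prompt budget. The kinds, weights, and file names below are invented for illustration; they are not from the talk:

```ruby
# Sketch: rank retrieved context sources by authority, then recency.
# The authority order is an assumption for illustration, not a standard.
AUTHORITY = {
  requirement: 0,  # current requirement wins
  spec:        1,  # tests/specs define the real contract
  code:        2,  # active implementation
  doc:         3,  # docs may be stale
  example:     4   # old examples rank last
}.freeze

Source = Struct.new(:name, :kind, :updated_days_ago, keyword_init: true)

def rank_sources(sources)
  sources.sort_by { |s| [AUTHORITY.fetch(s.kind), s.updated_days_ago] }
end

sources = [
  Source.new(name: "old_example.md", kind: :example, updated_days_ago: 400),
  Source.new(name: "awards_spec.rb", kind: :spec, updated_days_ago: 3),
  Source.new(name: "WEB-142.md", kind: :requirement, updated_days_ago: 1)
]

# The requirement outranks the spec, which outranks the stale example.
puts rank_sources(sources).map(&:name).inspect
```

Similarity search alone would happily rank the 400-day-old example first if its wording matched the query; the explicit hierarchy is what "search and paste" lacks.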
MCP servers are useful, but not magic
MCP-style tool access can connect agents to more systems:
- GitHub
- Linear / Jira
- docs
- Slack
- databases
- observability
- local tools
That is useful.
But tool access is not the same as understanding.
An agent with 20 tools can still fail if it does not know:
- when to use which tool
- which result is authoritative
- what the task boundary is
- what the team convention expects
- what must not be changed
- what proof is required before completion
Tools give the agent hands.
A context engine gives the agent judgment boundaries.
What a context engine should contain
A useful engineering context engine should collect and reason over several layers.
| Layer | What it answers |
|---|---|
| Code context | What files, services, models, controllers, serializers, tests, and configs matter? |
| Product context | What requirement, PRD, or ticket defines the behavior? |
| Architecture context | What patterns does this repo already use? |
| Convention context | What naming, response, route, and test rules must be followed? |
| Decision context | Why was this built this way before? |
| Permission context | What is the agent allowed to read or change? |
| Ownership context | Who owns the area, or what review path matters? |
| Verification context | What command proves the change is safe? |
| Drift context | What kinds of changes are forbidden even if tests pass? |
The important word is context, not documents.
The engine is not just a folder of markdown files.
It is the system that decides what matters for the current task.
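To make "context, not documents" concrete, the layers in the table above can be sketched as a structured bundle rather than a folder of files. The field names and values here are illustrative assumptions, not a real tool's schema:

```ruby
# Sketch: a context bundle as data, mirroring the layers above.
# Field names are illustrative assumptions, not a real tool's schema.
ContextBundle = Struct.new(
  :task,          # what is being built
  :code,          # files/services that matter for this task
  :conventions,   # naming, response, route, and test rules
  :decisions,     # why things were built this way before
  :permissions,   # what the agent may read or change
  :verification,  # command that proves the change is safe
  keyword_init: true
)

bundle = ContextBundle.new(
  task: "WEB-142 Awards endpoint",
  code: ["app/models/award.rb", "spec/requests/awards_spec.rb"],
  conventions: ["snake_case JSON", "{ data, meta } envelope"],
  decisions: ["an older PR explains the nested-attributes choice"],
  permissions: { read: ["app/**", "doc/**"], write: ["app/**", "spec/**"] },
  verification: "bin/verify"
)

puts bundle.task
```

The point of the structure is that every layer is selected per task; the same repo produces a different bundle for a migration task than for an endpoint task.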
What “mergeable by default” really means
“Mergeable by default” does not mean AI code is automatically merged.
It means the AI output is shaped so that the default review path is:
verify, review, merge
Not:
rewrite, argue with the model, fix conventions, add missing tests, update docs, clean up payload shape, then maybe merge
A mergeable-by-default change has these properties:
| Property | Meaning |
|---|---|
| Contract-aware | It implements the actual requirement, not a guessed version |
| Convention-aware | It follows existing repo patterns |
| Test-backed | It includes relevant tests/specs |
| Permission-safe | It does not cross access boundaries |
| Drift-safe | It does not invent new response shapes or naming styles |
| Minimal | It touches only what the task requires |
| Explainable | It can point to why files changed |
| Verifiable | One command can prove the change is safe enough to review |
This is the difference between using AI as autocomplete and using AI as a controlled engineering assistant.
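The "verifiable" property deserves emphasis: one command should prove the change is safe enough to review. A gate in that spirit can be sketched as a script that runs each proof step and stops at the first failure. The check commands below are placeholders, not the contents of any real `bin/verify`:

```ruby
# Sketch: a one-command verification gate in the spirit of bin/verify.
# The checks here are placeholders; a real script would run the repo's
# own linters, specs, and contract audits.
def run_checks(checks)
  checks.each do |name, command|
    print "#{name}... "
    ok = system(command, out: File::NULL, err: File::NULL)
    puts(ok ? "ok" : "FAILED")
    return false unless ok
  end
  true
end

checks = {
  "syntax" => "ruby -e 'exit 0'",  # placeholder for a linter run
  "specs"  => "ruby -e 'exit 0'"   # placeholder for the spec suite
}

success = run_checks(checks)
puts(success ? "all checks passed" : "verification failed")
```

Failing fast matters here: the agent (and the reviewer) should learn about the first broken gate immediately instead of wading through a full log.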
The review-cost problem
Bad AI code is expensive because it creates hidden work.
The obvious cost is token usage.
The bigger cost is review drag:
- reviewers must detect subtle mismatch
- developers must check every convention manually
- specs may be missing or shallow
- docs may not reflect behavior
- generated code may duplicate existing logic
- authorization may be incomplete
- migration or serializer decisions may drift
This is why the talk’s “save time and tokens” framing matters.
Tokens are not the main cost.
The real cost is the human correction loop after bad context.
How this maps to my Rails workflow
This topic is especially relevant to my own Rails / API workflow because I already treat AI as part of a controlled delivery system.
In my setup, the closest equivalent of a context engine is not one tool. It is the combination of:
| Piece | Role |
|---|---|
| `AGENTS.md` | Operating rules and delivery discipline |
| `PROMPT.md` | Task execution flow |
| `GUIDE.md` | Examples and reusable patterns |
| `doc/requirements/**` | Source of truth for behavior |
| Flow docs | Implementation traceability |
| PRDs | Product context |
| RSpec | Behavior proof |
| rswag | API contract proof |
| Pundit policies | Authorization proof |
| Blueprinter | Response-shape discipline |
| Ransack + Kaminari | Search and pagination convention |
| `bin/verify` | Fast verification loop |
| `bin/contract_audit` | Drift prevention |
This is why the talk feels immediately practical.
It says: do not just buy a smarter model.
Build the environment where the model can make fewer wrong assumptions.
That matches my own conclusion from AI-assisted backend work:
AI productivity comes from externalized judgment, not prompt luck.
A concrete Rails example
Imagine a task:
Add a new nested API endpoint for profile awards with file attachments.
A generic AI agent may create:
- a controller
- a model
- a serializer
- a few routes
- maybe a spec
But a context-aware agent should know much more:
- endpoints must live under `/api/v1/profiles`
- JSON must stay snake_case
- list responses must return `{ data, meta }`
- uploads should use nested attributes in one request
- Blueprinter should serialize file URLs consistently
- Pundit authorization must be represented
- request specs must include multipart create/update tests
- Ransack and Kaminari are expected for searchable/paginated endpoints
- `ransackable_attributes` must be explicit
- Flow docs must follow the existing template
- rswag specs should drive OpenAPI generation
- `bin/verify` and contract audit must pass
That is not “more code.”
That is project-specific correctness.
And this is exactly the kind of correctness that generic AI does not infer reliably unless the context system teaches it.
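One of those conventions, the `{ data, meta }` list envelope, can be sketched in plain Ruby. This is a gem-free stand-in for a Blueprinter + Kaminari stack: the key names follow the conventions listed above, while the serializer fields and record shapes are invented for illustration:

```ruby
# Sketch: the { data, meta } list envelope in plain Ruby.
# A real implementation would use Blueprinter for serialization and
# Kaminari for pagination; this stand-in only shows the shape.
def serialize_award(award)
  { id: award[:id], title: award[:title], file_url: award[:file_url] }
end

def list_envelope(records, page:, per_page:)
  paged = records.each_slice(per_page).to_a[page - 1] || []
  {
    data: paged.map { |r| serialize_award(r) },
    meta: {
      current_page: page,
      per_page: per_page,
      total_count: records.size,
      total_pages: (records.size.to_f / per_page).ceil
    }
  }
end

awards = [
  { id: 1, title: "Best API", file_url: "/files/1.pdf" },
  { id: 2, title: "Clean Merge", file_url: "/files/2.pdf" },
  { id: 3, title: "Zero Drift", file_url: "/files/3.pdf" }
]

envelope = list_envelope(awards, page: 1, per_page: 2)
# envelope[:data] has 2 items; envelope[:meta][:total_pages] is 2
```

A generic agent that returns a bare JSON array here produces code that "works" and still fails review, which is exactly the plausible-but-wrong failure mode described earlier.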
The context engine as a compiler for engineering judgment
A good way to think about this:
| Traditional compiler | AI engineering system |
|---|---|
| Source code | Requirements + task |
| Language spec | AGENTS.md / conventions |
| Type checker | Tests + contracts |
| Linter | Style / naming / route rules |
| Static analyzer | Security + drift audit |
| Build output | Mergeable PR |
The model generates the draft.
The context engine controls the environment around the draft.
That is the serious version of AI-assisted development.
Important lessons from the talk
1. Context must be task-specific
Do not dump the whole repo into the model.
Give it the files, docs, tests, decisions, and examples that matter for the task.
2. Authority matters
The agent must know which source wins when sources conflict.
For example:
- current requirement
- current tests/specs
- current implementation
- older docs
- old examples
Without authority ranking, the agent can follow stale context confidently.
3. Permissions matter
A production context engine must respect access boundaries.
Not every agent should see every Slack thread, customer doc, security detail, or private dataset.
4. Personalization matters
Two engineers may ask the same question but need different context.
A backend engineer, frontend engineer, new hire, tech lead, and reviewer should not necessarily receive the same context bundle.
5. Review cycles are the hidden cost
The output should be optimized for fewer review corrections, not just faster generation.
6. The best context system learns from real mistakes
A good context engine improves when the team discovers:
- wrong files were retrieved
- stale docs were used
- permissions were too broad
- generated code missed a convention
- tests passed but contract drift appeared
That feedback should become system behavior, not tribal memory.
What is worth mentioning from the YouTube link
If I had to reduce the video into a few points worth remembering:
- **AI codegen is already capable enough to be dangerous.** The remaining problem is whether it writes code that fits the system.
- **Wrong context creates double cost.** You pay once for generation and again during review/fix cycles.
- **Bigger context windows are not a strategy.** They help, but they do not solve authority, relevance, conflict, permissions, or freshness.
- **Naive RAG is not the final answer.** Search results need reasoning and ranking, especially across code, docs, history, and decisions.
- **MCP/tool access is only plumbing.** Tools expose information; they do not decide what matters.
- **The real product is the context engine.** It is the reasoning layer between the agent and the organization.
- **“Mergeable by default” is the practical success metric.** Not “did it generate code?”, but “did it reduce the path to a safe merge?”
What teams usually get wrong
They treat prompts as architecture
A prompt is not enough.
If the repo does not have clear requirements, tests, conventions, and verification gates, the prompt becomes a wish list.
They treat docs as optional
For AI-assisted work, docs are not bureaucracy.
Docs are training material for the engineering system.
Bad docs create bad AI behavior.
They optimize for demo speed
Demo speed is not production speed.
Production speed includes review, verification, rollback risk, onboarding, maintenance, and future changes.
They let AI invent conventions
This is one of the fastest ways to create long-term codebase drift.
AI should follow conventions, not create a new local style every task.
They do not separate generation from verification
Generation is cheap.
Trust is expensive.
The system must make trust cheaper.
What I would build next
For my own workflow, the next practical step is to make the context engine more explicit.
Today, much of it already exists as files and scripts.
The next version should behave more like a retrieval + reasoning layer.
1. Task classifier
Given a task like WEB-142 Awards, classify:
- endpoint work
- model work
- attachment work
- docs work
- authorization work
- specs work
- swagger work
Then load relevant rules automatically.
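A first version of that classifier does not need a model at all. A keyword-based sketch is enough to route a ticket title to rule sets; the category names echo the list above, while the keyword lists are illustrative assumptions:

```ruby
# Sketch: a keyword-based task classifier.
# Keyword lists are illustrative assumptions; a real classifier could
# also use embeddings or the ticket's linked files.
CATEGORIES = {
  endpoint:      %w[endpoint route controller api],
  model:         %w[model migration column association],
  attachment:    %w[attachment upload file multipart],
  authorization: %w[policy pundit permission role],
  specs:         %w[spec test coverage rspec],
  swagger:       %w[swagger openapi rswag contract]
}.freeze

def classify_task(description)
  words = description.downcase.scan(/[a-z_]+/)
  CATEGORIES.select { |_, keywords| (keywords & words).any? }.keys
end

puts classify_task("WEB-142 Awards: nested API endpoint with file attachments").inspect
```

Each returned category would then pull in its own slice of rules: the `:attachment` category loads the multipart-spec and nested-attributes conventions, `:authorization` loads the Pundit rules, and so on.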
2. Context bundle generator
Produce a task-specific context bundle:
**Task:** WEB-142 Awards

**Authoritative sources:**
- doc/requirements/WEB-142_*.md
- related Flow doc
- related PRD if present

**Relevant implementation:**
- app/controllers/api/v1/profiles/...
- app/models/award.rb
- app/blueprints/...
- spec/requests/...

**Rules:**
- snake_case only
- Ransack + Kaminari
- Blueprinter envelope
- Pundit authorization
- nested attachments in one request
- multipart request specs
3. Drift checker expansion
Extend `bin/contract_audit` to catch more AI-specific drift:
- new response shapes outside blueprints
- missing `ransackable_attributes`
- missing multipart specs for attachable models
- missing docs section updates
- inconsistent endpoint naming
- unauthorized direct JSON rendering
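One of those checks, flagging direct JSON rendering that bypasses the blueprint layer, can be sketched with a simple line scan. The patterns and paths are illustrative; a real audit would parse the AST (for example with rubocop-ast) rather than rely on regexes:

```ruby
# Sketch: a drift check that flags direct JSON rendering outside the
# blueprint layer. Patterns are illustrative; a real audit would parse
# the AST instead of using regexes.
def direct_render_offenses(path, source)
  return [] if path.start_with?("app/blueprints/")
  source.each_line.with_index(1).select { |line, _|
    line =~ /render\s+json:\s*(?!.*Blueprint)/
  }.map { |_, lineno| "#{path}:#{lineno} renders JSON without a blueprint" }
end

controller = <<~RUBY
  def show
    render json: award.attributes
  end
RUBY

puts direct_render_offenses("app/controllers/awards_controller.rb", controller)
```

Checks like this are cheap to run inside the contract audit, and they catch exactly the kind of plausible shortcut an agent takes when it has not been given the envelope convention.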
4. Retrieval audit
Every AI-generated change should be able to answer:
- what context did it use?
- which sources were authoritative?
- which files were intentionally ignored?
- what proof command passed?
- what contract did it implement?
This would make AI output much easier to trust.
Software fundamentals matter more than ever in the AI era
Another strong talk that connects directly to the “mergeable by default” idea is:
“Software Fundamentals Matter More Than Ever”
The message is extremely important because it explains why many AI-assisted codebases start degrading even when teams use powerful models.
The core point is simple:
AI amplifies the quality of the engineering system around it.
That amplification works in both directions.
| Existing engineering quality | What AI does |
|---|---|
| Strong architecture + conventions | Accelerates delivery |
| Weak architecture + drift | Accelerates chaos |
This is why AI adoption often produces very different outcomes between teams using similar tools.
One team gets:
- faster delivery
- cleaner onboarding
- fewer repetitive tasks
- safer iteration
Another team gets:
- more review debt
- duplicated abstractions
- inconsistent APIs
- growing architectural entropy
The difference is rarely the model itself.
The difference is usually:
- system boundaries
- naming consistency
- decomposition quality
- verification
- contracts
- maintainability discipline
AI amplifies entropy if the codebase is already chaotic
One of the strongest ideas from the second talk is:
AI makes bad codebases worse faster.
That sounds obvious, but it has major implications.
Before AI:
- weak engineering practices slowed teams down
- messy systems evolved more slowly
- architectural damage accumulated gradually
Now:
- AI can generate large amounts of plausible code very quickly
- bad patterns replicate faster
- local shortcuts spread across the repo
- inconsistent abstractions become normalized
The dangerous part is that the output often looks reasonable.
That creates a false sense of progress.
A weak engineering system with AI can produce:
- more code
- more PRs
- more surface area
while simultaneously reducing long-term maintainability.
This is why software fundamentals suddenly matter more, not less.
Why Rails works unusually well with AI
This also explains why Rails often performs surprisingly well in AI-assisted development compared to fragmented stacks.
Rails strongly reduces ambiguity through conventions:
| Concern | Rails convention |
|---|---|
| Models | app/models/** |
| Controllers | app/controllers/** |
| Request specs | spec/requests/** |
| Background jobs | app/jobs/** |
| Serialization | predictable serializer/blueprint layer |
| Naming | convention over configuration |
| Routing | consistent REST structure |
LLMs benefit heavily from predictability.
The less time the model spends asking:
“How is this project organized?”
the more time it spends solving the actual task.
This is one reason why convention-heavy systems become powerful AI multipliers.
Decomposition becomes critical
Another important point from the second talk:
LLMs perform much better on small, deterministic tasks than vague, giant problems.
That has major implications for engineering workflows.
Good AI-assisted systems should encourage:
- bounded contexts
- small implementation scopes
- explicit requirements
- narrow responsibilities
- isolated verification loops
This maps directly to:
- feature decomposition
- Flow docs
- PRDs
- contract-first APIs
- verification-driven development
Without decomposition, AI receives:
- too much ambiguity
- too many unrelated files
- unclear authority
- mixed responsibilities
That increases hallucination and drift.
With decomposition:
- retrieval becomes cleaner
- verification becomes cheaper
- generated code becomes more deterministic
- review becomes faster
This is another reason why a context engine matters: it reduces entropy before generation even starts.
Verification matters more than generation
One of the biggest misconceptions in AI engineering is:
faster generation = better engineering
In reality:
verification quality matters more than generation speed.
AI is extremely good at producing:
- plausible code
- plausible architecture
- plausible explanations
But plausible is not the same as correct.
That is why systems like:
- `bin/verify`
- contract audits
- OpenAPI contracts
- request specs
- drift prevention
- architectural rules
become increasingly important in the AI era.
The stronger the generation becomes, the more important proof becomes.
Or more simply:
verification > generation
Senior engineers become more important, not less
The second talk also indirectly explains why strong senior engineers become more valuable in AI-assisted environments.
Junior engineers with AI can generate large amounts of code quickly.
But senior engineers contribute something different:
- architectural judgment
- decomposition
- boundary design
- tradeoff analysis
- drift detection
- maintainability intuition
- verification discipline
AI accelerates implementation.
Senior engineers define:
- what should exist
- what should not exist
- what should remain stable
- what correctness means
This is why the real leverage comes from:
- externalized engineering judgment
- codified conventions
- reusable verification systems
- structured context
Not from prompt tricks.
The combined lesson from both talks
The two talks together point toward the same conclusion.
The future of AI-assisted engineering is not:
- bigger prompts
- more vibe coding
- infinite context dumping
- raw generation speed
The future is:
- explicit context systems
- deterministic workflows
- engineering conventions
- retrieval quality
- decomposition
- verification
- drift prevention
- mergeability
The model matters.
But the surrounding engineering operating system matters even more.
The final takeaway
The future of AI coding is not just smarter code generation.
It is better engineering context.
The best teams will not only ask:
Which model should we use?
They will ask:
What system makes the model correct inside our codebase?
That system will include:
- requirements
- conventions
- code search
- decision history
- permission control
- task-specific retrieval
- verification
- drift prevention
- review feedback loops
That is the real lesson from “Mergeable by default.”
AI does not become valuable because it writes more code.
AI becomes valuable when the surrounding engineering system makes its output safe enough, relevant enough, and consistent enough to merge.
For me, that means the next stage of AI-assisted Rails development is clear:
turn the existing verification-driven workflow into an explicit context engine.
Not more prompt magic.
Not bigger context for the sake of bigger context.
Not vibe-coded PRs.
A system that gives the agent the right context, blocks the wrong drift, proves the result, and makes the final state boring:
all good — merge PR.