<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex Metelli</title>
    <description>The latest articles on Forem by Alex Metelli (@ametel01).</description>
    <link>https://forem.com/ametel01</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3682218%2F0e10bca3-b9b5-4704-b417-61d5092e72a5.png</url>
      <title>Forem: Alex Metelli</title>
      <link>https://forem.com/ametel01</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ametel01"/>
    <language>en</language>
    <item>
      <title>Advanced Context Engineering for Coding Agents</title>
      <dc:creator>Alex Metelli</dc:creator>
      <pubDate>Sun, 28 Dec 2025 06:13:00 +0000</pubDate>
      <link>https://forem.com/ametel01/advanced-context-engineering-for-coding-agents-11p7</link>
      <guid>https://forem.com/ametel01/advanced-context-engineering-for-coding-agents-11p7</guid>
      <description>&lt;p&gt;&lt;strong&gt;Full reference:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
📺 &lt;em&gt;Advanced Context Engineering for Coding Agents&lt;/em&gt;&lt;br&gt;&lt;br&gt;
🎤 Dex Horthy&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://youtu.be/rmvDxxNubIg?si=GtPAqK-lnY58dlIO" rel="noopener noreferrer"&gt;https://youtu.be/rmvDxxNubIg?si=GtPAqK-lnY58dlIO&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;AI coding agents have dramatically increased developer throughput. However, in real-world usage—especially in large, long-lived (“brownfield”) codebases—many teams observe a mismatch between &lt;em&gt;output&lt;/em&gt; and &lt;em&gt;progress&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This post is a faithful technical distillation of Dex Horthy’s talk on &lt;strong&gt;advanced context engineering&lt;/strong&gt;: practical techniques for making today’s LLMs effective, reliable, and scalable for serious software engineering.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Productivity ≠ Progress
&lt;/h2&gt;

&lt;p&gt;Large-scale surveys of developers show a consistent pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI increases code shipped&lt;/li&gt;
&lt;li&gt;Code churn increases even more&lt;/li&gt;
&lt;li&gt;Teams repeatedly rework AI-generated output&lt;/li&gt;
&lt;li&gt;Brownfield codebases suffer the worst outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI performs well for greenfield projects, prototypes, and dashboards. But in complex systems with legacy constraints, naive agent usage becomes a &lt;strong&gt;tech-debt factory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This aligns with the lived experience of many senior engineers and founders.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Happens: Context Is the Only Control Surface
&lt;/h2&gt;

&lt;p&gt;Large language models are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stateless&lt;/strong&gt; (no memory between sessions)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-deterministic&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Entirely governed by the &lt;strong&gt;current context window&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every decision—tool usage, file edits, hallucinations—is determined by the tokens currently in context.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Better tokens in → better tokens out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;More tokens do &lt;em&gt;not&lt;/em&gt; mean better outcomes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dumb Zone
&lt;/h2&gt;

&lt;p&gt;As context usage grows, model quality degrades. Empirically, this often begins around &lt;strong&gt;~40% of the context window&lt;/strong&gt;, depending on task complexity.&lt;/p&gt;

&lt;p&gt;This region is referred to as the &lt;strong&gt;dumb zone&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common causes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Large tool outputs (JSON, UUIDs, logs)&lt;/li&gt;
&lt;li&gt;Unfiltered file dumps&lt;/li&gt;
&lt;li&gt;Repeated correction loops&lt;/li&gt;
&lt;li&gt;MCP servers dumping irrelevant data&lt;/li&gt;
&lt;li&gt;Long chat histories full of noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once in the dumb zone, agents become unreliable regardless of model quality.&lt;/p&gt;
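
&lt;p&gt;As a rough illustration, an agent harness can watch context utilization and force compaction before entering the dumb zone. The 4-characters-per-token estimate, the 200k window, and the 40% threshold below are illustrative assumptions, not measured values:&lt;/p&gt;

```python
# Illustrative guard against the "dumb zone": track estimated context usage
# and trigger compaction before quality degrades. The numbers here are
# rough assumptions (4 chars/token, 200k window, 40% threshold).

CONTEXT_WINDOW_TOKENS = 200_000   # hypothetical model limit
DUMB_ZONE_THRESHOLD = 0.40        # degradation often begins around here

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return len(text) // 4

def should_compact(messages: list[str]) -> bool:
    """Return True once the conversation approaches the dumb zone."""
    used = sum(estimate_tokens(m) for m in messages)
    return used / CONTEXT_WINDOW_TOKENS >= DUMB_ZONE_THRESHOLD

history = ["x" * 200_000, "y" * 150_000]   # ~87,500 estimated tokens
print(should_compact(history))             # True: past the 40% mark
```

&lt;p&gt;In practice you would use the model's real tokenizer rather than a character heuristic; the point is that the check fires well before the window is full.&lt;/p&gt;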




&lt;h2&gt;
  
  
  Trajectory Matters
&lt;/h2&gt;

&lt;p&gt;LLMs learn patterns &lt;em&gt;within a conversation&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If the conversation looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Model makes a mistake
&lt;/li&gt;
&lt;li&gt;Human scolds the model
&lt;/li&gt;
&lt;li&gt;Model makes another mistake
&lt;/li&gt;
&lt;li&gt;Human scolds again
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The most likely continuation is… another mistake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad trajectories reinforce failure modes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is why restarting sessions or compressing context is often more effective than continued correction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Intentional Compaction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Intentional compaction&lt;/strong&gt; is the deliberate compression of context into a minimal, high-signal representation.&lt;/p&gt;

&lt;p&gt;Instead of dragging an ever-growing conversation forward, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Summarize the current state into a markdown artifact&lt;/li&gt;
&lt;li&gt;Review and validate it as a human&lt;/li&gt;
&lt;li&gt;Start a fresh context seeded with that artifact&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What to compact
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Relevant files and line ranges&lt;/li&gt;
&lt;li&gt;Verified architectural behavior&lt;/li&gt;
&lt;li&gt;Decisions already made&lt;/li&gt;
&lt;li&gt;Explicit constraints and non-goals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What not to compact
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Raw logs&lt;/li&gt;
&lt;li&gt;Tool traces&lt;/li&gt;
&lt;li&gt;Full file contents&lt;/li&gt;
&lt;li&gt;Repetitive error explanations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compaction converts exploration into a one-time cost instead of a recurring tax.&lt;/p&gt;
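
&lt;p&gt;For concreteness, a compaction artifact can be a short markdown file with exactly these four sections. The file names, line ranges, and decisions below are hypothetical, shown only to illustrate the shape:&lt;/p&gt;

```markdown
# Compaction: payment-retry work (state as of this session)

## Relevant files
- `billing/retry.ts:45-180`: retry scheduling logic (verified by reading)
- `billing/queue.ts:12-60`: job enqueue path

## Verified behavior
- Retries are scheduled via `enqueueRetry()`, not by cron.

## Decisions already made
- Keep exponential backoff; change only the max-attempts cap.

## Constraints and non-goals
- Do not touch the webhook handlers.
```

&lt;p&gt;A fresh session seeded with an artifact like this starts near full capability instead of deep in a noisy history.&lt;/p&gt;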




&lt;h2&gt;
  
  
  Sub-Agents Are About Context, Not Roles
&lt;/h2&gt;

&lt;p&gt;Sub-agents are frequently misunderstood.&lt;/p&gt;

&lt;p&gt;They are &lt;strong&gt;not&lt;/strong&gt; about mirroring human roles like “frontend agent” or “QA agent”.&lt;/p&gt;

&lt;p&gt;They exist to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fork a clean context window&lt;/li&gt;
&lt;li&gt;Perform large exploratory reads&lt;/li&gt;
&lt;li&gt;Return a &lt;strong&gt;succinct factual summary&lt;/strong&gt; to a parent agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sub-agent scans a large repo&lt;/li&gt;
&lt;li&gt;Returns:
&lt;em&gt;“Relevant logic is in &lt;code&gt;foo/bar.ts:120–340&lt;/code&gt;, entrypoint is &lt;code&gt;BazHandler&lt;/code&gt;”&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The parent agent then reads only what matters.&lt;/p&gt;

&lt;p&gt;This is how you scale context without entering the dumb zone.&lt;/p&gt;
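
&lt;p&gt;A minimal sketch of the pattern, with a stubbed model call standing in for a real LLM. &lt;code&gt;run_llm&lt;/code&gt;, the prompts, and the returned summary are all hypothetical:&lt;/p&gt;

```python
# Sketch of the sub-agent pattern: the parent forks a fresh context for a
# large exploratory read, and only a short summary crosses back. The
# run_llm stub stands in for a real model call.

def run_llm(context: list[str]) -> str:
    """Stub: a real implementation would send this context to a model."""
    # Pretend the model located the relevant code during exploration.
    return "Relevant logic is in foo/bar.ts:120-340, entrypoint is BazHandler"

def sub_agent_explore(task: str, repo_files: dict[str, str]) -> str:
    """Run exploration in an isolated context; return only a summary."""
    context = [task] + [f"{path}:\n{body}" for path, body in repo_files.items()]
    return run_llm(context)  # thousands of tokens in, one sentence out

def parent_agent(task: str, repo_files: dict[str, str]) -> list[str]:
    """The parent's context receives the summary, never the raw files."""
    summary = sub_agent_explore(f"Locate code for: {task}", repo_files)
    return [f"Task: {task}", f"Research finding: {summary}"]

repo = {"foo/bar.ts": "...large file...", "foo/baz.ts": "...another..."}
print(parent_agent("fix retry bug", repo)[1])
```

&lt;p&gt;The design choice that matters is the return type: the sub-agent hands back a sentence, not its transcript, so the parent's window stays small.&lt;/p&gt;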




&lt;h2&gt;
  
  
  The Research–Plan–Implement Workflow
&lt;/h2&gt;

&lt;p&gt;This workflow is not “spec-driven development”. That term has become too diffuse to mean anything specific.&lt;/p&gt;

&lt;p&gt;RPI is about &lt;strong&gt;systematic compaction&lt;/strong&gt; at every stage.&lt;/p&gt;




&lt;h3&gt;
  
  
  Research: Compressing Truth
&lt;/h3&gt;

&lt;p&gt;Goal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand how the system &lt;em&gt;actually&lt;/em&gt; works&lt;/li&gt;
&lt;li&gt;Identify authoritative files and flows&lt;/li&gt;
&lt;li&gt;Eliminate assumptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read code, not docs&lt;/li&gt;
&lt;li&gt;Produce a short research artifact&lt;/li&gt;
&lt;li&gt;Validate findings manually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If agents are not onboarded with accurate context, they will fabricate.&lt;/p&gt;

&lt;p&gt;This mirrors &lt;em&gt;Memento&lt;/em&gt;: without memory, agents invent narratives.&lt;/p&gt;
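
&lt;p&gt;A research artifact can be as short as a dozen lines: what was read, what was verified, and what remains unknown. Everything named below is hypothetical:&lt;/p&gt;

```markdown
# Research: how payment retries actually work

## Authoritative files
- `billing/retry.ts`: owns scheduling; `scheduleRetry()` is the only entry.
- `billing/queue.ts`: enqueues jobs; no retry logic of its own.

## Verified (by reading code, not docs)
- Backoff is exponential, base 30s, no upper bound on attempts today.

## Open questions
- Is `failed_permanent` consumed anywhere downstream?
```

&lt;p&gt;The "open questions" section is what keeps the agent from fabricating: unknowns are stated instead of guessed.&lt;/p&gt;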




&lt;h3&gt;
  
  
  Plan: Compressing Intent
&lt;/h3&gt;

&lt;p&gt;Planning is the &lt;strong&gt;highest-leverage activity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A good plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lists exact steps&lt;/li&gt;
&lt;li&gt;References concrete files and snippets&lt;/li&gt;
&lt;li&gt;Specifies validation after each change&lt;/li&gt;
&lt;li&gt;Makes failure modes obvious&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A solid plan dramatically constrains agent behavior.&lt;/p&gt;

&lt;p&gt;Bad plans produce dozens of bad lines of code.&lt;br&gt;&lt;br&gt;
Bad research produces hundreds.&lt;/p&gt;
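
&lt;p&gt;To make "exact steps with validation" concrete, here is a hypothetical plan artifact; the file names, line numbers, and commands are illustrative, not prescriptive:&lt;/p&gt;

```markdown
# Plan: cap payment retries at 5 attempts

## Step 1: add the cap
- Edit `billing/retry.ts:152` (`scheduleRetry`): return early when
  `attempt >= MAX_ATTEMPTS`; define `MAX_ATTEMPTS = 5` beside it.
- Validate: `npm test -- retry.spec.ts` passes, including a new case
  asserting no retry is scheduled at attempt 5.

## Step 2: surface the terminal state
- Edit `billing/queue.ts:40`: mark the job `failed_permanent`.
- Validate: integration test shows no sixth enqueue for a failing job.

## Failure modes to watch
- Jobs already past attempt 5 at deploy time must not retry forever.
```

&lt;p&gt;Every step names a file, a change, and a check, so an agent executing it has almost no room to improvise.&lt;/p&gt;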




&lt;h3&gt;
  
  
  Implement: Mechanical Execution
&lt;/h3&gt;

&lt;p&gt;Once the plan is correct:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execution becomes mechanical&lt;/li&gt;
&lt;li&gt;Context remains small&lt;/li&gt;
&lt;li&gt;Reliability increases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where token spend actually pays off.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mental Alignment and Code Review
&lt;/h2&gt;

&lt;p&gt;Code review is primarily about &lt;strong&gt;shared understanding&lt;/strong&gt;, not syntax.&lt;/p&gt;

&lt;p&gt;As AI output scales, reviewing thousands of lines becomes unsustainable.&lt;/p&gt;

&lt;p&gt;High-performing teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review research and plans&lt;/li&gt;
&lt;li&gt;Attach agent transcripts or AMP threads to PRs&lt;/li&gt;
&lt;li&gt;Show exact steps and test results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reviewing plans preserves architectural coherence as throughput increases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limits: AI Does Not Replace Thinking
&lt;/h2&gt;

&lt;p&gt;AI amplifies the quality of thinking already done.&lt;/p&gt;

&lt;p&gt;In cases like deep architectural refactors or legacy systems with hidden invariants, teams must return to human design first.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There is no perfect prompt.&lt;br&gt;&lt;br&gt;
There is no silver bullet.&lt;br&gt;&lt;br&gt;
Thinking cannot be outsourced.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Choosing the Right Level of Context Engineering
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Recommended Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;UI tweak&lt;/td&gt;
&lt;td&gt;Direct instruction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small feature&lt;/td&gt;
&lt;td&gt;Light plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-repo change&lt;/td&gt;
&lt;td&gt;Research + plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep refactor&lt;/td&gt;
&lt;td&gt;Full RPI + human design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ceiling of problem difficulty rises with context discipline.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;Coding agents will be commoditized.&lt;/p&gt;

&lt;p&gt;The real challenge is adapting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Team workflows&lt;/li&gt;
&lt;li&gt;SDLC processes&lt;/li&gt;
&lt;li&gt;Cultural norms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this, teams risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Juniors shipping slop&lt;/li&gt;
&lt;li&gt;Seniors cleaning it up&lt;/li&gt;
&lt;li&gt;Technical debt scaling with AI usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a &lt;strong&gt;workflow and leadership problem&lt;/strong&gt;, not a tooling one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Context is the only lever that matters&lt;/li&gt;
&lt;li&gt;More tokens often reduce correctness&lt;/li&gt;
&lt;li&gt;Intentional compaction is mandatory&lt;/li&gt;
&lt;li&gt;Research and planning are the highest ROI activities&lt;/li&gt;
&lt;li&gt;AI amplifies thinking—it does not replace it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Source &amp;amp; Attribution
&lt;/h2&gt;

&lt;p&gt;This article is a faithful technical adaptation of:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dex Horthy — &lt;em&gt;Advanced Context Engineering for Coding Agents&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
📺 &lt;a href="https://youtu.be/rmvDxxNubIg?si=GtPAqK-lnY58dlIO" rel="noopener noreferrer"&gt;https://youtu.be/rmvDxxNubIg?si=GtPAqK-lnY58dlIO&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All ideas, terminology, and frameworks originate from the referenced talk.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>development</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
