<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vonta Johnson</title>
    <description>The latest articles on Forem by Vonta Johnson (@vontajohnson).</description>
    <link>https://forem.com/vontajohnson</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3403445%2F9ad0d759-9b2a-40e0-a59d-9bfc8f125888.jpeg</url>
      <title>Forem: Vonta Johnson</title>
      <link>https://forem.com/vontajohnson</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vontajohnson"/>
    <language>en</language>
    <item>
      <title>Context Engineering: Building Better AI Agents</title>
      <dc:creator>Vonta Johnson</dc:creator>
      <pubDate>Sun, 03 Aug 2025 22:26:29 +0000</pubDate>
      <link>https://forem.com/vontajohnson/context-engineering-building-better-ai-agents-4a5k</link>
      <guid>https://forem.com/vontajohnson/context-engineering-building-better-ai-agents-4a5k</guid>
<description>&lt;p&gt;If you've been building with LLMs for a while, you've probably experienced that moment when the model feels like more of a hindrance than a help: worsening answer quality, constant reminders to start a new chat window, and constraints that keep getting forgotten. These are all symptoms of a model's context becoming too polluted. When building AI agents, these problems can be the difference between a reliable, production-ready agent and a system that works great in your local tests but breaks in production.&lt;/p&gt;

&lt;p&gt;To bridge this gap and have a consistently reliable agentic system, you need to know how to effectively manage context. This is where context engineering comes into play and is essential for reliable agent building at scale.&lt;/p&gt;

&lt;h2&gt;Understanding Context in LLMs&lt;/h2&gt;

&lt;p&gt;Before we dive into context engineering, let's clarify what we mean by "context" in the world of large language models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt; is everything the LLM can see and process when handling a task: essentially, all the data the model has access to and can reference in order to fulfill the request. This includes things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your prompts and system instructions&lt;/li&gt;
&lt;li&gt;Previous messages in the conversation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; documents and retrieved information&lt;/li&gt;
&lt;li&gt;Tool call results and responses&lt;/li&gt;
&lt;li&gt;Multimodal inputs like images, audio, or documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every LLM has a &lt;strong&gt;context window&lt;/strong&gt;: a hard limit on how much information (measured in tokens) the model can access and reference within a single session. When you exceed this limit, the model either truncates information or refuses to process the request entirely. Depending on the model, the limit can range from a few thousand tokens to millions. The type of input provided to the LLM (text vs. image, etc.) also has a significant impact on token usage.&lt;/p&gt;
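&lt;p&gt;As a rough illustration, a budget check before each request can catch overflows early. This sketch assumes ~4 characters per token, a common rule of thumb; a real system would use the model's own tokenizer, and the reserve value is purely illustrative.&lt;/p&gt;

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate, assuming ~4 characters per token."""
    return max(1, len(text) // 4)


def fits_in_window(messages: list[str], context_window: int,
                   reserve_for_output: int = 1024) -> bool:
    """Check that the conversation plus a reserved output budget fits."""
    used = sum(estimate_tokens(m) for m in messages)
    return context_window >= used + reserve_for_output
```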

&lt;h2&gt;What Is Context Engineering?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context engineering&lt;/strong&gt;, &lt;a href="https://x.com/karpathy/status/1937902205765607626" rel="noopener noreferrer"&gt;as described by Andrej Karpathy&lt;/a&gt; (former Director of AI at Tesla), is &lt;em&gt;the delicate art and science of filling the context window with just the right information for the next step&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This goes beyond the prompt provided and manages the entire system your agent or LLM can access. Think of it as tuning the brain of the agent. When a person needs to solve a problem, there are various inputs and reference points we use to discern the right solution. Managing those inputs for an agent is the essence of context engineering.&lt;/p&gt;

&lt;h2&gt;Context Engineering vs. Prompt Engineering&lt;/h2&gt;

&lt;p&gt;You may already be familiar with the term &lt;a href="https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-prompt-engineering" rel="noopener noreferrer"&gt;prompt engineering&lt;/a&gt;, but that is just one piece of building reliable AI systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Engineering&lt;/strong&gt; focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing clear, specific instructions&lt;/li&gt;
&lt;li&gt;Using effective prompt templates and formats&lt;/li&gt;
&lt;li&gt;Crafting examples and demonstrations&lt;/li&gt;
&lt;li&gt;Optimizing the immediate request to the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Context Engineering&lt;/strong&gt; focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing the entire information environment over time&lt;/li&gt;
&lt;li&gt;Deciding what information to include or exclude across multiple interactions&lt;/li&gt;
&lt;li&gt;Handling long-running conversations and sessions&lt;/li&gt;
&lt;li&gt;Optimizing for reliability and consistency at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it this way: prompt engineering is like perfecting one song on an album. Context engineering is managing the entire tracklist and production.&lt;/p&gt;

&lt;h2&gt;Why This Matters for Building Agents&lt;/h2&gt;

&lt;p&gt;When building scalable agents, there are a number of factors to consider. Typically, an agent must do all of these things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintain long conversations&lt;/li&gt;
&lt;li&gt;Access multiple data sources&lt;/li&gt;
&lt;li&gt;Use various tools&lt;/li&gt;
&lt;li&gt;Remember past decisions&lt;/li&gt;
&lt;li&gt;Handle complex, multi-step tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without effective context engineering, you can quickly hit a wall on expected performance and suffer inefficient token usage, which translates directly into increased costs. Writing better prompts will not solve this issue.&lt;/p&gt;

&lt;p&gt;Poor context management can lead to these 4 common context failures:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Poisoning&lt;/strong&gt; - Incorrect or hallucinated responses interfering with the context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Clashes&lt;/strong&gt; - Conflicting information causing inconsistent outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Confusion&lt;/strong&gt; - Irrelevant data in the context influencing the output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Distraction&lt;/strong&gt; - The context overwhelming the model's training.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can read more about each of these failures &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The Four Pillars of Context Engineering&lt;/h2&gt;

&lt;p&gt;To avoid common context failures, these four key principles will help you build more robust, production-ready systems. While none of these techniques is foolproof, each gives you greater control over your context than you would have otherwise.&lt;/p&gt;

&lt;h3&gt;1. &lt;strong&gt;Write Context&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Storing information outside of the context window for later retrieval and usage.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is about creating persistent memory systems that maintain important information across sessions and interactions. The storage can be a file the LLM reads from or state managed directly by the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use file-based memory systems (like Claude.md files or rules files in Cursor)&lt;/li&gt;
&lt;li&gt;Implement embedded document stores for large collections of facts&lt;/li&gt;
&lt;li&gt;Build autonomous memory creation based on user feedback and interactions&lt;/li&gt;
&lt;li&gt;Store structured information that can't fit in context windows&lt;/li&gt;
&lt;/ul&gt;
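&lt;p&gt;As a minimal sketch of the first strategy, file-based memory can be as simple as a JSON document the agent reads and writes between sessions. The filename and key/value schema here are purely illustrative, not any particular framework's API:&lt;/p&gt;

```python
import json
from pathlib import Path


class FileMemory:
    """Tiny persistent memory: facts stored outside the context window."""

    def __init__(self, path: str = "memory.json"):
        self.path = Path(path)

    def write(self, key: str, value: str) -> None:
        """Persist a fact for retrieval in later sessions."""
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data, indent=2))

    def read(self, key: str, default: str = "") -> str:
        """Retrieve a stored fact for injection into context."""
        return self._load().get(key, default)

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}
```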

&lt;h3&gt;2. &lt;strong&gt;Select Context&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Selecting only the relevant information needed for the context to accomplish the task at hand.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Smart retrieval ensures you're including the right information while excluding the noise that degrades performance. Selecting only the necessary information gives the model more accurate context to solve the problem and reduces token usage, which saves money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement semantic similarity scoring for document retrieval&lt;/li&gt;
&lt;li&gt;Use relevance thresholds to filter out low-quality matches&lt;/li&gt;
&lt;li&gt;Create retrieval mechanisms that consider recency, importance, and similarity&lt;/li&gt;
&lt;li&gt;Build fallback strategies when no highly relevant information is found&lt;/li&gt;
&lt;li&gt;Design retrieval that adapts to the specific task requirements&lt;/li&gt;
&lt;/ul&gt;
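&lt;p&gt;To make the thresholding idea concrete, here is a toy sketch that uses word overlap as a stand-in for real embedding similarity; the scoring function and the threshold value are illustrative only:&lt;/p&gt;

```python
def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: word overlap standing in for embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q.union(d)
    return len(q.intersection(d)) / len(union) if union else 0.0


def select_context(query: str, docs: list[str],
                   threshold: float = 0.2, top_k: int = 3) -> list[str]:
    """Keep only documents scoring at or above the relevance threshold."""
    scored = sorted(((overlap_score(query, d), d) for d in docs), reverse=True)
    return [d for s, d in scored if s >= threshold][:top_k]
```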

&lt;h3&gt;3. &lt;strong&gt;Compress Histories&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Distilling down to only the required tokens to complete a task.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As conversations and agent trajectories get longer, intelligent compression can help with maintaining performance. This involves summarizing information &lt;em&gt;and/or&lt;/em&gt; removing information where necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement context summarization at natural boundaries (completed phases, tool calls)&lt;/li&gt;
&lt;li&gt;Summarize token-heavy tool call feedback while preserving key insights&lt;/li&gt;
&lt;li&gt;Create checkpoints for important conversation milestones&lt;/li&gt;
&lt;li&gt;Apply auto-compaction when approaching context limits&lt;/li&gt;
&lt;/ul&gt;
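&lt;p&gt;Auto-compaction can be sketched as: when history grows past a budget, replace the oldest messages with a single summary entry. The &lt;code&gt;summarize&lt;/code&gt; function below is a placeholder; in practice you would typically ask the model itself to produce the summary, and the budgets are illustrative.&lt;/p&gt;

```python
def summarize(messages: list[str]) -> str:
    """Placeholder summarizer: keeps the first sentence of each message."""
    return " | ".join(m.split(".")[0] for m in messages)


def compact_history(messages: list[str], max_messages: int = 6,
                    keep_recent: int = 3) -> list[str]:
    """Compress old history into one summary entry, keeping recent turns intact."""
    if max_messages >= len(messages):
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return ["[summary] " + summarize(old)] + recent
```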

&lt;h3&gt;4. &lt;strong&gt;Isolate Contexts&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Splitting up the context between multiple windows or agents to break down the task.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sometimes the best context management is context separation, allowing different components to focus on specific aspects.&lt;/p&gt;

&lt;p&gt;This principle is demonstrated in &lt;a href="https://www.anthropic.com/engineering/built-multi-agent-research-system" rel="noopener noreferrer"&gt;Anthropic's multi-agent research system&lt;/a&gt;. Instead of cramming everything into one massive context, they use specialized subagents that each operate with their own focused context windows. This allows them to scale beyond what any single agent could handle while maintaining clarity and focus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use multiple specialized agents instead of one overpowered agent&lt;/li&gt;
&lt;li&gt;Create sandbox environments that isolate objects from the main context&lt;/li&gt;
&lt;li&gt;Design multi-agent systems for easily parallelizable tasks&lt;/li&gt;
&lt;li&gt;Separate concerns across different context windows with clear boundaries&lt;/li&gt;
&lt;/ul&gt;
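&lt;p&gt;The isolation idea can be sketched as an orchestrator that gives each subagent its own history and only collects final answers. &lt;code&gt;run_llm&lt;/code&gt; below is a placeholder for a real model call, not any specific framework's API:&lt;/p&gt;

```python
def run_llm(role: str, history: list[str], task: str) -> str:
    """Placeholder for a real model call."""
    return "[" + role + "] answer to: " + task


class SubAgent:
    def __init__(self, role: str):
        self.role = role
        self.history: list[str] = []  # this agent's isolated context

    def run(self, task: str) -> str:
        answer = run_llm(self.role, self.history, task)
        self.history.append(answer)  # context stays local to this agent
        return answer


def orchestrate(task_map: dict[str, str]) -> dict[str, str]:
    """One specialized agent per subtask; their contexts never mix."""
    return {role: SubAgent(role).run(task) for role, task in task_map.items()}
```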

&lt;h2&gt;Key Takeaways&lt;/h2&gt;

&lt;p&gt;To wrap up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context engineering is distinct from prompt engineering and focuses on managing the entire information environment.&lt;/li&gt;
&lt;li&gt;The four main problems (poisoning, clashing, confusion, distraction) can break even well-designed systems.&lt;/li&gt;
&lt;li&gt;The four pillars (write, select, compress, isolate) provide a framework for systematic improvement.&lt;/li&gt;
&lt;li&gt;Better context management directly translates to lower costs, higher reliability, and better user experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Context engineering is still an evolving field, and best practices are still being discovered through real-world experimentation. The more we share what works (and what doesn't), the faster we'll all build better AI systems.&lt;/p&gt;

&lt;p&gt;What context engineering techniques have you tried? What worked, and what didn't? Let me know your experience with this new concept!&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Write Code Like Sherlock Holmes: The Art of Deductive Development 🔍</title>
      <dc:creator>Vonta Johnson</dc:creator>
      <pubDate>Thu, 31 Jul 2025 14:48:50 +0000</pubDate>
      <link>https://forem.com/vontajohnson/write-code-like-sherlock-holmes-the-art-of-deductive-development-3ghp</link>
      <guid>https://forem.com/vontajohnson/write-code-like-sherlock-holmes-the-art-of-deductive-development-3ghp</guid>
<description>&lt;p&gt;I've always found Sherlock Holmes to be a fascinating character. Ever since that first story, I can't get enough of the detective's cases as told by Dr. Watson. So much so that I've come up with a few principles from the great detective that I apply to software engineering.&lt;/p&gt;

&lt;p&gt;Both Holmes and great developers solve complex problems by gathering comprehensive information, forming theories based on evidence rather than assumptions, tracing issues to their source, and collaborating to validate their reasoning.&lt;/p&gt;

&lt;p&gt;Whether you're planning a new feature, debugging a production issue, or architecting a system, here are a few ways to apply the skills of the detective to your next goal.&lt;/p&gt;

&lt;h2&gt;Gather All the Facts First&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;"Before I can judge what is or is not relevant, I must know all the facts."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The first thing Holmes does when solving a case (often before he decides whether to take it) is ask questions. Lots of them. Questions that often lead the potential client to ask, "Why are you asking me this?" This thorough initial inquisition lays the groundwork for everything that follows in the case. In development, this means exhaustive questioning during planning, before you write any code.&lt;/p&gt;

&lt;p&gt;I'm often credited with asking good questions both before and during a new implementation. Questions like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About Users:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who exactly will use this?&lt;/li&gt;
&lt;li&gt;What device will this primarily be used on?&lt;/li&gt;
&lt;li&gt;What is the foundational problem that we want to solve?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;About Constraints:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What type of performance is expected?&lt;/li&gt;
&lt;li&gt;Are there any compliance requirements? (HIPAA, GDPR, SOX)&lt;/li&gt;
&lt;li&gt;What data do we need to implement this and who owns it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;About Our Product:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this something new or something we are expanding on?&lt;/li&gt;
&lt;li&gt;Will we need to integrate any new technology to implement this?&lt;/li&gt;
&lt;li&gt;What failure mechanisms should we have in place if an assumed service &lt;em&gt;x&lt;/em&gt; stops working?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These types of questions reveal crucial constraints and expectations. Ask as many as you can think of, not just to figure out what you &lt;em&gt;do&lt;/em&gt; need but, sometimes more importantly, what you &lt;em&gt;do not&lt;/em&gt;. Just as Holmes investigating a victim's family history might reveal a crucial detail that solves the current case, thorough questioning surfaces the right answers and eliminates problems before they occur during development.&lt;/p&gt;

&lt;h2&gt;Let Facts Shape Your Theory, Not the Other Way Around&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Holmes never formed theories before gathering evidence. He gathered facts as a case went on and never formed a thesis prematurely. In development, this means avoiding over-engineering and creating abstractions before they are necessary.&lt;/p&gt;

&lt;p&gt;We developers sometimes have a bad habit of over-engineering a service in the name of "scale". Say you need to add a queue to your application. Your first impulse should not be to reach for a third-party library or to bring an entirely new service into your app; mine would be to implement a simple queue of my own. The right answer depends on the number of users and how large you expect the queue to get, but that should be determined in the planning phase. If a simple queue satisfies your current use case, start there. Iterate on your solution and bring in more complex solutions only as necessary. Every new service you add is a new tool the team must learn, and one whose maintainers you are at the mercy of if they update or change their feature set.&lt;/p&gt;
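&lt;p&gt;As a sketch of that starting point, an in-process queue built on the standard library is often all the "scale" you need at first. The &lt;code&gt;handler&lt;/code&gt; callback here is a stand-in for whatever processing your app actually does:&lt;/p&gt;

```python
from collections import deque


class SimpleQueue:
    """A minimal in-app FIFO job queue; no external service required."""

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, job) -> None:
        self._jobs.append(job)

    def process_all(self, handler) -> int:
        """Run handler on each job in FIFO order; returns the count handled."""
        count = 0
        while self._jobs:
            handler(self._jobs.popleft())
            count += 1
        return count
```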

&lt;p&gt;I also apply this to creating abstractions. I favor localization and keep functionality close to where it's needed. If error handling logic is only used in one component, it doesn't need to be abstracted into five different files. If a utility function serves only the user profile page, keep it in that module until you need it elsewhere.&lt;/p&gt;

&lt;p&gt;This approach prevents the classic developer trap of building elaborate solutions to problems you don't actually have. Start simple, add complexity only when the facts demand it.&lt;/p&gt;

&lt;h2&gt;Start at the Source&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;"When you have eliminated the impossible, whatever remains, however improbable, must be the truth."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Holmes liked to find out what a criminal's motivation would be, asking questions and investigating to uncover the driving factor behind the crime. Similarly, when debugging, I like to start at the data source, not where the error appears.&lt;/p&gt;

&lt;p&gt;Here's a real scenario: users report that their dashboard shows yesterday's sales figures instead of today's. The UI may indeed be displaying the wrong data, but debugging the UI first is like arresting the first person you see at a crime scene.&lt;/p&gt;

&lt;p&gt;Instead, I like to trace starting from the source. Especially if it's a part of the application you're not as familiar with. It can look something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: Run the query manually. Does it return today's data?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Hit the endpoint directly. Is the backend serving correct data?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Communication&lt;/strong&gt;: Check dev tools network tab. Is the request being made correctly?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt;: Are we serving any stale data from the cache?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Management&lt;/strong&gt;: Is the component receiving correct data but not rendering correctly?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this example, you might discover that the data in the database is correct, but the API has a timezone-conversion issue and is sending data from the wrong date. Starting at the UI assumes that the other pieces of the application are working correctly. Unfortunately, in my experience, that is not an assumption you can make when determining the root cause of an issue.&lt;/p&gt;
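&lt;p&gt;A bug like that hypothetical timezone conversion often comes down to computing "today" in UTC while the user lives in another offset; a minimal sketch of the buggy and fixed versions:&lt;/p&gt;

```python
from datetime import datetime, timezone, timedelta


def sales_date_buggy(now_utc: datetime) -> str:
    """Bug: uses the UTC date, ignoring the user's timezone."""
    return now_utc.date().isoformat()


def sales_date_fixed(now_utc: datetime, user_offset_hours: int) -> str:
    """Fix: convert to the user's local timezone before taking the date."""
    local = now_utc.astimezone(timezone(timedelta(hours=user_offset_hours)))
    return local.date().isoformat()
```

Late at night for a user in UTC-8, the buggy version already reports the next day's date.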

&lt;p&gt;Holmes investigated a victim's entire history to understand the motivation for a crime. So should we investigate the complete data flow of our systems to understand the source of errors. Eliminate the impossible explanations systematically until you're left with the truth, however unexpected.&lt;/p&gt;

&lt;h2&gt;Have Your Watson&lt;/h2&gt;

&lt;p&gt;Holmes relied on Watson not as a sidekick, but as a crucial partner in his investigative process. Explaining his reasoning to Watson forced him to articulate his logic and could reveal flaws in his reasoning or spark new insights based on Watson's feedback.&lt;/p&gt;

&lt;p&gt;Every developer needs a Watson when the moment calls. My rule: if I'm stuck for 30 minutes with no clear path forward, I find someone to talk through the problem.&lt;/p&gt;

&lt;p&gt;Other times you may need a Watson:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pair programming&lt;/strong&gt;: Working through the problem together in real-time to find a solution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review&lt;/strong&gt;: Getting someone else's feedback on your solution can either validate your approach or reveal a gap you need to close.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Getting Unstuck&lt;/strong&gt;: Once you've hit a roadblock, explaining your train of thought to someone else can reveal a crucial point you've overlooked and get you to the outcome you need.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your Watson doesn't need to be more senior than you. Sometimes a junior developer asking "Why did you choose this approach?" makes you realize you're overcomplicating things. Sometimes a colleague from another team can spot an integration issue you missed because you were too close to the problem.&lt;/p&gt;

&lt;p&gt;The key is being able to articulate your reasoning and being open to feedback. Don't tie yourself to any code; focus on the outcome, not your solution. This forces you to examine your assumptions and gives you more confidence in the solution you've chosen. Your teammates can help you see problems from new angles, just as Watson does for Holmes.&lt;/p&gt;

&lt;h2&gt;The Deductive Developer&lt;/h2&gt;

&lt;p&gt;These are four principles I like to follow when it comes to Deductive Development: gather information before coding, build only what the requirements demand, trace problems to their true source, and collaborate to validate.&lt;/p&gt;

&lt;p&gt;The next time you face a complex development challenge, channel Holmes' methodology. Ask exhaustive questions during planning. Let the facts shape your solution rather than forcing facts to fit your preconceptions. When debugging, start at the data source and work your way up. And when you're stuck, find your Watson to help you see what you might have missed.&lt;/p&gt;

&lt;p&gt;The best developers, like the world's greatest fictional detective, understand that methodology trumps genius.&lt;br&gt;
When you approach your next development challenge, remember that every feature request is a case to be solved, every bug is evidence waiting to be interpreted, and every system is a mystery with logical rules governing its behavior.&lt;/p&gt;

&lt;p&gt;After all, there's no mystery so complex that it can't be unraveled by the right questions, the right evidence, and the right methodology.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
