<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Amar Dhillon</title>
    <description>The latest articles on Forem by Amar Dhillon (@amarjit_dhillon).</description>
    <link>https://forem.com/amarjit_dhillon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1584447%2F7ab92d73-0b64-45a6-9a58-f9259fb79b9f.jpg</url>
      <title>Forem: Amar Dhillon</title>
      <link>https://forem.com/amarjit_dhillon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/amarjit_dhillon"/>
    <language>en</language>
    <item>
      <title>Agent Memory Architecture 🧠</title>
      <dc:creator>Amar Dhillon</dc:creator>
      <pubDate>Thu, 23 Apr 2026 15:44:46 +0000</pubDate>
      <link>https://forem.com/amarjit_dhillon/agent-memory-architecture-d37</link>
      <guid>https://forem.com/amarjit_dhillon/agent-memory-architecture-d37</guid>
      <description>&lt;p&gt;If you are building agentic AI systems then &lt;code&gt;memory management&lt;/code&gt; quickly becomes one of the most confusing parts (You'll soon realize)&lt;/p&gt;

&lt;p&gt;Not because the idea is hard, but because there are too many ways to think about it  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;People mix up sessions, context, history, embeddings, user preferences, and logs. Everything starts getting labeled as “memory”  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I wanted to break this down in a simple way using a practical architecture that I’ve been using  &lt;/p&gt;




&lt;h2&gt;
  
  
  The First Principle: Memory is Not One Thing 🤔
&lt;/h2&gt;

&lt;p&gt;When we say “agent memory”, we are actually talking about multiple layers working together  &lt;/p&gt;

&lt;p&gt;At a high level, you can think of memory as two major categories  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short-term memory
&lt;/li&gt;
&lt;li&gt;Long-term memory
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both serve very different purposes, as shown in the diagram below&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduwlmsvfepax22l1cx60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduwlmsvfepax22l1cx60.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory ID: The Logical Vault 🏦
&lt;/h2&gt;

&lt;p&gt;Before diving into types of memory, there is one important concept  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory ID&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Think of it as a logical vault that holds everything related to a user or system. Inside this vault, memory is organized in a structured way&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memoryId = the memory store / vault

├── short-term memory
     └── actorId → sessionId → events

└── long-term memory
      └── namespace → memory records
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This abstraction is powerful because it decouples memory from the agent  &lt;/p&gt;

&lt;p&gt;Keep in mind that memory is not tied to a specific agent, it is tied to an &lt;strong&gt;actor&lt;/strong&gt;  &lt;/p&gt;
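&lt;p&gt;To make the vault concrete, here is a minimal Python sketch of that hierarchy. The &lt;code&gt;MemoryVault&lt;/code&gt; class, its field names, and its methods are all illustrative assumptions, not the API of any particular framework&lt;/p&gt;

```python
# Hypothetical sketch of the memoryId vault described above.
# Class and field names are illustrative, not from any specific framework.

class MemoryVault:
    """A logical vault keyed by memoryId, holding both memory layers."""

    def __init__(self, memory_id):
        self.memory_id = memory_id
        # short-term: actorId maps to sessionId maps to a list of events
        self.short_term = {}
        # long-term: namespace maps to a list of memory records
        self.long_term = {}

    def add_event(self, actor_id, session_id, event):
        sessions = self.short_term.setdefault(actor_id, {})
        sessions.setdefault(session_id, []).append(event)

    def add_record(self, namespace, record):
        self.long_term.setdefault(namespace, []).append(record)


vault = MemoryVault("user-vault-001")
vault.add_event("amar", "session-1", {"role": "user", "text": "Book a flight"})
vault.add_record("preferences", {"fact": "prefers window seats"})
```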




&lt;h2&gt;
  
  
  Short-Term Memory: Covers What Just Happened 🧩
&lt;/h2&gt;

&lt;p&gt;Short-term memory is all about the current interaction&lt;br&gt;&lt;br&gt;
It captures the flow of a conversation or task in real time  &lt;/p&gt;

&lt;p&gt;At its core, it has a simple hierarchy  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actor ID → identifies the user or system
&lt;/li&gt;
&lt;li&gt;Session ID → one continuous interaction
&lt;/li&gt;
&lt;li&gt;Events → the smallest unit of memory
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An event can be anything, such as  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user prompt
&lt;/li&gt;
&lt;li&gt;tool invocation
&lt;/li&gt;
&lt;li&gt;assistant response
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So a session becomes a sequence of events  &lt;/p&gt;
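&lt;p&gt;A session, then, is just an ordered list of event records. Here is a toy sketch of what those records could look like (the field names and event types are assumptions for illustration)&lt;/p&gt;

```python
# Illustrative event records for one session; field names are assumptions.
import time

def make_event(event_type, payload):
    return {"type": event_type, "payload": payload, "ts": time.time()}

session = []  # one sessionId maps to an ordered list of events
session.append(make_event("user_prompt", "Find flights to Queenstown"))
session.append(make_event("tool_invocation", {"tool": "flight_search"}))
session.append(make_event("assistant_response", "Here are three options..."))

# the session is just the ordered sequence of these events
types = [e["type"] for e in session]
```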




&lt;h3&gt;
  
  
  What About Branching? 🌿
&lt;/h3&gt;

&lt;p&gt;Things get interesting when you introduce branching. Instead of having a single linear flow, you can fork memory into multiple branches  &lt;/p&gt;

&lt;p&gt;For example  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;main conversation
&lt;/li&gt;
&lt;li&gt;flight agent path (me going from Ottawa to NZ)
&lt;/li&gt;
&lt;li&gt;hotel agent path (finding a cool spot in Queenstown)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each branch can evolve independently while still being tied to the same session. This becomes very useful in multi-agent systems where different agents explore different reasoning paths  &lt;/p&gt;
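&lt;p&gt;One simple way to picture forking is copying the shared prefix of events and letting each copy grow on its own. This is a hand-rolled sketch of the idea, not any specific framework's branching API&lt;/p&gt;

```python
# Minimal sketch of branching: fork a session's event list at a given index.
# This is an assumption about how forking could work, not a real API.

def fork_branch(events, at_index):
    """Copy events up to at_index so the branch can evolve independently."""
    return list(events[:at_index])

main = ["greet", "ask_trip_details", "plan_trip"]

flight_branch = fork_branch(main, 2)
flight_branch.append("search_flights_ottawa_to_nz")

hotel_branch = fork_branch(main, 2)
hotel_branch.append("search_hotels_queenstown")

# main is untouched; each branch diverges after the shared prefix
```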




&lt;h2&gt;
  
  
  Long-Term Memory: Covers What Should Be Remembered 📚
&lt;/h2&gt;

&lt;p&gt;Short-term memory is temporary&lt;br&gt;&lt;br&gt;
Long-term memory is intentional  &lt;/p&gt;

&lt;p&gt;This is where you decide what is worth keeping. It is not just one thing: long-term memory has multiple types  &lt;/p&gt;




&lt;h3&gt;
  
  
  1. Semantic Memory (Facts as Vectors) 🔎
&lt;/h3&gt;

&lt;p&gt;This is your classic vector database layer. You store facts, embeddings, and retrievable knowledge  &lt;/p&gt;

&lt;p&gt;Examples  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user preferences
&lt;/li&gt;
&lt;li&gt;structured facts
&lt;/li&gt;
&lt;li&gt;extracted knowledge
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A system like &lt;code&gt;OpenSearch&lt;/code&gt; or any &lt;code&gt;vector DB&lt;/code&gt; works well here&lt;br&gt;&lt;br&gt;
This enables &lt;em&gt;similarity search&lt;/em&gt; and &lt;em&gt;contextual recall&lt;/em&gt;  &lt;/p&gt;
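&lt;p&gt;Under the hood, similarity search is just ranking stored embeddings against a query vector. A toy cosine-similarity version, with made-up three-dimensional vectors standing in for real embeddings, looks like this&lt;/p&gt;

```python
# Toy cosine-similarity recall over stored fact embeddings.
# Real systems use a vector DB (e.g. OpenSearch); the vectors here are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

facts = [
    ("prefers window seats", [0.9, 0.1, 0.0]),
    ("vegetarian meals", [0.1, 0.9, 0.1]),
    ("lives in Ottawa", [0.0, 0.2, 0.9]),
]

def recall(query_vec, k=2):
    # rank stored facts by similarity to the query, highest first
    ranked = sorted(facts, key=lambda f: cosine(query_vec, f[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```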




&lt;h3&gt;
  
  
  2. Episodic Memory (What Happened Over Time) 🕰️
&lt;/h3&gt;

&lt;p&gt;This is about history. Instead of storing raw conversations forever, you summarize them  &lt;/p&gt;

&lt;p&gt;You keep  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarized conversations
&lt;/li&gt;
&lt;li&gt;logs
&lt;/li&gt;
&lt;li&gt;historical patterns
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typically stored in something like &lt;code&gt;S3&lt;/code&gt; or &lt;code&gt;object storage&lt;/code&gt;&lt;br&gt;&lt;br&gt;
This helps the agent recall past interactions without loading everything into context  &lt;/p&gt;




&lt;h3&gt;
  
  
  3. Procedural Memory (How the System Behaves) ⚙️
&lt;/h3&gt;

&lt;p&gt;This is often overlooked but very important. Procedural memory defines how the agent operates  &lt;/p&gt;

&lt;p&gt;It includes  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool definitions
&lt;/li&gt;
&lt;li&gt;policies
&lt;/li&gt;
&lt;li&gt;system rules
&lt;/li&gt;
&lt;li&gt;configurations
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of this as the “operating manual” for the agent  &lt;/p&gt;
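&lt;p&gt;In practice this "operating manual" is often just versioned configuration the agent loads at startup. A hypothetical shape, with invented tool names and policies, might be&lt;/p&gt;

```python
# One way to represent procedural memory: a versioned operating manual
# the agent loads at startup. The structure and values are illustrative.

procedural_memory = {
    "version": "2026-04-01",
    "tools": {
        "flight_search": {"timeout_s": 10, "max_results": 5},
        "hotel_search": {"timeout_s": 10, "max_results": 5},
    },
    "policies": ["never store payment details", "confirm before booking"],
    "system_rules": {"language": "en", "tone": "concise"},
}

def allowed_tools(manual):
    # the agent consults the manual to know which tools it may call
    return sorted(manual["tools"])
```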




&lt;h2&gt;
  
  
  Connecting Short-Term and Long-Term Memory 🔄
&lt;/h2&gt;

&lt;p&gt;Now the real question&lt;br&gt;&lt;br&gt;
How do these two layers work together?&lt;/p&gt;

&lt;p&gt;There are two main flows  &lt;/p&gt;




&lt;h3&gt;
  
  
  Persist Flow (Short → Long)
&lt;/h3&gt;

&lt;p&gt;Not everything from &lt;code&gt;short-term memory&lt;/code&gt; should be stored&lt;br&gt;&lt;br&gt;
You apply strategies like  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarization
&lt;/li&gt;
&lt;li&gt;Extraction
&lt;/li&gt;
&lt;li&gt;Filtering
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only important signals get promoted to &lt;code&gt;long-term memory&lt;/code&gt;  &lt;/p&gt;
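&lt;p&gt;Sketched in Python, a persist step could apply those three strategies in order. The event fields (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;important&lt;/code&gt;) and the one-line summary are stand-ins for real summarizers and extractors&lt;/p&gt;

```python
# Sketch of a persist step: extract, filter, summarize, then promote.
# The heuristics here are stand-ins for real summarizers and extractors.

def persist(session_events, long_term):
    # Extraction: keep events explicitly flagged as preferences or facts
    extracted = [e for e in session_events if e.get("kind") in ("preference", "fact")]
    # Filtering: drop low-signal items, promote only what is marked important
    promoted = [e for e in extracted if e.get("important")]
    # Summarization: collapse the whole session into one episodic note
    summary = f"session with {len(session_events)} events"
    long_term.setdefault("episodic", []).append(summary)
    long_term.setdefault("semantic", []).extend(promoted)
    return long_term
```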




&lt;h3&gt;
  
  
  Recall Flow (Long → Short)
&lt;/h3&gt;

&lt;p&gt;When a new request comes in, the agent retrieves relevant long-term memory and injects it into the current context  &lt;/p&gt;

&lt;p&gt;This is how personalization and continuity happen  &lt;/p&gt;
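&lt;p&gt;A minimal recall step might look like the following. The keyword-overlap matching is purely illustrative; a real system would use the semantic layer's vector search instead&lt;/p&gt;

```python
# Sketch of the recall flow: pull relevant long-term records and
# inject them into the context for the current request.
# Keyword-overlap matching is a naive stand-in for vector search.

def recall_into_context(request, long_term, context):
    words = set(request.lower().split())
    hits = [r for r in long_term if words.intersection(r.lower().split())]
    context["recalled"] = hits
    return context

long_term = ["prefers window seat", "vegetarian meals", "past trip to Tokyo"]
ctx = recall_into_context("book a window seat please", long_term, {})
# ctx now carries the recalled preferences into the live prompt
```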




&lt;h2&gt;
  
  
  Why Actor-Centric Memory Matters 👤
&lt;/h2&gt;

&lt;p&gt;One subtle but important design choice: memory is attached to &lt;strong&gt;actorId&lt;/strong&gt;, not to an agent  &lt;/p&gt;

&lt;p&gt;This means  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one user can interact with multiple agents
&lt;/li&gt;
&lt;li&gt;memory remains consistent across all of them
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also means your system is &lt;strong&gt;agent-agnostic&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Whether you have  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 agent
&lt;/li&gt;
&lt;li&gt;10 agents
&lt;/li&gt;
&lt;li&gt;or no agent (just APIs)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the memory model still works  &lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Example 🛫
&lt;/h2&gt;

&lt;p&gt;Let’s say Amar (user) is booking travel  &lt;/p&gt;

&lt;p&gt;Short-term memory tracks  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;current booking session
&lt;/li&gt;
&lt;li&gt;user inputs
&lt;/li&gt;
&lt;li&gt;tool calls
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Long-term memory stores  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preference for window seat and business class &lt;/li&gt;
&lt;li&gt;Vegetarian meals
&lt;/li&gt;
&lt;li&gt;Past trips
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the user comes back later&lt;br&gt;&lt;br&gt;
the system does not start from scratch. It recalls preferences and continues seamlessly  &lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways I want you to remember from this blog 💡
&lt;/h2&gt;

&lt;p&gt;Memory in agent systems is not just about storing chat history. It is about structuring information in a way that supports reasoning, recall, and personalization  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short-term memory handles the now
&lt;/li&gt;
&lt;li&gt;Long-term memory handles the forever
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real power comes from how you connect the two  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentcore</category>
      <category>agentskills</category>
      <category>agents</category>
    </item>
    <item>
      <title>Shadow Deployments for AI Agents: Test in Prod without breaking anything 🚀</title>
      <dc:creator>Amar Dhillon</dc:creator>
      <pubDate>Thu, 23 Apr 2026 05:39:39 +0000</pubDate>
      <link>https://forem.com/amarjit_dhillon/shadow-deployments-for-ai-agents-test-in-production-without-breaking-anything-55e2</link>
      <guid>https://forem.com/amarjit_dhillon/shadow-deployments-for-ai-agents-test-in-production-without-breaking-anything-55e2</guid>
      <description>&lt;p&gt;If you’ve worked with AI agents in production, you already know one thing. Deploying a new version is not the same as deploying traditional software&lt;/p&gt;

&lt;p&gt;With non-AI systems, you push code and then run tests. If everything looks fine, you go live&lt;/p&gt;

&lt;p&gt;With agents, things get messy. The same input can produce slightly different outputs. Improvements in reasoning might come with unexpected side effects. Sometimes a “better” model performs worse in edge cases that actually matter&lt;/p&gt;

&lt;p&gt;So the real challenge is not building a better agent. The challenge is &lt;strong&gt;proving that it’s better before users see it&lt;/strong&gt; 🔍&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Traditional Deployment Fails for Agents 🤔
&lt;/h3&gt;

&lt;p&gt;The core issue is that &lt;em&gt;agent behavior is not deterministic&lt;/em&gt;. You can’t rely on a handful of test cases and assume production will behave the same way. Even if your &lt;em&gt;offline evaluations&lt;/em&gt; look great, real users can bring unpredictable inputs, messy context, and ambiguous intent&lt;/p&gt;

&lt;p&gt;This means a direct rollout is risky. If something goes wrong, it’s not always obvious. It can show up as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slightly worse answers&lt;/li&gt;
&lt;li&gt;Slightly more hallucinations&lt;/li&gt;
&lt;li&gt;Slightly longer responses that annoy users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the time you notice, the damage is already done 😬&lt;/p&gt;




&lt;h3&gt;
  
  
  The Idea Behind Shadow Deployments 🧠
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckjjow3lrhh14r6x8pv5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckjjow3lrhh14r6x8pv5.png" alt=" " width="701" height="804"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As shown in the above diagram, instead of replacing your current agent (V1) you run the new version (V2) alongside it&lt;/p&gt;

&lt;p&gt;The user sends a request and your system (Orchestrator in this case) does something interesting behind the scenes&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;stable agent&lt;/code&gt; handles the request as usual and returns the response to the user&lt;/li&gt;
&lt;li&gt;At the same time, the &lt;code&gt;new agent (V2)&lt;/code&gt; receives the exact same input but its output is never shown to the user. It just runs quietly in the background 🏃🏻‍♂️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what I call a &lt;strong&gt;shadow path&lt;/strong&gt; 👻&lt;/p&gt;

&lt;p&gt;You are effectively replaying real production traffic through your new agent without exposing any risk. The &lt;em&gt;user experience&lt;/em&gt; remains unchanged, but you now have a way to observe how the &lt;code&gt;new version&lt;/code&gt; behaves under real conditions&lt;/p&gt;
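&lt;p&gt;The routing logic itself can be tiny. This sketch (the agent functions and the log are stand-ins for real services) shows the two key properties: only the stable answer is returned, and a shadow failure never touches the user&lt;/p&gt;

```python
# Minimal orchestrator sketch: the stable agent's answer goes to the user,
# the canary runs on the same input and its output is only logged.
# Agent functions and shadow_log are stand-ins for real services.

shadow_log = []

def stable_agent(request):
    return f"v1 answer to: {request}"

def canary_agent(request):
    return f"v2 answer to: {request}"

def orchestrate(request):
    live = stable_agent(request)          # live path: user-visible
    try:
        shadow = canary_agent(request)    # shadow path: never shown
        shadow_log.append({"request": request, "live": live, "shadow": shadow})
    except Exception:
        pass  # shadow failures must never affect the user response
    return live
```

&lt;p&gt;In a real deployment you would run the shadow call asynchronously so it adds no latency to the live path&lt;/p&gt;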




&lt;h3&gt;
  
  
  What Actually Happens Under the Hood? ⚙️
&lt;/h3&gt;

&lt;p&gt;At the center of this setup is an orchestrator. It takes incoming requests and sends them down two paths&lt;/p&gt;

&lt;p&gt;The first path is the &lt;em&gt;live path&lt;/em&gt;, which goes to your &lt;code&gt;stable agent&lt;/code&gt;. This is the version you trust. It produces the response that the user sees&lt;/p&gt;

&lt;p&gt;The second path is the &lt;em&gt;shadow path&lt;/em&gt;. This goes to your &lt;code&gt;canary agent&lt;/code&gt;, which is the version you’re testing. It receives the same input, often with the &lt;strong&gt;same context and knowledge sources&lt;/strong&gt;, but its output is held back&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It’s important to note that, to make this comparison meaningful, both agents typically rely on the &lt;strong&gt;same knowledge base&lt;/strong&gt;. If one agent had access to different data, you wouldn’t know whether the difference in output came from better reasoning or just better information. Keeping the data layer consistent ensures you are comparing apples to apples 🍎&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Comparing Outputs Is Where the Magic Happens ⚖️
&lt;/h3&gt;

&lt;p&gt;Now comes the tricky part. How do you decide which output is better?&lt;/p&gt;

&lt;p&gt;You could try to define strict rules, but language is messy. Quality is subjective. What looks better to one evaluator might not look better to another&lt;/p&gt;

&lt;p&gt;This is where the idea of using an &lt;strong&gt;LLM-as-a-judge&lt;/strong&gt; comes in. A &lt;em&gt;reasoning model&lt;/em&gt; can evaluate both responses and decide which one is more accurate or more aligned with the user’s intent&lt;/p&gt;

&lt;p&gt;Over time, you start collecting signals&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maybe the new agent wins 65% of the time&lt;/li&gt;
&lt;li&gt;Maybe it’s more accurate but slightly slower&lt;/li&gt;
&lt;li&gt;Maybe it handles complex queries better but struggles with short factual ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this gets logged and analyzed 📊&lt;/p&gt;
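&lt;p&gt;Aggregating those signals can be as simple as computing a win rate over judged pairs. The judge below is a stub that prefers longer answers; in practice an LLM would score each (live, shadow) pair&lt;/p&gt;

```python
# Aggregating judge verdicts into a win rate. The judge here is a stub;
# in practice an LLM scores each (live, shadow) pair for quality.

def judge(live, shadow):
    # stand-in verdict: pick the longer answer as "better" (ties go to live)
    lengths = {"live": len(live), "shadow": len(shadow)}
    return max(lengths, key=lengths.get)

def win_rate(pairs):
    wins = sum(1 for live, shadow in pairs if judge(live, shadow) == "shadow")
    return wins / len(pairs)

pairs = [
    ("short", "a longer answer"),
    ("equal", "equal"),
    ("ok", "much more detail"),
]
rate = win_rate(pairs)  # fraction of requests where the canary "won"
```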




&lt;h3&gt;
  
  
  Turning Observations Into Decisions 🔁
&lt;/h3&gt;

&lt;p&gt;After running this setup for a while, patterns begin to emerge. You can see latency differences, cost implications and even qualitative improvements in reasoning.&lt;/p&gt;

&lt;p&gt;At this point, promoting the canary is no longer a risky move; it becomes a controlled decision&lt;/p&gt;

&lt;p&gt;If the new agent consistently performs better and meets your criteria, you promote it to production. &lt;strong&gt;The canary becomes the new stable version and the cycle continues&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Things That Still Need Careful Thought ⚠️
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shadow deployments are powerful but they are not free&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running two agents in parallel increases cost, so many teams sample traffic instead of shadowing everything&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Latency also needs to be isolated so the shadow path never slows down the user response&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Evaluation quality is another challenge. LLM-as-a-judge works well, but it can be inconsistent. Many teams improve this by combining automated evaluation with occasional human review&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observability becomes critical. You need to track inputs, outputs, context, and decisions in a structured way. Without that, you are just collecting noise&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
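&lt;p&gt;Traffic sampling, mentioned in the second point above, can be a single line of logic. The 10% default here is an arbitrary example&lt;/p&gt;

```python
# Sampling the shadow path to control cost: only a fraction of live
# traffic is mirrored to the canary. The 10% default is arbitrary.
import random

def should_shadow(sample_percent=10):
    # randrange(100) yields 0..99; matching the first N values gives N% sampling
    return random.randrange(100) in range(sample_percent)
```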




&lt;h3&gt;
  
  
  The Bigger Picture 🧩
&lt;/h3&gt;

&lt;p&gt;If you are serious about building production-grade AI agents, this is &lt;strong&gt;not just a nice-to-have pattern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s one of the foundational pieces that makes everything else possible 🚀&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentskills</category>
      <category>aws</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
