<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: nghiach</title>
    <description>The latest articles on Forem by nghiach (@chnghia).</description>
    <link>https://forem.com/chnghia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3682079%2Fee01a5bf-c192-4de2-bcc0-ed5ba0cbf8ed.png</url>
      <title>Forem: nghiach</title>
      <link>https://forem.com/chnghia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/chnghia"/>
    <language>en</language>
    <item>
      <title>Beyond RAG: Why AI Agents Need Memory as an Asset — Not a Cache</title>
      <dc:creator>nghiach</dc:creator>
      <pubDate>Thu, 26 Feb 2026 14:30:30 +0000</pubDate>
      <link>https://forem.com/chnghia/beyond-rag-why-ai-agents-need-memory-as-an-asset-not-a-cache-5hbi</link>
      <guid>https://forem.com/chnghia/beyond-rag-why-ai-agents-need-memory-as-an-asset-not-a-cache-5hbi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most AI agents today have memory.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;But almost all of them forget.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We’ve built smarter models.&lt;br&gt;
We’ve built tool-using agents.&lt;br&gt;
We’ve built autonomous loops.&lt;/p&gt;

&lt;p&gt;And yet — when you talk to most AI systems for a week, they still feel… stateless.&lt;/p&gt;

&lt;p&gt;They don’t evolve with you.&lt;br&gt;
They don’t accumulate structured knowledge.&lt;br&gt;
They don’t grow.&lt;/p&gt;

&lt;p&gt;The problem is not reasoning.&lt;br&gt;
The problem is memory.&lt;br&gt;
And more specifically — the way we think about memory is fundamentally broken.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Illusion of Memory in Modern AI Agents
&lt;/h2&gt;

&lt;p&gt;Let’s look at how “memory” is implemented today.&lt;/p&gt;
&lt;h3&gt;
  
  
  1️⃣ Chat History
&lt;/h3&gt;

&lt;p&gt;Most systems treat memory as conversation history.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Append messages.&lt;/li&gt;
&lt;li&gt;Trim old tokens.&lt;/li&gt;
&lt;li&gt;Inject recent context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not memory.&lt;br&gt;
This is a sliding window buffer.&lt;/p&gt;

&lt;p&gt;It is fragile, token-bound, and disposable.&lt;/p&gt;


&lt;h3&gt;
  
  
  2️⃣ Vector Databases
&lt;/h3&gt;

&lt;p&gt;The more advanced pattern?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store embeddings.&lt;/li&gt;
&lt;li&gt;Retrieve semantically similar chunks.&lt;/li&gt;
&lt;li&gt;Inject into prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is often called “long-term memory.”&lt;br&gt;
But it’s still not memory.&lt;br&gt;
It’s retrieval.&lt;br&gt;
Frameworks like LangChain and many RAG systems popularized this pattern.&lt;br&gt;
Even large research organizations like OpenAI have experimented with memory retrieval layers.&lt;/p&gt;

&lt;p&gt;But let’s be honest:&lt;br&gt;
A vector store is not memory.&lt;br&gt;
It’s an index.&lt;/p&gt;


&lt;h3&gt;
  
  
  3️⃣ RAG
&lt;/h3&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is powerful.&lt;br&gt;
But RAG is about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Injecting documents&lt;/li&gt;
&lt;li&gt;Improving grounding&lt;/li&gt;
&lt;li&gt;Reducing hallucination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG does not answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who owns the memory?&lt;/li&gt;
&lt;li&gt;How does memory evolve?&lt;/li&gt;
&lt;li&gt;How is memory versioned?&lt;/li&gt;
&lt;li&gt;When should memory be archived?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG treats memory as context.&lt;br&gt;
Not as a governed asset.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Is Fundamentally Broken
&lt;/h2&gt;

&lt;p&gt;If we want agents to move beyond chatbots, three things must change.&lt;/p&gt;


&lt;h3&gt;
  
  
  ❗ 1. No Lifecycle
&lt;/h3&gt;

&lt;p&gt;In most systems:&lt;br&gt;
Memory is written.&lt;br&gt;
And then… it just sits there.&lt;/p&gt;

&lt;p&gt;No:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validation&lt;/li&gt;
&lt;li&gt;Confirmation&lt;/li&gt;
&lt;li&gt;Evolution&lt;/li&gt;
&lt;li&gt;Archiving&lt;/li&gt;
&lt;li&gt;Deprecation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real systems — especially enterprise systems — data has lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draft&lt;/li&gt;
&lt;li&gt;Review&lt;/li&gt;
&lt;li&gt;Commit&lt;/li&gt;
&lt;li&gt;Archive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why should agent memory be different?&lt;/p&gt;


&lt;h3&gt;
  
  
  ❗ 2. No Governance
&lt;/h3&gt;

&lt;p&gt;Enterprise AI requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Audit trails&lt;/li&gt;
&lt;li&gt;Versioning&lt;/li&gt;
&lt;li&gt;Ownership&lt;/li&gt;
&lt;li&gt;Access control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But current agent memory layers are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implicit&lt;/li&gt;
&lt;li&gt;Opaque&lt;/li&gt;
&lt;li&gt;Unstructured&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an agent “remembers” something wrong, how do you correct it?&lt;br&gt;
If it infers a pattern incorrectly, how do you roll it back?&lt;br&gt;
Without governance, memory becomes a liability.&lt;/p&gt;


&lt;h3&gt;
  
  
  ❗ 3. No Abstraction
&lt;/h3&gt;

&lt;p&gt;Most frameworks reduce memory to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embedding → similarity search → context injection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s an implementation detail.&lt;br&gt;
Memory is not an embedding.&lt;br&gt;
Memory is structured knowledge that evolves over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Shift in Mindset: Memory as an Asset
&lt;/h2&gt;

&lt;p&gt;In modern data engineering, platforms like &lt;a href="https://dagster.io/" rel="noopener noreferrer"&gt;Dagster&lt;/a&gt; and &lt;a href="https://hamilton.apache.org/" rel="noopener noreferrer"&gt;Apache Hamilton&lt;/a&gt; introduced a powerful idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Treat data as assets.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Assets are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Versioned&lt;/li&gt;
&lt;li&gt;Observable&lt;/li&gt;
&lt;li&gt;Governed&lt;/li&gt;
&lt;li&gt;Materialized through pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if we applied the same thinking to agent memory?&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Memory = context&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We define:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Memory = Asset&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That means memory becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First-class&lt;/li&gt;
&lt;li&gt;Queryable&lt;/li&gt;
&lt;li&gt;Lifecycle-managed&lt;/li&gt;
&lt;li&gt;Policy-enforced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This changes everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  From Chatbot to Stateful System
&lt;/h2&gt;

&lt;p&gt;There is an evolution happening in AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Chatbot
&lt;/h3&gt;

&lt;p&gt;Stateless. Reactive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Tool-Using Agent
&lt;/h3&gt;

&lt;p&gt;Calls APIs. Slightly more capable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Autonomous Agent
&lt;/h3&gt;

&lt;p&gt;Loops. Plans. Executes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 4 (Emerging): Stateful System
&lt;/h3&gt;

&lt;p&gt;Structured memory. Lifecycle. Governance.&lt;/p&gt;

&lt;p&gt;The first three focus on reasoning.&lt;br&gt;
The fourth focuses on continuity.&lt;br&gt;
Continuity is what makes intelligence compound.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Does “Memory as Asset” Actually Mean?
&lt;/h2&gt;

&lt;p&gt;It means we stop thinking about memory as a blob of text.&lt;br&gt;
Instead, we design it like a data system.&lt;/p&gt;




&lt;h3&gt;
  
  
  1️⃣ Memory Has a Lifecycle
&lt;/h3&gt;

&lt;p&gt;Every memory object should go through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draft → Refine → Commit → Archive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;User says:&lt;br&gt;
“I started going to the gym.”&lt;/p&gt;

&lt;p&gt;The system should not instantly mutate the user profile permanently.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draft memory: “User may have started gym habit.”&lt;/li&gt;
&lt;li&gt;Confirm or observe consistency.&lt;/li&gt;
&lt;li&gt;Promote to committed memory.&lt;/li&gt;
&lt;li&gt;Archive if outdated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors how enterprise systems handle critical data.&lt;/p&gt;
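&lt;p&gt;As a minimal sketch of the lifecycle above (all names here are hypothetical, not from any specific framework), memory objects can carry an explicit status and only move along allowed transitions:&lt;/p&gt;

```python
from enum import Enum

class MemoryStatus(Enum):
    DRAFT = "draft"
    COMMITTED = "committed"
    ARCHIVED = "archived"

# Allowed transitions: a draft must be confirmed before promotion,
# and committed memories can only be archived, never silently edited.
TRANSITIONS = {
    MemoryStatus.DRAFT: {MemoryStatus.COMMITTED, MemoryStatus.ARCHIVED},
    MemoryStatus.COMMITTED: {MemoryStatus.ARCHIVED},
    MemoryStatus.ARCHIVED: set(),
}

class MemoryObject:
    def __init__(self, content: str):
        self.content = content
        self.status = MemoryStatus.DRAFT  # every memory starts as a draft

    def transition(self, target: MemoryStatus) -> None:
        if target not in TRANSITIONS[self.status]:
            raise ValueError(f"Illegal transition {self.status} -> {target}")
        self.status = target

# The gym observation is drafted first, then promoted once confirmed.
m = MemoryObject("User may have started a gym habit.")
m.transition(MemoryStatus.COMMITTED)
```

The point is not the enum itself but that promotion is an explicit, checkable step rather than an implicit side effect of writing to a store.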




&lt;h3&gt;
  
  
  2️⃣ Memory Has Layers
&lt;/h3&gt;

&lt;p&gt;Not all memory is equal.&lt;br&gt;
A practical structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact&lt;/strong&gt; — Atomic event&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal&lt;/strong&gt; — Time-aware sequence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern&lt;/strong&gt; — Repeated behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insight&lt;/strong&gt; — Derived inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Fact:&lt;br&gt;
“User bought coffee.”&lt;/p&gt;

&lt;p&gt;Temporal:&lt;br&gt;
“User buys coffee almost every weekday.”&lt;/p&gt;

&lt;p&gt;Pattern:&lt;br&gt;
“User has strong morning routine behavior.”&lt;/p&gt;

&lt;p&gt;Insight:&lt;br&gt;
“User productivity correlates with early caffeine intake.”&lt;/p&gt;

&lt;p&gt;This layered design prevents shallow personalization.&lt;br&gt;
It enables structured evolution.&lt;/p&gt;
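&lt;p&gt;One way to sketch these layers (a simplified illustration, not a prescribed schema) is to tag each memory with its layer and keep a provenance link to the lower-layer memories it was derived from:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    layer: str               # "fact" | "temporal" | "pattern" | "insight"
    text: str
    derived_from: list = field(default_factory=list)  # provenance chain

fact = Memory("fact", "User bought coffee.")
temporal = Memory("temporal", "User buys coffee almost every weekday.", [fact])
pattern = Memory("pattern", "User has strong morning routine behavior.", [temporal])

def provenance(m: Memory) -> list:
    """Walk the derivation chain: an insight is only as good as its facts."""
    chain = [m.text]
    for parent in m.derived_from:
        chain.extend(provenance(parent))
    return chain
```

With explicit provenance, a wrong pattern can be traced back to the facts that produced it and invalidated together with them.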




&lt;h3&gt;
  
  
  3️⃣ Memory Is Governed
&lt;/h3&gt;

&lt;p&gt;When memory becomes an asset, it must support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Versioning&lt;/li&gt;
&lt;li&gt;Traceability&lt;/li&gt;
&lt;li&gt;Ownership&lt;/li&gt;
&lt;li&gt;Deletion policies&lt;/li&gt;
&lt;li&gt;Access scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is critical for enterprise AI.&lt;br&gt;
Without governance, long-term memory becomes a compliance nightmare.&lt;/p&gt;
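&lt;p&gt;A hedged sketch of what governed memory could look like (hypothetical names, append-only versioning as one possible policy): every revision appends a new version instead of overwriting, so rollback and audit come for free:&lt;/p&gt;

```python
from datetime import datetime, timezone

class GovernedMemory:
    """Append-only record: revisions are versioned, never destructive."""

    def __init__(self, owner: str, text: str):
        self.owner = owner  # explicit ownership for access scoping
        self.versions = [(1, text, datetime.now(timezone.utc))]

    def revise(self, text: str) -> None:
        version = self.versions[-1][0] + 1
        self.versions.append((version, text, datetime.now(timezone.utc)))

    def rollback(self) -> None:
        if len(self.versions) > 1:
            self.versions.pop()  # revert a bad inference; history stays auditable

    @property
    def current(self) -> str:
        return self.versions[-1][1]

m = GovernedMemory("user:42", "User prefers morning meetings.")
m.revise("User prefers afternoon meetings.")  # a wrong inference
m.rollback()                                  # corrected, with a trail
```

This is the answer to "if it infers a pattern incorrectly, how do you roll it back?": the same way a database does, through versions.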




&lt;h2&gt;
  
  
  Why This Matters for Enterprise AI
&lt;/h2&gt;

&lt;p&gt;Enterprises don’t fear AI because of hallucination.&lt;br&gt;
They fear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uncontrolled state mutation&lt;/li&gt;
&lt;li&gt;Lack of audit trail&lt;/li&gt;
&lt;li&gt;Untraceable decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an AI system accumulates knowledge about customers, employees, or operations:&lt;br&gt;
It must behave like a data platform — not a chatbot.&lt;br&gt;
Memory as Asset enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trust&lt;/li&gt;
&lt;li&gt;Control&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It bridges the gap between:&lt;/p&gt;

&lt;p&gt;LLM experimentation&lt;br&gt;
and&lt;br&gt;
Enterprise-grade systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters for Personal AI
&lt;/h2&gt;

&lt;p&gt;On the consumer side, the impact is even bigger.&lt;/p&gt;

&lt;p&gt;Imagine an AI that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Truly understands your long-term goals&lt;/li&gt;
&lt;li&gt;Evolves with your habits&lt;/li&gt;
&lt;li&gt;Detects patterns in your behavior&lt;/li&gt;
&lt;li&gt;Refines its understanding over months or years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not by guessing every session.&lt;br&gt;
But by managing memory structurally.&lt;br&gt;
This is how we move from:&lt;/p&gt;

&lt;p&gt;“Smart assistant”&lt;br&gt;
to&lt;br&gt;
“Personal AI Operating System.”&lt;/p&gt;




&lt;h2&gt;
  
  
  The Future of Agentic AI
&lt;/h2&gt;

&lt;p&gt;Today, most research focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better reasoning&lt;/li&gt;
&lt;li&gt;Longer context windows&lt;/li&gt;
&lt;li&gt;Stronger tool use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the real bottleneck is not reasoning power.&lt;br&gt;
It is memory structure.&lt;br&gt;
The future of agentic AI will not be defined by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bigger models&lt;/li&gt;
&lt;li&gt;More tools&lt;/li&gt;
&lt;li&gt;Longer prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It will be defined by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateful systems&lt;/li&gt;
&lt;li&gt;Structured memory&lt;/li&gt;
&lt;li&gt;Lifecycle governance&lt;/li&gt;
&lt;li&gt;Asset-based intelligence&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Autonomous agents are exciting.&lt;br&gt;
But autonomy without memory discipline is chaos.&lt;br&gt;
If we want AI systems that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persist&lt;/li&gt;
&lt;li&gt;Personalize&lt;/li&gt;
&lt;li&gt;Compound intelligence&lt;/li&gt;
&lt;li&gt;Earn trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We must stop treating memory as a cache.&lt;br&gt;
And start treating it as an asset.&lt;/p&gt;




&lt;p&gt;If this idea resonates with you, I’m currently exploring it in more depth through a structured architecture approach for stateful AI systems.&lt;br&gt;
Because the next frontier of AI is not just thinking better.&lt;br&gt;
It’s remembering better.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>agenticai</category>
      <category>aiarchitecture</category>
    </item>
    <item>
      <title>Atomic Stateful Agent — From Architecture Idea to Working Code</title>
      <dc:creator>nghiach</dc:creator>
      <pubDate>Sun, 22 Feb 2026 08:24:20 +0000</pubDate>
      <link>https://forem.com/chnghia/atomic-stateful-agent-from-architecture-idea-to-working-code-1ljh</link>
      <guid>https://forem.com/chnghia/atomic-stateful-agent-from-architecture-idea-to-working-code-1ljh</guid>
      <description>&lt;p&gt;In a previous article, I introduced the idea of the &lt;strong&gt;Atomic Stateful Agent (ASA)&lt;/strong&gt; — an architecture for building AI agents that can operate safely inside real enterprise workflows.&lt;/p&gt;

&lt;p&gt;This repo, &lt;strong&gt;atomic-stateful-agent&lt;/strong&gt;, is the practical implementation of that idea.&lt;/p&gt;

&lt;p&gt;It’s not about making agents “talk better”.&lt;br&gt;
It’s about making agents &lt;strong&gt;behave safely around real system state&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/chnghia/atomic-stateful-agent" rel="noopener noreferrer"&gt;https://github.com/chnghia/atomic-stateful-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Vv2As_h7kjM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  🚨 The Problem with Chat-Centric Agents
&lt;/h2&gt;

&lt;p&gt;Most agent systems today look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User message → LLM → tool call → system update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for demos.&lt;br&gt;
But in real systems (ERP, CRM, tickets, finance records…):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The LLM might misunderstand intent&lt;/li&gt;
&lt;li&gt;A half-complete instruction can still trigger a write&lt;/li&gt;
&lt;li&gt;Conversation history becomes your “state” (unstructured, fragile)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s risky.&lt;/p&gt;

&lt;p&gt;Enterprises don’t run on chat history.&lt;br&gt;
They run on &lt;strong&gt;entities, workflows, and transactions&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  💡 What Is an Atomic Stateful Agent?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;Atomic Stateful Agent (ASA)&lt;/strong&gt; is a system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workflow logic is modeled as a &lt;strong&gt;state machine&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Business objects are treated as &lt;strong&gt;entities&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Changes happen through a &lt;strong&gt;draft → commit&lt;/strong&gt; process&lt;/li&gt;
&lt;li&gt;LLM reasoning is &lt;strong&gt;controlled&lt;/strong&gt;, not in charge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;LLM = reasoning layer&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;ASA = transactional control layer&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The LLM can suggest.&lt;br&gt;
The system decides.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧩 AIU + ASA: Two Layers, Two Responsibilities
&lt;/h2&gt;

&lt;p&gt;This repo combines two ideas:&lt;/p&gt;
&lt;h3&gt;
  
  
  🧠 AIU — Atomic Inference Unit
&lt;/h3&gt;

&lt;p&gt;(from my earlier article)&lt;/p&gt;

&lt;p&gt;AIU is about &lt;strong&gt;atomic reasoning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An AIU:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has clear structured input&lt;/li&gt;
&lt;li&gt;Produces structured output&lt;/li&gt;
&lt;li&gt;Solves one focused inference task&lt;/li&gt;
&lt;li&gt;Is &lt;strong&gt;stateless&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Does not control workflows or databases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this repo, AIUs are built using the &lt;strong&gt;JIL stack&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Jinja → Instructor → LiteLLM&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They live mainly in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;core/      # inference abstractions
schemas/   # structured IO contracts
prompts/   # Jinja templates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AIU = safe, structured thinking&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
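&lt;p&gt;To make the AIU contract concrete, here is a stdlib-only sketch (the repo itself uses Jinja, Instructor, and LiteLLM; here the template is &lt;code&gt;string.Template&lt;/code&gt; and the LLM call is stubbed, so all names are illustrative):&lt;/p&gt;

```python
from dataclasses import dataclass
from string import Template

@dataclass
class IntentInput:        # structured input contract
    user_message: str

@dataclass
class IntentOutput:       # structured output contract
    intent: str
    confidence: float

PROMPT = Template("Classify the intent of this message: $user_message")

def classify_intent(inp: IntentInput, llm=None) -> IntentOutput:
    """One focused, stateless inference task: message -> intent."""
    prompt = PROMPT.substitute(user_message=inp.user_message)
    if llm is None:
        # Stub: a real AIU would send `prompt` to an LLM and validate the
        # response against the output schema before returning it.
        return IntentOutput(intent="daily_log", confidence=0.0)
    return llm(prompt)
```

The essential property is that the AIU owns no workflow state: it maps one typed input to one typed output and nothing else.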




&lt;h3&gt;
  
  
  🔄 ASA — Atomic Stateful Agent
&lt;/h3&gt;

&lt;p&gt;(the workflow/state layer)&lt;/p&gt;

&lt;p&gt;ASA is about &lt;strong&gt;atomic state control&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workflow steps&lt;/li&gt;
&lt;li&gt;State transitions&lt;/li&gt;
&lt;li&gt;Draft data&lt;/li&gt;
&lt;li&gt;Commit/cancel logic&lt;/li&gt;
&lt;li&gt;Interaction with external systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This layer is implemented through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nodes/     # workflow nodes
state.py   # AgentState
graph.py   # LangGraph state machine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AIUs think&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;ASA decides what can happen&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This separation is the core design principle.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔑 Core ASA Patterns in This Repo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1️⃣ Sticky Routing (Intent Lock)
&lt;/h3&gt;

&lt;p&gt;Once a workflow starts, the agent stays in that intent.&lt;/p&gt;

&lt;p&gt;Example state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active_intent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily_log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;active_draft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn’t jump to other tasks just because the user changes wording.&lt;/p&gt;
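&lt;p&gt;A minimal sketch of the intent lock (simplified from the repo; function names here are hypothetical): the router only re-classifies when no workflow is active, otherwise it returns the locked intent:&lt;/p&gt;

```python
def route(state: dict, classify) -> str:
    """Sticky routing: an active workflow pins all follow-up messages."""
    if state.get("active_intent"):
        return state["active_intent"]        # intent lock: stay in the flow
    intent = classify(state["last_message"]) # only classify when unlocked
    state["active_intent"] = intent
    return intent

state = {"active_intent": None, "last_message": "log my workout"}
first = route(state, classify=lambda msg: "daily_log")

# The user changes wording mid-flow; the lock keeps the same workflow.
state["last_message"] = "actually, something else entirely"
locked = route(state, classify=lambda msg: "other_flow")
```

The lock is released only by an explicit commit or cancel, not by the classifier changing its mind.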




&lt;h3&gt;
  
  
  2️⃣ Draft → Commit Protocol
&lt;/h3&gt;

&lt;p&gt;The agent never writes to the real system immediately.&lt;/p&gt;

&lt;p&gt;Instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input
   ↓
Create draft state
   ↓
Refine over multiple turns
   ↓
User confirms
   ↓
Commit → persist
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During conversation, the user edits a &lt;strong&gt;draft object&lt;/strong&gt;, not production data.&lt;/p&gt;
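&lt;p&gt;The protocol above can be sketched in a few lines (a simplified stand-in for the repo’s state model; the in-memory dict plays the role of &lt;code&gt;mock_db.py&lt;/code&gt;):&lt;/p&gt;

```python
class DraftSession:
    """Edits mutate a draft; only an explicit commit persists anything."""

    def __init__(self, db: dict):
        self.db = db
        self.draft = {}

    def update(self, **fields) -> None:
        self.draft.update(fields)       # safe: production data untouched

    def commit(self) -> int:
        record_id = len(self.db) + 1
        self.db[record_id] = dict(self.draft)  # single atomic write
        self.draft = {}
        return record_id

    def cancel(self) -> None:
        self.draft = {}                 # rollback: nothing was persisted

db = {}
session = DraftSession(db)
session.update(title="Review PR for auth feature")
session.update(priority="high")         # refine over multiple turns
record_id = session.commit()            # user confirms -> persist
```

Every conversational turn before the commit is a cheap, reversible edit to the draft, never a write to the system of record.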




&lt;h3&gt;
  
  
  3️⃣ Recall &amp;amp; Hydrate
&lt;/h3&gt;

&lt;p&gt;When editing existing data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load record from DB&lt;/li&gt;
&lt;li&gt;Hydrate it into draft state&lt;/li&gt;
&lt;li&gt;Let the user edit&lt;/li&gt;
&lt;li&gt;Commit or cancel&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not chat memory.&lt;br&gt;
It’s structured state restoration.&lt;/p&gt;
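&lt;p&gt;Recall &amp;amp; hydrate reduces to two small operations (sketched here against a plain dict standing in for the database):&lt;/p&gt;

```python
def hydrate(db: dict, record_id: int) -> dict:
    """Load a record into draft state: the draft is a copy, not the live row."""
    return dict(db[record_id])

def save(db: dict, record_id: int, draft: dict) -> None:
    """Commit the edited draft back over the original record."""
    db[record_id] = dict(draft)

db = {1: {"title": "Review PR for auth feature", "priority": "high"}}

draft = hydrate(db, 1)
draft["title"] = "Review PR plan"   # user edits the draft, not the record
save(db, 1, draft)                  # explicit save applies the change
```

Because the draft is a structured copy of the entity, the agent can resume editing at any time; no amount of chat-history retrieval gives you that.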




&lt;h2&gt;
  
  
  📦 Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
├── core/        # AIU foundation (inference layer)
├── schemas/     # structured IO for AIUs
├── prompts/     # Jinja templates (AIU prompts)
├── nodes/       # ASA workflow nodes
├── state.py     # AgentState model
├── mock_db.py   # in-memory DB
└── graph.py     # LangGraph builder

main.py          # CLI entry point
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key idea:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prompts do not define the system&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State logic lives in code&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM reasoning is modular (AIUs)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Workflow control is deterministic (ASA)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ▶️ Running the Agent
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# add LLM key&lt;/span&gt;

python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get an interactive console, but the behavior is state-driven — not just free chat.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Example — Creating a Record
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Log task: Review PR for auth feature
Agent: Draft created

You: Change priority to high
Agent: Draft updated

You: Done
Agent: Task saved successfully!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The task was not saved immediately&lt;/li&gt;
&lt;li&gt;Only after confirmation does it become real&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔁 Editing an Existing Record
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Edit the PR review task
Agent: Record found. Hydrating draft…

You: Change title to "Review PR plan"
Agent: Draft updated

You: Save
Agent: Task updated!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is structured state editing, not memory guessing.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗 Why This Matters
&lt;/h2&gt;

&lt;p&gt;If your agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updates tickets&lt;/li&gt;
&lt;li&gt;Modifies contracts&lt;/li&gt;
&lt;li&gt;Handles finance data&lt;/li&gt;
&lt;li&gt;Talks to ERP/CRM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then &lt;strong&gt;prompt engineering is not enough&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;State machines&lt;/li&gt;
&lt;li&gt;Entity models&lt;/li&gt;
&lt;li&gt;Transaction-like control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what ASA provides.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔄 Architecture → Implementation
&lt;/h2&gt;

&lt;p&gt;My previous article explained the &lt;strong&gt;architecture and mindset&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This repo shows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Here’s how to actually build it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Together:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AIU → Atomic reasoning&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;ASA → Atomic state control&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Final Thought
&lt;/h2&gt;

&lt;p&gt;The future of enterprise agents is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Smarter chat”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Controlled AI operating on structured state&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start thinking less about longer prompts,&lt;br&gt;
and more about &lt;strong&gt;state, entities, and workflows&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Beyond the Chatbot: The “Atomic Stateful Agent” Architecture for Enterprise AI</title>
      <dc:creator>nghiach</dc:creator>
      <pubDate>Wed, 28 Jan 2026 09:00:07 +0000</pubDate>
      <link>https://forem.com/chnghia/beyond-the-chatbot-the-atomic-stateful-agent-architecture-for-enterprise-ai-4504</link>
      <guid>https://forem.com/chnghia/beyond-the-chatbot-the-atomic-stateful-agent-architecture-for-enterprise-ai-4504</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Most AI agents today are great at talking.&lt;br&gt;
Enterprises, however, don’t run on conversations — they run on &lt;strong&gt;transactions, entities, and state&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article introduces a new architectural mindset: &lt;strong&gt;Atomic Stateful Agents (ASA)&lt;/strong&gt; — a pattern designed not for demos, but for &lt;strong&gt;real enterprise workflows&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. The Pain Point: When Agent Demos Meet Enterprise Reality&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We’ve all seen it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LangGraph demos 🤯&lt;/li&gt;
&lt;li&gt;CrewAI task chains 🚀&lt;/li&gt;
&lt;li&gt;Multi-agent orchestration videos with glowing UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works beautifully when the task is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Research this topic”&lt;br&gt;
“Summarize this report”&lt;br&gt;
“Generate a plan”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then we bring it into the company.&lt;/p&gt;

&lt;p&gt;Suddenly the task becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log a task into the system&lt;/li&gt;
&lt;li&gt;Submit a PO approval&lt;/li&gt;
&lt;li&gt;Update project cost&lt;/li&gt;
&lt;li&gt;Modify contract metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And everything… breaks.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Root Cause: The Linear Chat Trap&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most current agent systems operate like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Message → Agent → Tool → Response → Done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model assumes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;State = conversation history&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But enterprise workflows don’t work like that.&lt;/p&gt;

&lt;p&gt;In real systems:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reality&lt;/th&gt;
&lt;th&gt;What Chat Agents Assume&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data is stored in &lt;strong&gt;entities&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Data is “remembered” in chat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Changes are &lt;strong&gt;transactions&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Changes happen by “saying things”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Work is &lt;strong&gt;paused and resumed&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Conversations are linear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edits are normal&lt;/td&gt;
&lt;td&gt;Past output is final&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We are trying to edit &lt;strong&gt;structured data&lt;/strong&gt; using &lt;strong&gt;unstructured dialogue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s the mismatch.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. The Trinity Model: Brain – Heart – Face&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To move beyond chatbot-style agents, we need a layered view.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The A.S.G Stack&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;What it Represents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Brain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;td&gt;LLM, planning, tool selection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Heart&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State Control&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Atomic Stateful Agent (ASA)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Face&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;Chat UI, Forms, APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most systems today focus almost entirely on the &lt;strong&gt;Brain&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But enterprises fail without the &lt;strong&gt;Heart&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  ❤️ &lt;strong&gt;ASA = The Heart&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Atomic Stateful Agent&lt;/strong&gt; is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing &lt;strong&gt;entities&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Controlling &lt;strong&gt;state transitions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enforcing &lt;strong&gt;workflow structure&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Acting as a &lt;strong&gt;transaction boundary&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not “smart” in the LLM sense.&lt;br&gt;
It’s &lt;strong&gt;deterministic, structured, and stubborn&lt;/strong&gt; — by design.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Brain can improvise.&lt;br&gt;
The Heart must not.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;3. The Core Patterns (The Real Power)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is where ASA becomes fundamentally different from “fire-and-forget” agents.&lt;/p&gt;


&lt;h3&gt;
  
  
  🔒 Pattern 1: &lt;strong&gt;Sticky Routing (The Lock)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;br&gt;
Normal routers behave statelessly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Message A → Agent X  
Message B → Agent Y  
Message C → Agent Z
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But enterprise tasks are &lt;strong&gt;sessions tied to an entity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example: Creating a Purchase Order.&lt;/p&gt;

&lt;p&gt;Once a user starts editing PO #2026-001:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All actions must stay &lt;strong&gt;inside this entity context&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Sticky Routing does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binds the session to a specific &lt;strong&gt;entity instance&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Prevents jumping to unrelated flows&lt;/li&gt;
&lt;li&gt;Ensures the agent continues within the same &lt;strong&gt;state machine&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not “conversation memory.”&lt;br&gt;
This is &lt;strong&gt;context locking&lt;/strong&gt;.&lt;/p&gt;


&lt;h3&gt;
  
  
  📝 Pattern 2: &lt;strong&gt;Draft–Commit Protocol (The Editor)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the most important shift.&lt;/p&gt;
&lt;h4&gt;
  
  
  ❌ Traditional Agent Model
&lt;/h4&gt;

&lt;p&gt;User says something → Agent calls tool → Data saved immediately.&lt;/p&gt;

&lt;p&gt;That’s like autosaving every typo directly to production.&lt;/p&gt;
&lt;h4&gt;
  
  
  ✅ ASA Model
&lt;/h4&gt;

&lt;p&gt;We introduce a &lt;strong&gt;transaction buffer&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input → Draft State → Review → Modify → Commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draft is &lt;strong&gt;mutable&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Commit is &lt;strong&gt;atomic&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Nothing touches the real system until commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors &lt;strong&gt;ACID transaction thinking&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database Concept&lt;/th&gt;
&lt;th&gt;ASA Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transaction&lt;/td&gt;
&lt;td&gt;Workflow session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Buffer&lt;/td&gt;
&lt;td&gt;Draft state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit&lt;/td&gt;
&lt;td&gt;Final confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rollback&lt;/td&gt;
&lt;td&gt;Discard draft&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We are no longer “talking to a system.”&lt;br&gt;
We are &lt;strong&gt;editing a transaction&lt;/strong&gt;.&lt;/p&gt;
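&lt;p&gt;As a rough sketch (class and field names are hypothetical, not ASA's API), the transaction buffer can be as simple as:&lt;/p&gt;

```python
# Hypothetical sketch of the Draft-Commit protocol: user edits mutate a
# draft buffer; only commit() writes to the real store, and rollback()
# discards everything, mirroring a database transaction.
class DraftSession:
    def __init__(self, store: dict):
        self.store = store   # the "real system"
        self.draft = {}      # mutable transaction buffer

    def edit(self, field: str, value) -> None:
        self.draft[field] = value   # safe to change as often as needed

    def commit(self) -> None:
        self.store.update(self.draft)  # one write, from the store's view
        self.draft = {}

    def rollback(self) -> None:
        self.draft = {}                # the store was never touched


po_store = {}
session = DraftSession(po_store)
session.edit("vendor", "Acme")
session.edit("quantity", 10)
assert po_store == {}  # still untouched before commit
session.commit()
assert po_store == {"vendor": "Acme", "quantity": 10}
```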


&lt;h3&gt;
  
  
  ⏳ Pattern 3: &lt;strong&gt;Recall &amp;amp; Hydrate (The Time Machine)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is where ASA breaks away from RAG-based thinking.&lt;/p&gt;
&lt;h4&gt;
  
  
  ❌ How Most Systems Treat the Past
&lt;/h4&gt;

&lt;p&gt;Past = text.&lt;/p&gt;

&lt;p&gt;You retrieve:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“User created a PO for laptops.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But this is just dead text.&lt;/p&gt;

&lt;p&gt;You cannot &lt;strong&gt;continue editing&lt;/strong&gt; that PO.&lt;/p&gt;
&lt;h4&gt;
  
  
  ✅ ASA View
&lt;/h4&gt;

&lt;p&gt;Past workflows are &lt;strong&gt;serialized entities&lt;/strong&gt;, not paragraphs.&lt;/p&gt;

&lt;p&gt;When recalled, they are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reconstructed → Hydrated → Returned to Draft Mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hydration means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restore entity structure&lt;/li&gt;
&lt;li&gt;Restore state machine position&lt;/li&gt;
&lt;li&gt;Restore editable fields&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t just “read history.”&lt;/p&gt;

&lt;p&gt;You &lt;strong&gt;re-enter a live workflow instance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is closer to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reopening a saved document,&lt;br&gt;
not retrieving a document summary.&lt;/p&gt;
&lt;/blockquote&gt;
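&lt;p&gt;A sketch of the dehydrate/hydrate round trip (the field names are illustrative):&lt;/p&gt;

```python
# Hypothetical sketch of recall-and-hydrate: a past workflow is stored as a
# serialized entity (not prose), so recalling it restores the structure,
# the state machine position, and the editable fields, back in draft mode.
import json


def dehydrate(entity_id: str, state: str, fields: dict) -> str:
    """Serialize a live workflow instance for storage."""
    return json.dumps({"entity_id": entity_id, "state": state, "fields": fields})


def hydrate(blob: str) -> dict:
    """Reconstruct the workflow as an editable draft, not a text summary."""
    record = json.loads(blob)
    record["mode"] = "draft"   # re-enters the workflow, ready for edits
    return record


blob = dehydrate("PO-2026-001", "awaiting_approval",
                 {"vendor": "Acme", "quantity": 10})
draft = hydrate(blob)
print(draft["state"])  # awaiting_approval
```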




&lt;h2&gt;
  
  
  &lt;strong&gt;4. Mindset Shift: AI Should Not Invent Workflows&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here’s the hard truth:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Free-form AI is bad at governance.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We must stop asking AI to “figure out the process.”&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Design deterministic workflows. Let AI flex inside them.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or simply:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don’t use AI to invent the process.&lt;br&gt;
Use AI to flexibly fill the process.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the marriage of:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Software Engineering&lt;/th&gt;
&lt;th&gt;Generative AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;Flexibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State Machines&lt;/td&gt;
&lt;td&gt;Language Understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic flows&lt;/td&gt;
&lt;td&gt;Adaptive input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transactions&lt;/td&gt;
&lt;td&gt;Draft reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ASA vs Traditional Agent Architectures&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Fire-and-Forget Agent&lt;/th&gt;
&lt;th&gt;Atomic Stateful Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Conversation history&lt;/td&gt;
&lt;td&gt;Explicit entity state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Handling&lt;/td&gt;
&lt;td&gt;Immediate tool writes&lt;/td&gt;
&lt;td&gt;Draft → Commit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workflow Model&lt;/td&gt;
&lt;td&gt;Linear chat&lt;/td&gt;
&lt;td&gt;State machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Past Sessions&lt;/td&gt;
&lt;td&gt;Text memory (RAG)&lt;/td&gt;
&lt;td&gt;Hydratable entities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Routing&lt;/td&gt;
&lt;td&gt;Message-based&lt;/td&gt;
&lt;td&gt;Context-locked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Determinism&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise Safety&lt;/td&gt;
&lt;td&gt;Fragile&lt;/td&gt;
&lt;td&gt;Structured&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Matters Now&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As AI moves from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“cool assistant” → “system operator”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We need architectures that treat:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;State as first-class&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Entities as durable objects&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI as a collaborator inside constraints&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ASA is not about making agents smarter.&lt;/p&gt;

&lt;p&gt;It’s about making them &lt;strong&gt;safe to plug into real systems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;P.S.: For a working implementation of this architecture, see &lt;a href="https://dev.to/chnghia/atomic-stateful-agent-from-architecture-idea-to-working-code-1ljh"&gt;Atomic Stateful Agent: From Architecture Idea to Working Code&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>agents</category>
    </item>
    <item>
      <title>Decoupling the AI Stack: How to Architect a Production-Grade Local LLM System</title>
      <dc:creator>nghiach</dc:creator>
      <pubDate>Thu, 22 Jan 2026 03:44:17 +0000</pubDate>
      <link>https://forem.com/chnghia/decoupling-the-ai-stack-how-to-architect-a-production-grade-local-llm-system-1a0c</link>
      <guid>https://forem.com/chnghia/decoupling-the-ai-stack-how-to-architect-a-production-grade-local-llm-system-1a0c</guid>
      <description>&lt;p&gt;&lt;em&gt;From "Localhost" to "On-Premise": An open-source blueprint for building a privacy-first, scalable AI infrastructure with vLLM and LiteLLM.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We are currently living in the "Golden Age" of Local AI. Tools like Ollama and LM Studio have democratized access to Large Language Models (LLMs), allowing any developer to spin up a 7B parameter model on their laptop in minutes.&lt;/p&gt;

&lt;p&gt;However, a significant gap remains in the ecosystem. While these tools are fantastic for &lt;strong&gt;single-user experimentation&lt;/strong&gt;, they often encounter bottlenecks when promoted to a &lt;strong&gt;shared, enterprise environment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you try to move from a "Hobbyist" setup to a "Production" on-premise infrastructure for your team, you face a new set of challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency:&lt;/strong&gt; How do you serve multiple concurrent users without queuing requests indefinitely?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decoupling:&lt;/strong&gt; How do you swap models (e.g., Llama 3 to Qwen 2.5) without breaking client applications?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance:&lt;/strong&gt; How do you manage API keys, log usage, and enforce budget limits?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article explores an architectural approach to solving these problems by &lt;strong&gt;decoupling the AI stack&lt;/strong&gt;. I will also introduce &lt;strong&gt;SOLV Stack&lt;/strong&gt;, an open-source reference implementation I built to demonstrate this architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architectural Shift: Decoupling Components
&lt;/h2&gt;

&lt;p&gt;In traditional web development, we wouldn't connect our frontend directly to our database. We use API Gateways and Backend services. We need to apply the same rigor to AI Infrastructure.&lt;/p&gt;

&lt;p&gt;A production-grade Local AI system should be composed of three distinct, loosely coupled layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Presentation Layer (UI):&lt;/strong&gt; Where users interact (Chat interface).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Governance Layer (Gateway):&lt;/strong&gt; Where routing, logging, and auth happen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Inference Layer (Compute):&lt;/strong&gt; Where the raw model processing occurs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By separating these concerns, we avoid vendor lock-in and ensure scalability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reference Architecture (SOLV)
&lt;/h2&gt;

&lt;p&gt;To implement this philosophy practically, I created a Dockerized boilerplate called &lt;strong&gt;SOLV Stack&lt;/strong&gt;. It stands for the four core components selected for their performance and enterprise readiness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S&lt;/strong&gt;earXNG (Privacy-focused Search)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;O&lt;/strong&gt;penWebUI (The Interface)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L&lt;/strong&gt;iteLLM (The Gateway)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;V&lt;/strong&gt;LLM (The Inference Engine)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is how data flows through the system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07t9i29x9kko3hod1ifq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07t9i29x9kko3hod1ifq.png" alt=" " width="800" height="908"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Inference Layer: Why vLLM?
&lt;/h3&gt;

&lt;p&gt;For local development, tools like Ollama (based on llama.cpp) are excellent. However, for a shared infrastructure, throughput is king.&lt;/p&gt;

&lt;p&gt;I chose &lt;strong&gt;vLLM&lt;/strong&gt; for this stack because of its &lt;strong&gt;PagedAttention&lt;/strong&gt; technology. In a multi-user scenario, vLLM manages GPU memory much more efficiently than standard loaders, allowing higher throughput through continuous batching. It is designed to be a server first, maximizing the utilization of your expensive GPUs (like the RTX 5090).&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Gateway Layer: The Power of LiteLLM
&lt;/h3&gt;

&lt;p&gt;This is perhaps the most critical component for an enterprise architecture. &lt;strong&gt;LiteLLM&lt;/strong&gt; acts as a universal proxy.&lt;/p&gt;

&lt;p&gt;It normalizes all inputs to the OpenAI standard format. This means your client applications (whether it's OpenWebUI, a custom React app, or an IDE plugin like Continue) only need to know how to speak "OpenAI." They don't need to know if the backend is running vLLM, Azure, or Anthropic.&lt;/p&gt;

&lt;p&gt;This enables a &lt;strong&gt;Hybrid Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Routine tasks:&lt;/strong&gt; Route to local vLLM (Zero cost, 100% privacy).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning:&lt;/strong&gt; Route to GPT-4 (Pay per token).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This logic is handled strictly at the config level, not in your application code.&lt;/p&gt;
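&lt;p&gt;As a sketch, both routes can live side by side in the same &lt;code&gt;litellm_config.yaml&lt;/code&gt; (model names and the environment-variable key reference below are placeholders to adapt to your deployment):&lt;/p&gt;

```yaml
model_list:
  # Routine tasks stay local: zero cost, full privacy
  - model_name: local-default
    litellm_params:
      model: openai/qwen2.5-coder
      api_base: http://vllm-backend:8000/v1
      api_key: EMPTY
  # Complex reasoning escalates to a hosted model, pay per token
  - model_name: cloud-reasoning
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```

&lt;p&gt;Clients pick a route by model name only; swapping what sits behind each name is a config change, not a code change.&lt;/p&gt;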

&lt;h3&gt;
  
  
  3. The Interface: OpenWebUI
&lt;/h3&gt;

&lt;p&gt;Currently, &lt;strong&gt;OpenWebUI&lt;/strong&gt; offers the most comprehensive feature set for teams, including RAG (Retrieval Augmented Generation) pipelines, user role management, and chat history. Because our stack is decoupled, if a better UI comes out next year (e.g., LibreChat), you can swap this layer without touching your backend models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: The SOLV-Stack Boilerplate
&lt;/h2&gt;

&lt;p&gt;I have packaged this entire architecture into a &lt;code&gt;docker-compose&lt;/code&gt; setup that supports &lt;strong&gt;NVIDIA GPUs&lt;/strong&gt; on both Linux and &lt;strong&gt;Windows (WSL2)&lt;/strong&gt;—a crucial feature for organizations where developers work on Windows machines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration Example
&lt;/h3&gt;

&lt;p&gt;The magic happens in the &lt;code&gt;litellm_config.yaml&lt;/code&gt;. Here, we map our internal vLLM instance to a user-facing model name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# The client sees "gpt-4-local"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4-local&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# But we route it to our local Qwen 2.5 instance&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/qwen2.5-coder&lt;/span&gt;
      &lt;span class="na"&gt;api_base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://vllm-backend:8000/v1&lt;/span&gt;
      &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EMPTY&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real-World Use Case: The Private Coding Assistant
&lt;/h3&gt;

&lt;p&gt;One of the most immediate benefits of this stack is enabling AI coding assistants for your team without sending code to the cloud.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy SOLV Stack on a local server with an RTX 5090.&lt;/li&gt;
&lt;li&gt;Developers install the &lt;strong&gt;Continue&lt;/strong&gt; or &lt;strong&gt;Cline&lt;/strong&gt; extension in VS Code.&lt;/li&gt;
&lt;li&gt;Point the extension to &lt;code&gt;http://your-server:8080/llm/v1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Result: A Copilot-like experience that runs entirely within your firewall.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building a local AI platform is not just about downloading model weights; it's about designing a system that is stable, observable, and adaptable.&lt;/p&gt;

&lt;p&gt;By moving from a monolithic "localhost" tool to a decoupled architecture using vLLM and LiteLLM, you gain control over your data and your infrastructure.&lt;/p&gt;

&lt;p&gt;If you want to try this architecture yourself, I've open-sourced the setup. It includes scripts for model downloading, Nginx configuration, and RAG pipeline setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/chnghia/solv-stack" rel="noopener noreferrer"&gt;github.com/chnghia/solv-stack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'd love to hear how you are architecting your local AI stack. Are you using a Gateway pattern? Let me know in the comments!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Accelerating AI Inference Workflows with the Atomic Inference Boilerplate</title>
      <dc:creator>nghiach</dc:creator>
      <pubDate>Mon, 19 Jan 2026 07:55:57 +0000</pubDate>
      <link>https://forem.com/chnghia/accelerating-ai-inference-workflows-with-the-atomic-inference-boilerplate-75b</link>
      <guid>https://forem.com/chnghia/accelerating-ai-inference-workflows-with-the-atomic-inference-boilerplate-75b</guid>
      <description>&lt;p&gt;&lt;em&gt;An opinionated foundation for reliable, composable LLM inference&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Large language model (LLM) applications grow complex fast. Prompt logic, schema validation, multi-provider setups, and execution patterns become scattered. What if you could standardize &lt;em&gt;how&lt;/em&gt; individual inference steps are written, validated, and executed — leaving orchestration, pipelines, and workflows to higher-level layers?&lt;/p&gt;

&lt;p&gt;That’s the problem the &lt;strong&gt;atomic-inference-boilerplate&lt;/strong&gt; aims to solve: provide a &lt;strong&gt;production-ready foundation&lt;/strong&gt; for building robust inference units that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic&lt;/strong&gt;: Each unit performs one focused step — rendering a prompt, calling an LLM, validating structured output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable&lt;/strong&gt;: Easily integrated into larger workflows such as LangGraph, Prefect, or custom orchestration layers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type-safe&lt;/strong&gt;: Outputs are never raw strings; results conform strictly to Pydantic schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider-agnostic&lt;/strong&gt;: Works with OpenAI, Anthropic, Ollama, LM Studio via LiteLLM routing — switch models without rewriting logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s unpack what this boilerplate brings to your AI toolkit.&lt;/p&gt;




&lt;h3&gt;
  
  
  🧱 &lt;strong&gt;Project Philosophy: Atomic Execution Units&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the heart is a simple but powerful design principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Complex reasoning should be broken down into atomic units — single, focused inference steps.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An &lt;em&gt;Atomic Unit&lt;/em&gt; encapsulates:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Prompt Template (Jinja2)&lt;/strong&gt; – separates text generation templates from business logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Schema (Pydantic)&lt;/strong&gt; – defines strong typing expectations on outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Runner (LiteLLM + Instructor)&lt;/strong&gt; – resolves the model provider, generates completions, and validates output&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This structure ensures your inference logic is &lt;strong&gt;modular, testable, and predictable&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  📂 &lt;strong&gt;Repository Structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s how the repo’s main components are organized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
├── core/           # Boilerplate core classes (AtomicUnit, renderer, client)
├── modules/        # Shared utilities (vector store helpers, validation utils)
├── prompts/        # Jinja2 prompt template files
└── schemas/        # Pydantic schema definitions
examples/           # Usage samples (basic, LangGraph, Prefect pipelines)
tests/              # Unit and integration tests
docs/ specs/        # Extended specifications and docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core, prompts, and schemas folders embody the atomic execution pattern. The &lt;code&gt;examples/&lt;/code&gt; folder contains concrete patterns you can use in real projects — from basic extraction tasks to multi-agent LangGraph configurations.&lt;/p&gt;




&lt;h3&gt;
  
  
  ⚙️ &lt;strong&gt;Getting Started (Quickstart)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Clone the repo and install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &amp;lt;repo-url&amp;gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;atomic-inference-boilerplate
conda activate atomic      &lt;span class="c"&gt;# or your Python env&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env       &lt;span class="c"&gt;# configure API keys&lt;/span&gt;
python examples/basic.py   &lt;span class="c"&gt;# run a basic example&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bootstraps the boilerplate and executes a simple inference unit from the &lt;code&gt;examples/&lt;/code&gt; directory.&lt;/p&gt;




&lt;h3&gt;
  
  
  🧪 &lt;strong&gt;Example: Define &amp;amp; Run an Inference Unit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each atomic unit is defined with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;template&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;an &lt;strong&gt;output schema&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;optional &lt;strong&gt;model choice&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;src.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AtomicUnit&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExtractedEntity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AtomicUnit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;template_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extraction.j2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ExtractedEntity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Apple Inc. is a technology company.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ExtractedEntity(name='Apple Inc.', entity_type='company')
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the unit receives a text prompt, formats the Jinja2 template, executes the LLM call via LiteLLM, and validates the structured output against the &lt;code&gt;ExtractedEntity&lt;/code&gt; schema. No loose strings — everything is typed and predictable.&lt;/p&gt;




&lt;h3&gt;
  
  
  🤖 &lt;strong&gt;Scaling to Real Workflows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Rather than replacing a workflow or orchestration framework, this boilerplate &lt;strong&gt;plugs into them&lt;/strong&gt;. For instance:&lt;/p&gt;

&lt;h4&gt;
  
  
  📌 LangGraph Integration
&lt;/h4&gt;

&lt;p&gt;Examples like &lt;code&gt;langgraph_single_agent.py&lt;/code&gt; and &lt;code&gt;langgraph_multi_agent.py&lt;/code&gt; demonstrate how atomic units become the &lt;em&gt;execution layer&lt;/em&gt; behind orchestration decisions made by LangGraph. Higher layers decide &lt;em&gt;what&lt;/em&gt; to do next, while atomic units decide &lt;em&gt;how&lt;/em&gt; to perform each inference step.&lt;/p&gt;

&lt;h4&gt;
  
  
  📌 Prefect Pipelines
&lt;/h4&gt;

&lt;p&gt;In extract-transform-load style pipelines (e.g., document processing), atomic units can extract metadata, detect structure, and chunk content — each step isolated, typed, and testable.&lt;/p&gt;

&lt;p&gt;This separation of concerns improves maintainability and accelerates development. Instead of ad-hoc prompts scattered across your codebase, you get a clear, reusable pattern for every LLM interaction.&lt;/p&gt;
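&lt;p&gt;To make the composition idea concrete, here is a framework-free sketch (the functions below merely stand in for atomic units; they are not the repo's actual API):&lt;/p&gt;

```python
# Framework-free sketch of composing atomic steps in a pipeline: each step
# takes typed input and returns typed output, so every step can be tested
# and swapped independently of the orchestrator around it.
from dataclasses import dataclass


@dataclass
class ChunkResult:
    chunks: list


def chunk_step(text: str, size: int = 20) -> ChunkResult:
    """One atomic step: split a document into fixed-size chunks."""
    pieces = [text[i:i + size] for i in range(0, len(text), size)]
    return ChunkResult(chunks=pieces)


def count_step(result: ChunkResult) -> int:
    """The next step consumes the previous step's typed output."""
    return len(result.chunks)


doc = "x" * 45
assert count_step(chunk_step(doc)) == 3  # chunks of 20 + 20 + 5
```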




&lt;h3&gt;
  
  
  🧠 &lt;strong&gt;Why Atomic Inference Matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In modern LLM applications, teams rapidly face challenges like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt logic tangled with business logic&lt;/li&gt;
&lt;li&gt;Dirty text outputs requiring fragile parsing&lt;/li&gt;
&lt;li&gt;Changing LLM providers or models&lt;/li&gt;
&lt;li&gt;Hard-to-test inference steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The atomic-inference-boilerplate tackles these by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enforcing &lt;em&gt;template + schema separation&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;building in &lt;em&gt;type safety by design&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;enabling &lt;em&gt;provider abstraction&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;fostering &lt;em&gt;modularity and reuse&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach mirrors best practices seen in software architecture (like atomic design in UI or modular microservices), but applied to the &lt;em&gt;inference layer&lt;/em&gt; of AI systems.&lt;/p&gt;




&lt;h3&gt;
  
  
  🏁 &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you’re building AI applications with anything beyond throwaway prototypes — where inference must be reliable, validated, maintainable, and scalable — then structuring your inference logic matters.&lt;/p&gt;

&lt;p&gt;This boilerplate is a strong candidate for the core execution layer of your LLM pipelines. Whether you embed it inside workflow frameworks like Prefect, orchestrators like LangGraph, or custom pipelines, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predictable and testable inference steps&lt;/li&gt;
&lt;li&gt;clear separation between prompting and logic&lt;/li&gt;
&lt;li&gt;extensibility to multiple providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Give it a try and share your patterns on &lt;em&gt;dev.to&lt;/em&gt;! Let’s build better AI workflows.&lt;/p&gt;

&lt;p&gt;My repo:&lt;br&gt;
&lt;a href="https://github.com/chnghia/atomic-inference-boilerplate" rel="noopener noreferrer"&gt;https://github.com/chnghia/atomic-inference-boilerplate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>I Built an Agentic AI Boilerplate (Agent-First, Conversation-First)</title>
      <dc:creator>nghiach</dc:creator>
      <pubDate>Sun, 28 Dec 2025 04:16:10 +0000</pubDate>
      <link>https://forem.com/chnghia/i-built-an-agentic-ai-boilerplate-agent-first-conversation-first-4ngf</link>
      <guid>https://forem.com/chnghia/i-built-an-agentic-ai-boilerplate-agent-first-conversation-first-4ngf</guid>
      <description>&lt;p&gt;Most current AI applications treat AI prompts as a simple plugin. You have a form, a prompt, and a linear workflow to generate text or assets.&lt;/p&gt;

&lt;p&gt;However, I believe this is not enough. The future of software is not just about generating content. It is about Agents that can think, plan, and coordinate tasks. I built this boilerplate to explore that future.&lt;/p&gt;

&lt;h3&gt;
  
  
  What do I mean by “Agentic”?
&lt;/h3&gt;

&lt;p&gt;There is a lot of buzz around the word "Agent." Let me clarify my definition.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It is not just a chatbot: A chatbot just replies to text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is not just a workflow: A workflow is a static path (A to B to C).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To me, a true Agent must be "alive." It needs state to know its current status. It needs memory to remember context. It must make decisions on its own. Finally, it must have the ability to call other agents to help solve complex problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core philosophy of this boilerplate
&lt;/h3&gt;

&lt;p&gt;I built this system based on three main principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Agent-first, not feature-first&lt;br&gt;
Usually, we build features (like an "Export to PDF" button). In this boilerplate, we build an Agent that knows how to export a PDF. The agent is the core capability, not the UI buttons.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conversation-first, no dashboards&lt;br&gt;
Complex internal tools often have messy dashboards. I believe the best interface is a conversation. You talk to the system, and the system acts. The UI should focus on the chat, not on static tables and charts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explicit orchestration (state machine over magic)&lt;br&gt;
I do not like "magic" loops that run forever without control. I prefer explicit orchestration. I use state machines to define exactly what the agent can and cannot do. This makes the system predictable and easy to debug.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
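&lt;p&gt;A minimal sketch of the state-machine idea (the state and event names are illustrative, not the boilerplate's actual graph):&lt;/p&gt;

```python
# Explicit orchestration: allowed transitions are declared up front, so the
# agent can never wander into an undefined state or loop without control.
TRANSITIONS = {
    "idle":      {"start":   "planning"},
    "planning":  {"plan_ok": "executing", "cancel": "idle"},
    "executing": {"done":    "idle",      "error":  "planning"},
}


def step(state: str, event: str) -> str:
    """Advance the machine, rejecting any transition not declared above."""
    allowed = TRANSITIONS.get(state, {})
    if event not in allowed:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
    return allowed[event]


state = step("idle", "start")    # planning
state = step(state, "plan_ok")   # executing
state = step(state, "done")      # back to idle
```

&lt;p&gt;Debugging then becomes reading a transition log instead of reconstructing an agent's "reasoning."&lt;/p&gt;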

&lt;h3&gt;
  
  
  High-level architecture
&lt;/h3&gt;

&lt;p&gt;I wanted a stack that is modern, fast, and scalable. Here is the architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;FastAPI: This serves as the API and the runtime environment for the backend.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LangGraph: This is the brain. I use it to orchestrate the agent's logic and manage the state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SSE (Server-Sent Events): Instead of simple request/response, the backend pushes events to the frontend in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Frontend as a “Viewer”: It acts as the communication interface. It focuses on interacting with the user through messages, prompt cards (Generative UI), tool calls, and errors.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
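&lt;p&gt;A sketch of the SSE framing only (event names are illustrative, and in the boilerplate a generator like this would be wrapped in a streaming HTTP response with the &lt;code&gt;text/event-stream&lt;/code&gt; content type):&lt;/p&gt;

```python
# Sketch of the event frames the backend pushes over SSE: each frame is an
# event name plus a JSON payload, terminated by a blank line.
import json


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Event frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def agent_stream():
    # The agent emits progress first, then the final message, as
    # discrete events the frontend "viewer" can render incrementally.
    yield sse_event("tool_call", {"name": "search", "status": "running"})
    yield sse_event("message", {"text": "Here is what I found."})


frames = list(agent_stream())
print(frames[0].startswith("event: tool_call"))  # True
```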

&lt;h3&gt;
  
  
  What this repo gives you (and what it doesn’t)
&lt;/h3&gt;

&lt;p&gt;What it gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean skeleton: A solid structure for an agentic system.&lt;/li&gt;
&lt;li&gt;Orchestrator pattern: A clear way to manage how agents talk to each other.&lt;/li&gt;
&lt;li&gt;Structured design: A specific place to add your sub-agents and tools.&lt;/li&gt;
&lt;li&gt;Event-driven execution: A full model for handling real-time events.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it doesn’t:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No UI framework: It provides a basic viewer, not a UI component library.&lt;/li&gt;
&lt;li&gt;No prompt magic: You still need to write good prompts.&lt;/li&gt;
&lt;li&gt;No SaaS features: It does not include billing, user management, or subscription logic.&lt;/li&gt;
&lt;li&gt;Not a chatbot starter: If you just want a "Hello World" bot, this is overkill.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Who this boilerplate is for
&lt;/h3&gt;

&lt;p&gt;This is for you if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building a Personal Agent, an internal AI tool, or Enterprise AI.&lt;/li&gt;
&lt;li&gt;You want full control over your agent's logic and flow.&lt;/li&gt;
&lt;li&gt;You hate "prompt spaghetti" (messy code mixed with prompts).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is NOT for you if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are a low-code user.&lt;/li&gt;
&lt;li&gt;You just want a standard chatbot running in 5 minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How I’m using it (or planning to)
&lt;/h3&gt;

&lt;p&gt;I am using this boilerplate as the foundation for my Personal Agentic Hub. My goals are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Journaling / Bookmarking Agent: An agent that organizes my notes and links automatically.&lt;/li&gt;
&lt;li&gt;Long-running Agents: Agents that can keep memory for days or weeks to help me track long-term projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Repo &amp;amp; next steps
&lt;/h3&gt;

&lt;p&gt;The code is open-source and available here: 👉 &lt;a href="https://github.com/chnghia/agentic-boilerplate" rel="noopener noreferrer"&gt;https://github.com/chnghia/agentic-boilerplate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My Roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding support for Sub-agents.&lt;/li&gt;
&lt;li&gt;Implementing long-term Memory.&lt;/li&gt;
&lt;li&gt;Optimizing for Local-first models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would love to hear your thoughts. Feel free to leave feedback, start a discussion, or open a Pull Request!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentic</category>
      <category>langgraph</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
