<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ben</title>
    <description>The latest articles on Forem by Ben (@benturtle).</description>
    <link>https://forem.com/benturtle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3870610%2Fe9de12b0-d258-4335-b601-e48dc66f25fb.png</url>
      <title>Forem: Ben</title>
      <link>https://forem.com/benturtle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/benturtle"/>
    <language>en</language>
    <item>
      <title>Building a Multi-Agent Research System with LangGraph: How I Taught Three AI Agents to Collaborate</title>
      <dc:creator>Ben</dc:creator>
      <pubDate>Thu, 09 Apr 2026 22:07:42 +0000</pubDate>
      <link>https://forem.com/benturtle/building-a-multi-agent-research-system-with-langgraph-how-i-taught-three-ai-agents-to-collaborate-2ng6</link>
      <guid>https://forem.com/benturtle/building-a-multi-agent-research-system-with-langgraph-how-i-taught-three-ai-agents-to-collaborate-2ng6</guid>
      <description>&lt;p&gt;Last year I was building AI features for a high-traffic editorial platform when I ran into a problem that kept showing up: users needed answers that lived in &lt;em&gt;two places at once&lt;/em&gt;. Half the context was buried in internal documents — contracts, policies, style guides — and the other half was out on the open web, changing by the hour. No single retrieval strategy could handle both.&lt;/p&gt;

&lt;p&gt;So I built a system where multiple AI agents collaborate to figure out where to look, go find the information, and synthesize it into a single coherent answer. The result is &lt;a href="https://github.com/BenoitGaudieri/multi-agent-researcher" rel="noopener noreferrer"&gt;multi-agent-researcher&lt;/a&gt;, an open-source CLI tool powered by LangGraph, FAISS, and Ollama.&lt;/p&gt;

&lt;p&gt;This post walks through the architecture decisions, the trade-offs I encountered, and the patterns I'd reuse in any multi-agent system.&lt;/p&gt;

&lt;h2&gt;The Problem: One Question, Multiple Knowledge Sources&lt;/h2&gt;

&lt;p&gt;Consider these two questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"What does article 3 of the contract say about termination?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What are the latest regulations on remote work in Italy?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first one lives in a local document. The second one requires a web search. Simple enough — but what about this one?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"What does our company policy say about remote work, and how does it compare to current labor laws?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That question needs &lt;em&gt;both&lt;/em&gt;. And the system should figure that out on its own, without the user specifying which source to query.&lt;/p&gt;

&lt;p&gt;This is the core design challenge: &lt;strong&gt;intelligent routing&lt;/strong&gt;. The system needs to classify the intent behind a question and dispatch it to the right agents — before any retrieval happens.&lt;/p&gt;

&lt;h2&gt;Architecture: A Graph, Not a Pipeline&lt;/h2&gt;

&lt;p&gt;My first instinct was a simple if/else chain: check for keywords, route accordingly. That broke down fast. Questions are ambiguous. "What's the policy on X?" could mean an internal document &lt;em&gt;or&lt;/em&gt; a government regulation, depending on context.&lt;/p&gt;

&lt;p&gt;Instead, I modeled the system as a directed graph using &lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, where each node is an autonomous agent with a single responsibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user question
      │
      ▼
 ┌────────────┐
 │ Orchestrate│  ← LLM classifies the question
 └─────┬──────┘
       │
       │ conditional fan-out
       ├─────────────────────────┐
       ▼                         ▼
 ┌───────────┐           ┌──────────────┐
 │ RAG Agent │           │  Web Agent   │
 │  (FAISS)  │           │(Tavily/DDG)  │
 └─────┬─────┘           └──────┬───────┘
       │                         │
       └────────────┬────────────┘
                    ▼
             ┌────────────┐
             │ Synthesize │  ← combines all context
             └────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four nodes. One conditional edge. That's the entire system. LangGraph handles the rest — parallel execution, state synchronization, fan-in after fan-out.&lt;/p&gt;

&lt;p&gt;Why a graph instead of a chain? Because chains are sequential by definition. When you need both RAG and web search, a chain forces you to pick an order. A graph lets both agents run simultaneously and converge when they're done.&lt;/p&gt;

&lt;h2&gt;The Orchestrator: Teaching an LLM to Route&lt;/h2&gt;

&lt;p&gt;The orchestrator is the brain of the system. It receives the user's question and classifies it into one of four routing strategies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Route&lt;/th&gt;
&lt;th&gt;When&lt;/th&gt;
&lt;th&gt;Agents activated&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Question targets private/indexed documents&lt;/td&gt;
&lt;td&gt;RAG only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WEB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Question needs fresh or general information&lt;/td&gt;
&lt;td&gt;Web only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BOTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Question spans local and external knowledge&lt;/td&gt;
&lt;td&gt;RAG + Web in parallel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NONE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Trivial or conversational — no retrieval needed&lt;/td&gt;
&lt;td&gt;Straight to synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The classification prompt is deliberately minimal. I ask the LLM to return exactly one word. No JSON parsing, no structured output format, no schema validation — just a single token that maps directly to a routing decision. This keeps latency low and reduces failure modes.&lt;/p&gt;

&lt;p&gt;The critical insight here is that the orchestrator doesn't need to be &lt;em&gt;right&lt;/em&gt; every time — it needs to be &lt;em&gt;right enough&lt;/em&gt;. If it routes a RAG question to BOTH, the web results just get ignored during synthesis. If it routes a web question to RAG, the empty results trigger a graceful fallback. The system is designed to be robust to misclassification.&lt;/p&gt;
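&lt;p&gt;Here's a minimal sketch of what such a one-word classifier node could look like. The prompt wording, the &lt;code&gt;llm(prompt)&lt;/code&gt; callable, and the BOTH fallback default are my illustration, not the project's exact code:&lt;/p&gt;

```python
# Hypothetical sketch of a one-word routing classifier. The prompt text and
# the llm(prompt) callable are stand-ins for whatever client you use.
ROUTER_PROMPT = (
    "Classify the question into exactly one word: RAG, WEB, BOTH, or NONE.\n"
    "Question: {question}\n"
    "Answer:"
)

VALID_ROUTES = {"RAG", "WEB", "BOTH", "NONE"}

def orchestrate(state, llm):
    """Classify the question and set the routing flags on the shared state."""
    raw = llm(ROUTER_PROMPT.format(question=state["question"])).strip().upper()
    # Unparseable output falls back to BOTH: searching everywhere is the
    # safe misclassification, per the "right enough" principle above.
    route = raw if raw in VALID_ROUTES else "BOTH"
    return {
        "needs_rag": route in {"RAG", "BOTH"},
        "needs_web": route in {"WEB", "BOTH"},
    }
```

&lt;p&gt;Note that the fallback direction matters: defaulting to BOTH costs an extra retrieval, while defaulting to NONE would silently skip it.&lt;/p&gt;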

&lt;h2&gt;Shared State: The Glue Between Agents&lt;/h2&gt;

&lt;p&gt;LangGraph uses a &lt;code&gt;TypedDict&lt;/code&gt; as shared state that flows through the graph. Here's what mine looks like:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;needs_rag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;needs_web&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;rag_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;web_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;output_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Annotated[list[str], add]&lt;/code&gt; pattern is doing heavy lifting here. That &lt;code&gt;add&lt;/code&gt; reducer tells LangGraph: &lt;em&gt;when multiple nodes write to this field concurrently, concatenate the lists instead of overwriting&lt;/em&gt;. Without it, parallel fan-out would be a race condition — whichever agent finishes last would clobber the other's results.&lt;/p&gt;

&lt;p&gt;This detail seems minor, but it's the foundation of the whole parallel execution model. Getting state management wrong is one of the most common sources of subtle bugs in multi-agent systems.&lt;/p&gt;
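&lt;p&gt;The reducer itself is nothing exotic: &lt;code&gt;add&lt;/code&gt; here is just &lt;code&gt;operator.add&lt;/code&gt;, and for lists that means plain concatenation:&lt;/p&gt;

```python
from operator import add

# The reducer attached via Annotated[list[str], add] is operator.add.
# When two writes hit the same field, LangGraph merges them like this:
from_rag = ["clause 3.1: termination requires 30 days notice"]
from_web = ["news: remote-work rules updated this week"]

merged = add(from_rag, from_web)  # same as from_rag + from_web, order preserved
```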

&lt;h2&gt;The RAG Agent: Local Knowledge via FAISS&lt;/h2&gt;

&lt;p&gt;The RAG agent handles document retrieval using a FAISS vector store with embeddings generated locally via Ollama (no external API calls for embeddings — everything stays private).&lt;/p&gt;

&lt;p&gt;The indexing pipeline supports PDF, Markdown, plain text, and DOCX files. Documents get chunked with configurable size and overlap (defaults: 1000 characters, 200 overlap), embedded, and stored in a named collection. The collection system lets you segment different knowledge domains — one for contracts, another for policies, a third for archived web results.&lt;/p&gt;

&lt;p&gt;That last part is worth highlighting: the system can re-index its own web search results into the RAG store. This means frequently asked questions about external topics gradually become locally cached knowledge. It's a simple feedback loop, but it dramatically reduces redundant web searches over time.&lt;/p&gt;
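&lt;p&gt;For intuition, the size-and-overlap mechanics of the indexing step can be sketched in plain Python. This illustrates the defaults above (1000 characters, 200 overlap); it is not the project's actual splitter, which also handles document structure and separators:&lt;/p&gt;

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into fixed-size chunks whose tails overlap, so a sentence
    cut at a chunk boundary still appears whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap  # each chunk starts `overlap` chars before the last ended
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the final chunk already covers the tail of the document
    return chunks
```

&lt;p&gt;The overlap is what makes retrieval forgiving: a clause split across two chunks is still retrievable intact from one of them, at the cost of ~20% storage redundancy.&lt;/p&gt;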

&lt;h2&gt;The Web Agent: Dual-Source with Graceful Fallback&lt;/h2&gt;

&lt;p&gt;Web search uses Tavily as the primary source and DuckDuckGo as a zero-config fallback. If Tavily's API key isn't set, the system silently switches to DuckDuckGo — no configuration required, no error messages, it just works.&lt;/p&gt;

&lt;p&gt;Raw web results are saved as timestamped Markdown files before being passed to the synthesizer. This serves two purposes: auditability (you can trace every answer back to its sources) and the re-indexing loop mentioned above.&lt;/p&gt;
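&lt;p&gt;The fallback shape is worth making concrete. In this sketch, &lt;code&gt;tavily_fn&lt;/code&gt; and &lt;code&gt;ddg_fn&lt;/code&gt; are hypothetical stand-ins for the two search clients, and the &lt;code&gt;TAVILY_API_KEY&lt;/code&gt; check follows the usual convention rather than the project's exact wiring:&lt;/p&gt;

```python
import os

def web_search(query, tavily_fn, ddg_fn):
    """Use Tavily when a key is configured; otherwise, or on any failure,
    fall back to DuckDuckGo. Both callables return a list of result strings."""
    if os.environ.get("TAVILY_API_KEY"):
        try:
            return tavily_fn(query)
        except Exception:
            pass  # degrade to the free tier rather than fail the research run
    return ddg_fn(query)
```

&lt;p&gt;The same shape generalizes to any primary/fallback pair: a missing credential and a runtime failure both land on the zero-config path.&lt;/p&gt;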

&lt;h2&gt;The Synthesizer: From Fragments to Answers&lt;/h2&gt;

&lt;p&gt;The synthesize node receives all accumulated context — RAG chunks, web results, or both — and generates a final answer. The prompt instructs the LLM to weave multiple sources into a coherent response rather than just listing them.&lt;/p&gt;

&lt;p&gt;This is where having a well-structured state pays off. The synthesizer doesn't need to know &lt;em&gt;how&lt;/em&gt; the context was retrieved. It just sees an array of text chunks and composes a response. This decoupling means adding a new agent (database, API, whatever) requires zero changes to the synthesis logic.&lt;/p&gt;
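&lt;p&gt;A minimal sketch of that decoupling, assuming a plain &lt;code&gt;llm(prompt)&lt;/code&gt; callable and my own prompt wording rather than the project's:&lt;/p&gt;

```python
def synthesize(state, llm):
    """Compose one answer from whatever context the agents accumulated.
    The node never inspects where a chunk came from."""
    context = state.get("rag_results", []) + state.get("web_results", [])
    if not context:
        # NONE route: no retrieval happened, answer conversationally
        prompt = state["question"]
    else:
        prompt = (
            "Answer the question using the context below. "
            "Weave the sources into one coherent response.\n\n"
            "Context:\n" + "\n---\n".join(context)
            + "\n\nQuestion: " + state["question"]
        )
    return {"final_answer": llm(prompt)}
```

&lt;p&gt;A new agent only has to append its findings to one of the context lists; the synthesizer picks them up for free.&lt;/p&gt;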

&lt;h2&gt;Conditional Fan-Out: The Routing Function&lt;/h2&gt;

&lt;p&gt;The routing logic is a single function that returns a list of destination nodes:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;destinations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_rag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;destinations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rag_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;destinations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;destinations&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returning a list triggers LangGraph's parallel fan-out. Both agents execute concurrently, write their results to the shared state (using the &lt;code&gt;add&lt;/code&gt; reducer), and LangGraph automatically waits for all branches to complete before executing &lt;code&gt;synthesize&lt;/code&gt;. No manual synchronization, no callbacks, no promises to resolve.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;or ["synthesize"]&lt;/code&gt; fallback handles the NONE route — when the orchestrator decides no retrieval is needed, the question goes straight to synthesis.&lt;/p&gt;

&lt;h2&gt;Running Locally: No API Keys Required&lt;/h2&gt;

&lt;p&gt;A deliberate design choice: the entire system runs locally using Ollama. No OpenAI key, no cloud dependencies, no data leaving your machine. This matters for enterprise use cases where documents contain sensitive information.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.2          &lt;span class="c"&gt;# reasoning&lt;/span&gt;
ollama pull nomic-embed-text  &lt;span class="c"&gt;# embeddings&lt;/span&gt;
python main.py index ./docs/
python main.py research &lt;span class="s2"&gt;"What does the contract say about termination?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tavily is optional (adds higher-quality web search), but the system works fine with just DuckDuckGo out of the box.&lt;/p&gt;

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Better evaluation.&lt;/strong&gt; I built this system iteratively, testing routing accuracy by feel. In production, I'd want a labeled dataset of questions with expected routes, and automated evaluation of retrieval quality (recall, precision, relevance scores).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming responses.&lt;/strong&gt; The current version waits for complete synthesis before outputting anything. For longer answers, streaming the synthesizer's output token-by-token would dramatically improve perceived performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent memory.&lt;/strong&gt; Right now each question is independent. Adding conversation memory would let the system handle follow-ups: "Tell me more about article 3" after a previous question about the contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic chunk sizing.&lt;/strong&gt; Fixed-size chunking works, but it's a blunt instrument. For structured documents (contracts with numbered clauses, policies with sections), semantic chunking based on document structure would improve retrieval precision.&lt;/p&gt;

&lt;h2&gt;Patterns Worth Stealing&lt;/h2&gt;

&lt;p&gt;If you're building multi-agent systems, here are the patterns from this project that I think generalize well:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM-as-router.&lt;/strong&gt; Use a lightweight LLM call to classify intent before doing any expensive retrieval. It's faster and more flexible than rule-based routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reducer-based state.&lt;/strong&gt; LangGraph's &lt;code&gt;Annotated&lt;/code&gt; reducers solve parallel state conflicts elegantly. Define how concurrent writes merge, and forget about synchronization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful degradation.&lt;/strong&gt; Every external dependency has a fallback. Tavily fails → DuckDuckGo. No indexed documents → skip RAG. The system should always return &lt;em&gt;something&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-feeding loops.&lt;/strong&gt; Save intermediate results (web searches, synthesized answers) in a format that can be re-indexed. Your system gets smarter over time without any additional training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-responsibility nodes.&lt;/strong&gt; Each agent does one thing. This makes the system trivially extensible — adding a database agent is three lines of code plus the query logic.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;The full source code is on &lt;a href="https://github.com/BenoitGaudieri/multi-agent-researcher" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Questions, feedback, or ideas for new agents? I'd love to hear about them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>python</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
