<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Zohar Babin</title>
    <description>The latest articles on Forem by Zohar Babin (@zoharbabin).</description>
    <link>https://forem.com/zoharbabin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936734%2F51a9e2c8-9c93-45ba-91ae-2449d592c478.png</url>
      <title>Forem: Zohar Babin</title>
      <link>https://forem.com/zoharbabin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/zoharbabin"/>
    <language>en</language>
    <item>
      <title>From Node.js to Go: Rebuilding an MCP Server for Production</title>
      <dc:creator>Zohar Babin</dc:creator>
      <pubDate>Tue, 19 May 2026 20:48:20 +0000</pubDate>
      <link>https://forem.com/zoharbabin/from-nodejs-to-go-rebuilding-an-mcp-server-for-production-oil</link>
      <guid>https://forem.com/zoharbabin/from-nodejs-to-go-rebuilding-an-mcp-server-for-production-oil</guid>
      <description>&lt;p&gt;This is the story of why I rebuilt &lt;a href="https://github.com/zoharbabin/google-researcher-mcp" rel="noopener noreferrer"&gt;google-researcher-mcp&lt;/a&gt; (Node.js/TypeScript) from scratch as &lt;a href="https://github.com/zoharbabin/web-researcher-mcp" rel="noopener noreferrer"&gt;web-researcher-mcp&lt;/a&gt; (Go), and the lessons learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Starting Point
&lt;/h2&gt;

&lt;p&gt;The original project — &lt;code&gt;google-researcher-mcp&lt;/code&gt; — was a TypeScript/Node.js MCP server distributed via npm. It had real traction: 36 GitHub stars, 6,500+ npm downloads, 860+ tests, and active users. But five critical issues kept surfacing that couldn't be solved within the existing architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Rewrite in Go (Not Refactor)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Orphan Processes (Issue #108)
&lt;/h3&gt;

&lt;p&gt;npx spawns deeply nested process trees. When the parent MCP client (Claude Desktop, Cursor) crashes or closes unexpectedly, the Node.js process doesn't receive a signal — it keeps running, consuming memory and holding file locks.&lt;/p&gt;

&lt;p&gt;My collaborators and I spent three versions (v6.2.0 through v6.4.0) building increasingly complex orphan detection: a Worker thread watchdog with CPU spin detection, three-layer parent-alive checks, and graceful degradation. It was all band-aids on a fundamental runtime limitation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go fix&lt;/strong&gt;: A single static binary. No runtime process tree. EOF on stdin = immediate exit. The entire problem category disappeared.&lt;/p&gt;
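
&lt;p&gt;The Go source is not reproduced in this post. The exit-on-EOF idea is language-neutral, so here is a minimal sketch of it in Python, assuming newline-delimited JSON messages on stdin. The names &lt;code&gt;serve&lt;/code&gt; and &lt;code&gt;echo&lt;/code&gt; are hypothetical and not taken from the project.&lt;/p&gt;

```python
import io
import json


def serve(stdin, stdout, handle):
    """Answer one JSON message per line until stdin reaches EOF.

    Returns the number of messages handled so the caller can exit right away.
    """
    handled = 0
    for line in stdin:  # iteration ends when the parent closes the pipe (EOF)
        line = line.strip()
        if not line:
            continue
        reply = handle(json.loads(line))
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()
        handled += 1
    return handled


def echo(message):
    # Hypothetical handler: a real server would dispatch on the method name.
    return {"id": message.get("id"), "result": "ok"}


if __name__ == "__main__":
    fake_stdin = io.StringIO('{"id": 1}\n{"id": 2}\n')
    fake_stdout = io.StringIO()
    print(serve(fake_stdin, fake_stdout, echo))
```

&lt;p&gt;When the client dies, the operating system closes its end of the pipe, the loop ends, and the process can exit without any watchdog.&lt;/p&gt;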

&lt;h3&gt;
  
  
  Google Discontinuing "Entire Web" Search (Issue #107)
&lt;/h3&gt;

&lt;p&gt;Google announced it would discontinue support for Programmable Search Engines configured to search the "entire web." The project was named &lt;code&gt;google-researcher-mcp&lt;/code&gt;, and its dependency on a single search provider was a foundational risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go fix&lt;/strong&gt;: Interface-driven &lt;code&gt;search.Provider&lt;/code&gt; with multiple implementations, plus a Router that provides multi-provider routing with automatic failover via per-provider circuit breakers.&lt;/p&gt;
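
&lt;p&gt;To make the routing idea concrete, here is a rough sketch of failover with per-provider circuit breakers. It is written in Python for brevity and is not the project's Go code. The class names, the threshold of three failures, and the 30-second cooldown are all assumptions for illustration.&lt;/p&gt;

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, retries after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a request through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


class Router:
    """Tries providers in order, skipping any whose breaker is open."""

    def __init__(self, providers, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.providers = providers  # list of (name, search_callable)
        self.breakers = {
            name: CircuitBreaker(threshold, cooldown, clock) for name, _ in providers
        }

    def search(self, query):
        errors = []
        for name, provider in self.providers:
            breaker = self.breakers[name]
            if not breaker.allow():
                errors.append(name + ": circuit open")
                continue
            try:
                results = provider(query)
            except Exception as exc:
                breaker.record(False)
                errors.append(name + ": " + str(exc))
                continue
            breaker.record(True)
            return name, results
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

&lt;p&gt;Each provider only has to be a callable that takes a query, so the router never sees provider-specific response formats.&lt;/p&gt;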

&lt;h3&gt;
  
  
  Alternative Search Engines (Issue #55)
&lt;/h3&gt;

&lt;p&gt;Users wanted Brave, Bing (go figure), and other providers. But the TypeScript codebase was too tightly coupled to Google's API response format — the shared directory (41 files) made every change risky and far-reaching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go fix&lt;/strong&gt;: A clean &lt;code&gt;Provider&lt;/code&gt; interface — each adapter normalizes provider-specific responses to common types (&lt;code&gt;SearchResult&lt;/code&gt;, &lt;code&gt;ImageResult&lt;/code&gt;, &lt;code&gt;NewsResult&lt;/code&gt;). Adding a new provider is one file implementing one interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis Caching (Issue #72)
&lt;/h3&gt;

&lt;p&gt;The in-memory cache was lost on every process restart — which happened frequently with npx-launched servers. The complex persistence manager offered four strategies (Periodic, WriteThrough, OnShutdown, Hybrid), but none reliably survived the volatile process lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go fix&lt;/strong&gt;: A &lt;code&gt;cache.Cache&lt;/code&gt; interface with a hybrid implementation: memory LRU + AES-encrypted disk + optional Redis. Simple, testable, and it never loses data because the disk layer persists across restarts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monolithic Architecture (Issue #40)
&lt;/h3&gt;

&lt;p&gt;The project had 100+ source files but a tightly coupled &lt;code&gt;shared/&lt;/code&gt; directory with 41 files. Adding a single tool required touching 4+ documentation sections, and the import graph made refactoring perilous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go fix&lt;/strong&gt;: One package per concern. Tool handlers are self-contained files. Adding a tool means writing one file and one line in the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed Architecturally
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Node.js (old)&lt;/th&gt;
&lt;th&gt;Go (new)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Distribution&lt;/td&gt;
&lt;td&gt;npm/npx (runtime required)&lt;/td&gt;
&lt;td&gt;Single static binary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;430MB idle (80MB after optimization)&lt;/td&gt;
&lt;td&gt;~25MB baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup&lt;/td&gt;
&lt;td&gt;2-4 seconds (lazy imports)&lt;/td&gt;
&lt;td&gt;&amp;lt;100ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Process lifecycle&lt;/td&gt;
&lt;td&gt;Worker thread watchdog&lt;/td&gt;
&lt;td&gt;EOF detection, no orphans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search providers&lt;/td&gt;
&lt;td&gt;Google only&lt;/td&gt;
&lt;td&gt;Multiple providers + fallback routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;Event loop + async/await&lt;/td&gt;
&lt;td&gt;Goroutines + semaphores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type safety&lt;/td&gt;
&lt;td&gt;TypeScript + Zod&lt;/td&gt;
&lt;td&gt;Go type system + struct tags&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;td&gt;860+ Jest tests&lt;/td&gt;
&lt;td&gt;Table-driven tests + race detector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scraping&lt;/td&gt;
&lt;td&gt;Playwright (heavy)&lt;/td&gt;
&lt;td&gt;4-tier pipeline (lightweight first)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Don't Fight Your Runtime
&lt;/h3&gt;

&lt;p&gt;Node.js process management is fundamentally fragile for long-lived servers launched via npx. The runtime doesn't support robust parent-death detection, and the nested process tree (npx → node → worker) makes signal propagation unreliable. We spent three versions building increasingly complex orphan detection. Go's single binary eliminated the entire category of problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: If you're spending significant engineering effort working around your runtime's limitations, that's a signal to evaluate whether the runtime fits the problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Side note: while looking for a better runtime, I evaluated both Go and Rust (isn't Rust awesome!?). Go won primarily because its lightweight goroutines excel at I/O-bound operations and the mcp-go SDK is superbly maintained.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2. Interface-Driven Design Enables Fearless Extension
&lt;/h3&gt;

&lt;p&gt;Adding Brave Search in the Go version was one file implementing one interface — about 200 lines including tests. In the Node.js version, the equivalent change would have touched 6+ files due to tightly coupled imports in the shared directory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: When you know extension is likely (new providers, new tools), invest in clean interfaces upfront. The interface is the specification; implementations are interchangeable.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Memory Matters for MCP Servers
&lt;/h3&gt;

&lt;p&gt;MCP servers run alongside AI assistants on developer machines. They're always-on background processes. A 430MB idle memory footprint was unacceptable — users would notice and uninstall. Go's ~25MB baseline lets the server stay resident without impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: For developer tools that run continuously, memory efficiency is a feature, not an optimization. Choose your runtime accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Caching Architecture Should Be Boring
&lt;/h3&gt;

&lt;p&gt;The old project had four persistence strategies with complex heuristics for when to flush. The new one has: memory LRU + optional encrypted disk + optional Redis. Each layer is simple and independently testable. No heuristics, no race conditions, no data loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: Boring infrastructure is reliable infrastructure. If your caching layer needs its own debugging session, it's too complex.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Documentation Should Be Drift-Resistant
&lt;/h3&gt;

&lt;p&gt;The old project required updating four separate documentation files per new tool. Inevitably, docs drifted from reality. The new project's test suite programmatically validates documentation claims — tool descriptions must mention alternatives, output schemas must match actual responses, and annotations must be consistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: If documentation can be wrong without a test failing, it will eventually be wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Kept
&lt;/h2&gt;

&lt;p&gt;The rewrite preserved the user-facing contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same tools&lt;/strong&gt; with identical semantics and parameter names&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same MCP protocol&lt;/strong&gt; compatibility (Claude Desktop, Cursor, VS Code, any MCP client)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same environment variables&lt;/strong&gt; (drop-in replacement for existing configs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Same search lenses&lt;/strong&gt; (curated domain lists, identical JSON format)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What improved (without breaking backwards compatibility):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth 2.1 authentication for multi-client deployments&lt;/li&gt;
&lt;li&gt;Multi-tenancy with per-tenant session isolation&lt;/li&gt;
&lt;li&gt;Per-provider circuit breakers with automatic fallback&lt;/li&gt;
&lt;li&gt;Prometheus metrics for observability&lt;/li&gt;
&lt;li&gt;Structured audit logging for compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Since launching the Go version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero orphan process reports (vs. recurring issue in Node.js version)&lt;/li&gt;
&lt;li&gt;Multiple search providers with automatic failover (vs. single provider)&lt;/li&gt;
&lt;li&gt;4-tier scraping pipeline that tries lightweight methods first (vs. Playwright-only)&lt;/li&gt;
&lt;li&gt;Sub-100ms cold startup (vs. 2-4 seconds)&lt;/li&gt;
&lt;li&gt;Production-ready: rate limiting, circuit breakers, session isolation, audit trail&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Should You Rewrite?
&lt;/h2&gt;

&lt;p&gt;Probably not. Most rewrites fail because they're motivated by developer preference ("I want to use a new language") rather than architectural necessity. Ours succeeded because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The problems were &lt;strong&gt;architectural&lt;/strong&gt;, not implementational — no amount of refactoring within Node.js would fix process orphaning&lt;/li&gt;
&lt;li&gt;The user-facing contract was &lt;strong&gt;well-defined&lt;/strong&gt; — MCP provides a clean protocol boundary&lt;/li&gt;
&lt;li&gt;The scope was &lt;strong&gt;bounded&lt;/strong&gt; — we knew exactly what the server needed to do&lt;/li&gt;
&lt;li&gt;We had &lt;strong&gt;comprehensive tests&lt;/strong&gt; on the old version to validate behavioral equivalence&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your problems are solvable within your current architecture, refactor. If they're fundamentally incompatible with your runtime or architecture, consider a rewrite — but only with clear success criteria and a well-defined boundary.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article covers the migration from &lt;a href="https://github.com/zoharbabin/google-researcher-mcp" rel="noopener noreferrer"&gt;google-researcher-mcp&lt;/a&gt; to &lt;a href="https://github.com/zoharbabin/web-researcher-mcp" rel="noopener noreferrer"&gt;web-researcher-mcp&lt;/a&gt;. The new project is open source under MIT and works with any MCP-compatible AI assistant.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>go</category>
      <category>lessonslearned</category>
    </item>
    <item>
      <title>Building a 13-Agent AI System for M&amp;A Due Diligence — Architecture Deep Dive</title>
      <dc:creator>Zohar Babin</dc:creator>
      <pubDate>Sun, 17 May 2026 19:05:46 +0000</pubDate>
      <link>https://forem.com/zoharbabin/building-a-13-agent-ai-system-for-ma-due-diligence-architecture-deep-dive-20ah</link>
      <guid>https://forem.com/zoharbabin/building-a-13-agent-ai-system-for-ma-due-diligence-architecture-deep-dive-20ah</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Nobody Was Solving
&lt;/h2&gt;

&lt;p&gt;As a corp dev lead, I spent weeks doing the same thing after every deal: assembling the cross-domain picture from siloed advisor reports.&lt;/p&gt;

&lt;p&gt;Legal would flag a termination clause. Finance would flag revenue concentration. Same entity. Nobody connected the dots.&lt;/p&gt;

&lt;p&gt;This happens because due diligence is split into parallel workstreams — legal, financial, commercial, tax, regulatory — each run by separate teams with separate deliverables. The cross-referencing happens in someone's head, over coffee, two days before the IC memo is due.&lt;/p&gt;

&lt;p&gt;The numbers back this up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;31% of M&amp;amp;A failures trace back to DD shortcomings&lt;/strong&gt; (HBR, McKinsey, KPMG research)&lt;/li&gt;
&lt;li&gt;DD timelines keep compressing — six weeks becomes three, same scope&lt;/li&gt;
&lt;li&gt;Corp dev teams screen 200-1,000+ companies/year but close 1-3%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built &lt;a href="https://github.com/zoharbabin/due-diligence-agents" rel="noopener noreferrer"&gt;Due Diligence Agents&lt;/a&gt; to fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;13 AI agents analyze every document in an M&amp;amp;A data room across 9 specialist domains — Legal, Finance, Commercial, ProductTech, Cybersecurity, HR, Tax, Regulatory, and ESG — then cross-reference findings automatically and trace each one to the exact page, section, and quote.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dd-agents
dd-agents auto-config &lt;span class="s2"&gt;"Buyer"&lt;/span&gt; &lt;span class="s2"&gt;"Target"&lt;/span&gt; &lt;span class="nt"&gt;--data-room&lt;/span&gt; ./your_data_room
dd-agents run deal-config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output: an interactive HTML report, a 14-sheet Excel workbook, and per-subject JSON findings. &lt;a href="https://zoharbabin.github.io/due-diligence-agents/" rel="noopener noreferrer"&gt;See a sample report from synthetic data.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;The system has four layers:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: 38-Step Async Pipeline
&lt;/h3&gt;

&lt;p&gt;The orchestrator (&lt;code&gt;engine.py&lt;/code&gt;) is a state machine with 38 async steps grouped into phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Setup&lt;/strong&gt; (steps 1-5): Load config, validate data room, resolve entities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt; (steps 6-13): Extract documents, build inventory, classify files, compute precedence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis&lt;/strong&gt; (steps 14-17): Build specialist prompts, route documents, spawn agents in parallel, check coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Domain&lt;/strong&gt; (steps 18-20): Symbolic trigger evaluation, targeted respawn, merge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt; (steps 21-26): Judge review, merge findings, validate, deduplicate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reporting&lt;/strong&gt; (steps 27-38): Generate HTML, Excel, JSON, knowledge base&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every step supports checkpoint/resume. If the pipeline crashes at step 23, it restarts from step 23 — not from scratch. Steps are typed, and the state object serializes cleanly to JSON.&lt;/p&gt;
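
&lt;p&gt;A minimal sketch of the checkpoint/resume idea, assuming the state is a plain JSON-serializable dict and each step is a function from state to state. The &lt;code&gt;run_pipeline&lt;/code&gt; helper and the &lt;code&gt;next_step&lt;/code&gt; key are hypothetical, and the real &lt;code&gt;engine.py&lt;/code&gt; may be structured differently.&lt;/p&gt;

```python
import json
import os
import tempfile


def run_pipeline(steps, state, checkpoint_path):
    """Run (name, func) steps in order, saving state after each completed step."""
    if os.path.exists(checkpoint_path):
        # Resume: the saved state replaces the one passed in.
        with open(checkpoint_path, "r", encoding="utf-8") as fh:
            state = json.load(fh)
    start = state.get("next_step", 0)
    for index in range(start, len(steps)):
        _name, func = steps[index]
        state = func(state)  # a failure here leaves the old checkpoint intact
        state["next_step"] = index + 1
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)  # swap in the finished file
    return state
```

&lt;p&gt;Writing to a temporary file and swapping it in means a crash mid-write cannot leave a half-written checkpoint behind.&lt;/p&gt;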

&lt;h3&gt;
  
  
  Layer 2: 13 Agents
&lt;/h3&gt;

&lt;p&gt;9 specialists + 4 meta-agents, each spawned via Anthropic's &lt;code&gt;claude-agent-sdk&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specialists&lt;/strong&gt;: Legal, Finance, Commercial, ProductTech, Cybersecurity, HR, Tax, Regulatory, ESG. Each gets domain-specific prompts, the relevant documents, and a set of tools (file read, search, finding write).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta-agents&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Judge&lt;/strong&gt;: Reviews specialist findings for quality, consistency, and missed coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executive Synthesis&lt;/strong&gt;: Produces the deal-level summary with go/no-go signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red Flag Scanner&lt;/strong&gt;: Pattern-matches across all findings for deal-killers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acquirer Intelligence&lt;/strong&gt;: Tailors findings to the buyer's strategic context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Specialists run in parallel (batched by resource constraints). Meta-agents run sequentially after all specialists complete.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Neurosymbolic Cross-Domain Analysis
&lt;/h3&gt;

&lt;p&gt;This is the part that solved my original problem.&lt;/p&gt;

&lt;p&gt;After specialists produce their findings (pass 1), a &lt;strong&gt;deterministic rule engine&lt;/strong&gt; scans them for cross-domain dependencies. No LLM calls — just Python pattern matching.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: Finance finds revenue recognition issue
# → Rule fires → Legal agent re-examines specific contracts
# for enforceability, clawback clauses, delivery milestones
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seven built-in trigger rules cover the most common M&amp;amp;A cross-domain dependencies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source → Target&lt;/th&gt;
&lt;th&gt;When It Fires&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Finance → Legal&lt;/td&gt;
&lt;td&gt;Revenue recognition finding needs contract enforceability check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal → Finance&lt;/td&gt;
&lt;td&gt;Change-of-control clause needs financial exposure quantification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal → Finance&lt;/td&gt;
&lt;td&gt;Termination-for-convenience needs revenue-at-risk calculation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal → ProductTech&lt;/td&gt;
&lt;td&gt;IP ownership dispute needs technical dependency assessment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ProductTech → Legal&lt;/td&gt;
&lt;td&gt;Data privacy finding needs DPA/GDPR compliance review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commercial → Finance&lt;/td&gt;
&lt;td&gt;SLA risk with &amp;gt;10% service credits needs financial quantification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance → Commercial&lt;/td&gt;
&lt;td&gt;Pricing discrepancy needs commercial rate card validation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When a rule fires, it creates a &lt;code&gt;CrossDomainTrigger&lt;/code&gt; with the specific contracts to re-examine and instructions for the target agent. The target agent runs a &lt;strong&gt;targeted pass-2 review&lt;/strong&gt; — only on the cited contracts, not the full data room. This keeps costs bounded.&lt;/p&gt;

&lt;p&gt;Triggers are budget-capped and priority-ordered. If none fire, the second pass adds no cost.&lt;/p&gt;
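
&lt;p&gt;Roughly, the rule evaluation could look like the sketch below. Only the &lt;code&gt;CrossDomainTrigger&lt;/code&gt; name comes from the project. Its fields, the keyword-based rule table, and the priority convention are assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossDomainTrigger:
    source: str
    target: str
    contracts: tuple
    instruction: str
    priority: int  # lower number = more urgent (assumed convention)


# Hypothetical rule table: (source domain, keyword, target domain, instruction, priority)
RULES = [
    ("finance", "revenue recognition", "legal", "Check contract enforceability", 1),
    ("legal", "change of control", "finance", "Quantify financial exposure", 1),
    ("legal", "termination for convenience", "finance", "Calculate revenue at risk", 2),
]


def evaluate_triggers(findings, max_triggers):
    """Match findings against RULES with plain string checks; no LLM involved."""
    triggers = []
    for finding in findings:
        text = finding["summary"].lower()
        for source, keyword, target, instruction, priority in RULES:
            if finding["domain"] == source and keyword in text:
                triggers.append(CrossDomainTrigger(
                    source, target, tuple(finding["contracts"]), instruction, priority))
    triggers.sort(key=lambda t: t.priority)  # stable: input order kept within a priority
    return triggers[:max_triggers]           # budget cap
```

&lt;p&gt;Each returned trigger names only the cited contracts, which is what keeps the pass-2 review narrow.&lt;/p&gt;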

&lt;p&gt;The design is inspired by the &lt;a href="https://arxiv.org/abs/2604.00555" rel="noopener noreferrer"&gt;FAOS Platform&lt;/a&gt; — asymmetric coupling where symbolic rules constrain the LLM's scope while the LLM provides judgment. Symbolic decides &lt;em&gt;when&lt;/em&gt; intelligence is needed; the LLM provides &lt;em&gt;what&lt;/em&gt; to do about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: 5 Blocking Quality Gates
&lt;/h3&gt;

&lt;p&gt;Every finding goes through validation before it reaches the report:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Coverage gate&lt;/strong&gt;: Did the agent analyze every assigned document?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema validation&lt;/strong&gt;: Does every finding have the required fields (severity, citations, category)?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citation verification&lt;/strong&gt;: Can we trace the finding back to a specific page and quote?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic dedup&lt;/strong&gt;: Are two agents saying the same thing about the same document? (rapidfuzz token_sort_ratio ≥ 80)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Numerical audit&lt;/strong&gt;: Do financial figures in findings match what's in the source documents?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Fail-closed. If validation fails, the pipeline stops — it doesn't silently produce bad output.&lt;/p&gt;
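
&lt;p&gt;A small sketch of what fail-closed means in code, covering two of the five gates. The function names and the finding layout are hypothetical. The required field names are the ones listed above.&lt;/p&gt;

```python
class GateFailure(Exception):
    """Raised when a blocking gate rejects the findings."""


def schema_gate(findings):
    required = {"severity", "citations", "category"}
    for finding in findings:
        missing = required - set(finding)
        if missing:
            return "missing fields: " + ", ".join(sorted(missing))
    return None


def citation_gate(findings):
    for finding in findings:
        if not finding.get("citations"):
            return "finding has no citations"
    return None


def run_gates(findings, gates):
    """Run every gate in order; the first failure stops the pipeline."""
    for gate in gates:
        problem = gate(findings)
        if problem is not None:
            raise GateFailure(gate.__name__ + ": " + problem)
    return findings
```

&lt;p&gt;Raising an exception instead of logging is the whole point: downstream reporting code never runs on findings that failed a gate.&lt;/p&gt;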

&lt;h2&gt;
  
  
  The Chat Mode (My Favorite Feature)
&lt;/h2&gt;

&lt;p&gt;After the pipeline runs, you can interrogate the results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dd-agents chat &lt;span class="nt"&gt;--report&lt;/span&gt; _dd/forensic-dd/runs/latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The chat agent has 14 MCP tools: citation verification against source PDFs, cross-contract search, entity resolution, and sandboxed document generation. Ask "build me a board summary of all P0 findings with revenue impact" and it writes a Python script, executes it in a sandbox, and hands you the &lt;code&gt;.xlsx&lt;/code&gt; file.&lt;/p&gt;

&lt;h2&gt;
  
  
  15 Things I Learned Building This
&lt;/h2&gt;

&lt;p&gt;These lessons apply to any system doing cross-document analysis at scale — not just M&amp;amp;A.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Extraction is harder than analysis. By a lot.
&lt;/h3&gt;

&lt;p&gt;Everyone focuses on the LLM prompts. But 80% of the real engineering is getting clean text out of messy documents. Our extraction pipeline has 4 tiers: pymupdf → pdftotext → OCR (Tesseract → GLM-OCR) → Claude vision as last resort. Each tier has 6 quality gates (min chars, printable ratio, density, readability, watermark detection, corruption check). Confidence scales with method quality — pymupdf gets 0.9 base, OCR gets 0.65.&lt;/p&gt;
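
&lt;p&gt;The tiered fallback can be sketched as follows. The extractors are placeholder callables, not real pymupdf or OCR calls, and only two of the six gates are shown. The thresholds (200 characters, 0.9 printable ratio) are assumptions. The base confidences of 0.9 and 0.65 are the ones quoted above.&lt;/p&gt;

```python
import string

PRINTABLE = set(string.printable)


def passes_gates(text, min_chars=200, min_printable_ratio=0.9):
    """Two of the gates named above: minimum length and printable-character ratio."""
    if not text or min_chars > len(text):
        return False
    printable = sum(1 for ch in text if ch in PRINTABLE or ch.isalpha())
    return printable / len(text) >= min_printable_ratio


def extract(path, tiers):
    """Try (name, extractor, base_confidence) tiers in order; return the first clean result."""
    for name, extractor, confidence in tiers:
        try:
            text = extractor(path)
        except Exception:
            continue  # a crashed tier falls through to the next one
        if passes_gates(text):
            return {"method": name, "confidence": confidence, "text": text}
    return None  # every tier failed its gates
```

&lt;p&gt;Because each tier is just a callable, an expensive last-resort method only runs when every cheaper one has failed its gates.&lt;/p&gt;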

&lt;h3&gt;
  
  
  2. Entity resolution is your invisible foundation
&lt;/h3&gt;

&lt;p&gt;"IBM", "International Business Machines", and "Red Hat" — are these the same entity? We use a 6-stage cascade: exact match → normalized (strip legal suffixes) → alias expansion → fuzzy match (rapidfuzz) → TF-IDF cosine similarity → learned matches from prior runs. Names ≤5 characters are blocked from fuzzy matching — without this, "Inc." matches random entities.&lt;/p&gt;
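
&lt;p&gt;Here is a sketch of the first four stages of such a cascade. It swaps in the standard library's &lt;code&gt;difflib&lt;/code&gt; for rapidfuzz and leaves out the TF-IDF and learned-match stages. The suffix list, the 0.85 cutoff, and the function names are assumptions, and the short-name guard is applied to the normalized name.&lt;/p&gt;

```python
import difflib
import re

LEGAL_SUFFIXES = re.compile(r"\b(inc|corp|corporation|llc|ltd|co)\b\.?")


def normalize(name):
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    name = LEGAL_SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def resolve(name, canonical, aliases, fuzzy_cutoff=0.85):
    """Return (canonical_name, stage) or (None, "unresolved")."""
    if name in canonical:
        return name, "exact"
    by_normalized = {normalize(c): c for c in canonical}
    key = normalize(name)
    if key in by_normalized:
        return by_normalized[key], "normalized"
    if key in aliases:
        return aliases[key], "alias"
    if len(key) > 5:  # short names are blocked from fuzzy matching
        close = difflib.get_close_matches(key, list(by_normalized), n=1, cutoff=fuzzy_cutoff)
        if close:
            return by_normalized[close[0]], "fuzzy"
    return None, "unresolved"
```

&lt;p&gt;Returning the stage alongside the match makes it easy to audit how each name was resolved and to treat fuzzy hits with more suspicion than exact ones.&lt;/p&gt;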

&lt;h3&gt;
  
  
  3. Don't dump everything into one context. Map-merge-resolve.
&lt;/h3&gt;

&lt;p&gt;A 200-page master agreement might have the deal-killer on page 147. You can't skip large files. But dumping them into one context drops accuracy from 95% to 74% (&lt;a href="https://www.addleshawgoddard.com/globalassets/insights/technology/llm/rag-report.pdf" rel="noopener noreferrer"&gt;Addleshaw Goddard, 510 contracts&lt;/a&gt;). Instead: chunk at page boundaries (150K chars, 15% overlap), analyze each chunk independently, merge with priority logic (YES beats NO, specific beats generic), and only invoke LLM arbitration when chunks disagree. The 21-point accuracy gain is entirely engineering — no model change.&lt;/p&gt;
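
&lt;p&gt;The chunk-and-merge steps can be sketched like this. The 150K character budget and 15% overlap come from the text above. Treating "specific beats generic" as "the longer supporting quote wins" is my simplification for the example, and the LLM arbitration step is left out.&lt;/p&gt;

```python
def chunk_pages(pages, max_chars=150_000, overlap_ratio=0.15):
    """Group whole pages into chunks of about max_chars, repeating trailing pages as overlap."""
    chunks, current, size = [], [], 0
    for page in pages:
        if current and size + len(page) > max_chars:
            chunks.append(current)
            # Carry trailing pages forward until the overlap budget is used up.
            carried, carried_size = [], 0
            for prev in reversed(current):
                if carried_size + len(prev) > max_chars * overlap_ratio:
                    break
                carried.insert(0, prev)
                carried_size += len(prev)
            current, size = carried, carried_size
        current.append(page)
        size += len(page)
    if current:
        chunks.append(current)
    return chunks


def merge_answers(answers):
    """Merge per-chunk (verdict, quote) pairs: YES beats NO, longer quote beats shorter."""
    found = [a for a in answers if a[0] == "YES"]
    pool = found or [a for a in answers if a[0] == "NO"]
    if not pool:
        return ("NOT_FOUND", "")
    return max(pool, key=lambda a: len(a[1]))
```

&lt;p&gt;Pages are never split, so every citation still maps to a whole page number after merging.&lt;/p&gt;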

&lt;h3&gt;
  
  
  4. Hallucination is an engineering problem, not a model problem
&lt;/h3&gt;

&lt;p&gt;No single defense works. We use 5 layers: (1) Pydantic schema validation on every response, (2) mandatory citation with file_path/page/exact_quote verified against source, (3) explicit "NOT_FOUND" escape valve — without this, models fabricate clauses rather than admit ignorance, (4) adversarial Judge review with accusatory framing ("this finding appears fabricated — prove it with a direct quote"), (5) 6-layer deterministic numerical audit.&lt;/p&gt;

&lt;p&gt;The third layer changed everything. When you tell the model "if you can't find this clause, say NOT_FOUND," hallucination drops dramatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Know when to stop using LLMs
&lt;/h3&gt;

&lt;p&gt;We had an LLM agent doing validation and report synthesis. We replaced it with deterministic Python. Quality went up, cost went down. The rule: use LLMs for analysis and synthesis; use Python for validation, dedup, and audit. If you can write the logic as deterministic code, do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Self-verification works — but only with accusatory framing
&lt;/h3&gt;

&lt;p&gt;After agents produce findings, a follow-up pass challenges them on high-severity claims. Polite prompts ("please review your finding") have near-zero effect — models confirm their own output. Accusatory prompts ("this finding appears fabricated," "the cited clause doesn't exist") force re-examination and produce a 9.2% accuracy improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Cross-agent dedup is different than you think
&lt;/h3&gt;

&lt;p&gt;When 4 agents analyze the same document, they find the same issue but describe it differently. Three rules: (1) never dedup within the same agent — two similar findings from Legal are intentionally distinct, (2) only dedup across agents on the same document — similar findings on different documents are different findings, (3) keep contributing agent metadata so you know which domains flagged it.&lt;/p&gt;
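
&lt;p&gt;The three rules translate into very little code. The sketch below uses a standard-library stand-in for a token-sort ratio (sort the words, then compare with &lt;code&gt;difflib&lt;/code&gt;), so its scores will not match rapidfuzz exactly. The finding fields are hypothetical.&lt;/p&gt;

```python
import difflib


def token_sort_similarity(a, b):
    """Stdlib stand-in for a token-sort ratio: sort the words, then compare (0-100)."""
    left = " ".join(sorted(a.lower().split()))
    right = " ".join(sorted(b.lower().split()))
    return 100 * difflib.SequenceMatcher(None, left, right).ratio()


def dedup_across_agents(findings, threshold=80):
    """Merge similar findings only when agents differ and the document is the same."""
    kept = []
    for finding in findings:
        for existing in kept:
            same_doc = existing["document"] == finding["document"]
            new_agent = finding["agent"] not in existing["agents"]
            similar = token_sort_similarity(existing["text"], finding["text"]) >= threshold
            if same_doc and new_agent and similar:
                existing["agents"].append(finding["agent"])  # keep contributing domains
                break
        else:
            # No merge happened: this is a distinct finding.
            kept.append({"document": finding["document"], "text": finding["text"],
                         "agents": [finding["agent"]]})
    return kept
```

&lt;p&gt;A repeated finding from the same agent is kept as its own entry, and the &lt;code&gt;agents&lt;/code&gt; list records every domain that flagged a merged one.&lt;/p&gt;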

&lt;h3&gt;
  
  
  8. Context window engineering is a first-class discipline
&lt;/h3&gt;

&lt;p&gt;It's not just about fitting data in — it's about &lt;em&gt;where&lt;/em&gt; things go. Critical instructions go at the start (highest recall zone). Document content goes in the middle (lowest recall — ~40% worse). Constraints and format rules go at the end (second-highest recall). We budget 40% of the context window for tool calls and reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Quality gates must be blocking, not advisory
&lt;/h3&gt;

&lt;p&gt;If validation just logs a warning, nobody reads it. If it halts the pipeline, quality is non-negotiable. The same goes for agent guardrails: turn limits (soft at 200, force-kill at 3x), path guards (agents can only write under &lt;code&gt;_dd/&lt;/code&gt;), and bash guards (24 blocked patterns, including &lt;code&gt;rm -rf&lt;/code&gt;, &lt;code&gt;sudo&lt;/code&gt;, and pipe-to-shell). Better to produce nothing than unreliable output.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Every claim must be traceable to source
&lt;/h3&gt;

&lt;p&gt;Citation verification uses 4 scopes: exact page match → adjacent pages ±1 → full document fuzzy match (80%+) → cross-file search. That last one matters — if the quote isn't in the cited file, we search all files for that entity. Auto-corrects file misattribution.&lt;/p&gt;
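
&lt;p&gt;A sketch of the four-scope cascade, assuming the corpus is a dict from file path to a list of page texts. The fuzzy scope here compares the quote against individual sentences with &lt;code&gt;difflib&lt;/code&gt;, and the cross-file scope searches every other file instead of only that entity's files. Both are simplifications, and the function names are hypothetical.&lt;/p&gt;

```python
import difflib
import re


def fuzzy_hit(text, quote, min_ratio=0.8):
    """True when some sentence in text is at least min_ratio similar to quote."""
    for sentence in re.split(r"[.!?]\s+", text):
        ratio = difflib.SequenceMatcher(None, sentence.lower(), quote.lower()).ratio()
        if ratio >= min_ratio:
            return True
    return False


def verify_citation(corpus, file_path, page, quote):
    """corpus maps file path to a list of page texts (index 0 is page 1).

    Returns (scope, file_path, page) for the first scope that finds the quote, else None.
    """
    pages = corpus.get(file_path, [])
    if page - 1 in range(len(pages)) and quote in pages[page - 1]:
        return ("exact_page", file_path, page)
    for neighbor in (page - 1, page + 1):
        if neighbor - 1 in range(len(pages)) and quote in pages[neighbor - 1]:
            return ("adjacent_page", file_path, neighbor)
    for number, text in enumerate(pages, start=1):
        if fuzzy_hit(text, quote):
            return ("document_fuzzy", file_path, number)
    for other_path, other_pages in corpus.items():
        if other_path == file_path:
            continue
        for number, text in enumerate(other_pages, start=1):
            if quote in text:
                return ("cross_file", other_path, number)
    return None
```

&lt;p&gt;The returned tuple carries the file and page where the quote was actually found, which is what allows a misattributed citation to be corrected instead of rejected.&lt;/p&gt;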

&lt;h3&gt;
  
  
  11. Most of what AI finds is noise
&lt;/h3&gt;

&lt;p&gt;Run 9 agents across hundreds of documents and you'll get thousands of findings. We use a 3-stage classification: noise filter (15 patterns for extraction artifacts), data quality filter (14 patterns for "data unavailable" gaps), then material findings. Plus 5 severity recalibration rules — e.g., a change-of-control clause that only applies to competitors gets downgraded from P0 to P3 automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. Same clause, different deal, different severity
&lt;/h3&gt;

&lt;p&gt;An anti-assignment clause is P0 in an asset purchase (blocks contract transfer) but P3 in a stock purchase (entity doesn't change). Deal-type context must flow through the entire pipeline: prompt-time rules, post-hoc deterministic adjustments, and executive judgment overrides — with full audit trail.&lt;/p&gt;
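
&lt;p&gt;The post-hoc deterministic adjustment could be as simple as a lookup keyed by clause type and deal type. The table below contains only the anti-assignment example from this section. The key names, field names, and audit entry shape are assumptions.&lt;/p&gt;

```python
# Hypothetical adjustment table: (clause type, deal type) to adjusted severity.
ADJUSTMENTS = {
    ("anti_assignment", "asset_purchase"): "P0",
    ("anti_assignment", "stock_purchase"): "P3",
}


def adjust_severity(finding, deal_type):
    """Return a copy of the finding with deal-type severity applied and an audit entry."""
    new_severity = ADJUSTMENTS.get((finding["clause_type"], deal_type))
    adjusted = dict(finding)
    adjusted["audit"] = list(finding.get("audit", []))
    if new_severity is not None and new_severity != finding["severity"]:
        adjusted["audit"].append(
            {"rule": "deal_type", "from": finding["severity"], "to": new_severity})
        adjusted["severity"] = new_severity
    return adjusted
```

&lt;p&gt;The original finding is left untouched and every change is recorded, so a reviewer can see both the agent's severity and the adjusted one.&lt;/p&gt;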

&lt;h3&gt;
  
  
  13. Every API call is a deal cost
&lt;/h3&gt;

&lt;p&gt;Three model profiles: economy (Haiku for extraction), standard (Sonnet for analysis), premium (Opus for synthesis). Per-agent cost tracking. Hard budget limits that halt the pipeline. Right model for right task.&lt;/p&gt;

&lt;h3&gt;
  
  
  14. Pydantic v2 everywhere
&lt;/h3&gt;

&lt;p&gt;137+ models with &lt;code&gt;model_json_schema()&lt;/code&gt; for structured outputs. Strict mypy across 199 source files. The type system catches real bugs — a finding with &lt;code&gt;evidence&lt;/code&gt; instead of &lt;code&gt;citations&lt;/code&gt; gets blocked by the schema guard hook before it's written to disk.&lt;/p&gt;

&lt;h3&gt;
  
  
  15. Make every run smarter than the last
&lt;/h3&gt;

&lt;p&gt;Inspired by Karpathy's "LLM Wiki" pattern: a persistent knowledge base compounds across runs. Finding lineage via SHA-256 fingerprinting tracks findings even when wording changes. A NetworkX knowledge graph with 11 typed edge types captures entity relationships, contradictions, and clause interactions. Run 2 knows what Run 1 found — and catches what changed.&lt;/p&gt;
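
&lt;p&gt;One way a fingerprint can survive rewording is to hash only the stable identity of a finding and leave the free text out. The sketch below does that with &lt;code&gt;hashlib&lt;/code&gt;. Which fields count as stable is my assumption, and the project may fingerprint differently.&lt;/p&gt;

```python
import hashlib


def fingerprint(finding):
    """Hash the stable identity of a finding; the free-text description is left out."""
    parts = [finding["entity"], finding["document"],
             finding["category"], finding["clause_type"]]
    canonical = "|".join(part.strip().lower() for part in parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_runs(previous, current):
    """Classify current findings as new or carried over, and list those that disappeared."""
    old = {fingerprint(f): f for f in previous}
    new = {fingerprint(f): f for f in current}
    return {
        "new": [new[k] for k in new if k not in old],
        "carried": [new[k] for k in new if k in old],
        "resolved": [old[k] for k in old if k not in new],
    }
```

&lt;p&gt;With this in place, comparing two runs is a set difference on fingerprints, independent of how each run phrased its findings.&lt;/p&gt;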

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dd-agents
dd-agents auto-config &lt;span class="s2"&gt;"Buyer"&lt;/span&gt; &lt;span class="s2"&gt;"Target"&lt;/span&gt; &lt;span class="nt"&gt;--data-room&lt;/span&gt; ./your_data_room
dd-agents run deal-config.json &lt;span class="nt"&gt;--dry-run&lt;/span&gt;  &lt;span class="c"&gt;# Preview without API calls&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://zoharbabin.github.io/due-diligence-agents/" rel="noopener noreferrer"&gt;Sample report&lt;/a&gt; (synthetic data, no install needed)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/zoharbabin/due-diligence-agents" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — Apache 2.0, 3,714 tests, strict mypy.&lt;/p&gt;

&lt;p&gt;Built on &lt;a href="https://github.com/anthropics/claude-agent-sdk-python" rel="noopener noreferrer"&gt;Anthropic's Claude Agent SDK&lt;/a&gt;. Looking for feedback — especially from anyone who's dealt with data room analysis and can tell me whether the report structure maps to how DD findings are actually consumed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>opensource</category>
      <category>sideprojects</category>
    </item>
  </channel>
</rss>
