<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Linghua Jin</title>
    <description>The latest articles on Forem by Linghua Jin (@badmonster0).</description>
    <link>https://forem.com/badmonster0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2902937%2F4814acf1-d1f8-401b-acbf-93bc92068bf3.png</url>
      <title>Forem: Linghua Jin</title>
      <link>https://forem.com/badmonster0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/badmonster0"/>
    <language>en</language>
    <item>
      <title>Your AI Coding Agent is Blind. Here's the Fix.</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Wed, 18 Mar 2026 04:25:45 +0000</pubDate>
      <link>https://forem.com/badmonster0/your-ai-coding-agent-is-blind-heres-the-fix-569n</link>
      <guid>https://forem.com/badmonster0/your-ai-coding-agent-is-blind-heres-the-fix-569n</guid>
      <description>&lt;p&gt;I've been using Claude Code, Cursor, and Codex daily. And I kept hitting the same wall.&lt;/p&gt;

&lt;p&gt;The agent would hallucinate functions. Suggest code that almost worked. Miss obvious patterns that were right there in the codebase.&lt;/p&gt;

&lt;p&gt;I thought it was a model problem. It wasn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It was a context problem.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Reason Your AI Agent Keeps Getting It Wrong
&lt;/h2&gt;

&lt;p&gt;When your coding agent tries to understand your codebase, it does something naive by default: it reads files. Sometimes whole files. Sometimes random chunks.&lt;/p&gt;

&lt;p&gt;The problem? &lt;strong&gt;Most codebases are too large to fit in a context window.&lt;/strong&gt; So the agent gets a sliced, incomplete, often misleading view of your code.&lt;/p&gt;

&lt;p&gt;Imagine asking a surgeon to operate while only being able to see through a 2-inch hole. That's your AI agent right now.&lt;/p&gt;

&lt;p&gt;The agent isn't dumb. It's just &lt;em&gt;blind&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: AST-Based Semantic Search
&lt;/h2&gt;

&lt;p&gt;Here's what changes everything: instead of feeding your agent raw file contents or naive text chunks, you give it &lt;strong&gt;semantically meaningful code units&lt;/strong&gt; — extracted using the Abstract Syntax Tree (AST).&lt;/p&gt;

&lt;p&gt;AST-based chunking understands &lt;em&gt;code structure&lt;/em&gt;. It knows where functions start and end. It won't split a class in half. It keeps imports with their context.&lt;/p&gt;

&lt;p&gt;The result? Your agent gets &lt;strong&gt;exactly&lt;/strong&gt; the code it needs — no noise, no half-functions, no hallucination-inducing garbage.&lt;/p&gt;
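&lt;p&gt;To make the idea concrete, here's a toy chunker built on Python's standard-library &lt;code&gt;ast&lt;/code&gt; module. It's only a sketch of the principle - the real tool uses tree-sitter and covers many languages - but it shows how structure-aware chunking keeps each function and class whole instead of slicing at arbitrary line boundaries:&lt;/p&gt;

```python
import ast

def chunk_python_source(source: str):
    """Split source into one chunk per top-level function/class,
    instead of fixed-size text windows that cut code mid-body."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "start": node.lineno,
                "end": node.end_lineno,
                "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
            })
    return chunks

src = '''def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
'''
print([(c["name"], c["start"], c["end"]) for c in chunk_python_source(src)])
# [('add', 1, 2), ('Greeter', 4, 6)]
```

&lt;p&gt;Notice that no chunk ever starts or ends in the middle of a definition - that's the property that keeps hallucination-inducing half-functions out of the context window.&lt;/p&gt;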

&lt;p&gt;This is what &lt;strong&gt;&lt;a href="https://github.com/cocoindex-io/cocoindex-code" rel="noopener noreferrer"&gt;cocoindex-code&lt;/a&gt;&lt;/strong&gt; does.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is cocoindex-code?
&lt;/h2&gt;

&lt;p&gt;It's a lightweight, open-source CLI tool that builds a &lt;strong&gt;semantic search index&lt;/strong&gt; over your codebase using AST-based chunking + local embedding models.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚡ &lt;strong&gt;70% fewer tokens&lt;/strong&gt; consumed by your agent&lt;/li&gt;
&lt;li&gt;🚀 &lt;strong&gt;1-minute setup&lt;/strong&gt; — zero config, zero API keys required&lt;/li&gt;
&lt;li&gt;🌳 &lt;strong&gt;AST-based chunking&lt;/strong&gt; for 28+ languages (Python, TypeScript, Rust, Go, Java, C/C++ and more)&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Incremental indexing&lt;/strong&gt; — only re-indexes changed files&lt;/li&gt;
&lt;li&gt;🔌 Works with &lt;strong&gt;Claude Code, Cursor, Codex, OpenCode&lt;/strong&gt; via Skills or MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default embedding model runs &lt;em&gt;locally&lt;/em&gt; (sentence-transformers/all-MiniLM-L6-v2) — completely free, no API key needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wait, Isn't That What LSP Does?
&lt;/h2&gt;

&lt;p&gt;Great question. LSP (Language Server Protocol) is incredible for editors — it gives you go-to-definition, find references, real-time type errors, rename refactoring.&lt;/p&gt;

&lt;p&gt;But LSP and cocoindex-code solve &lt;em&gt;different problems&lt;/em&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;LSP&lt;/th&gt;
&lt;th&gt;cocoindex-code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time editing assistance&lt;/td&gt;
&lt;td&gt;Semantic search across entire codebase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Search type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exact symbol matching&lt;/td&gt;
&lt;td&gt;Natural language / fuzzy semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Jump to definition, rename&lt;/td&gt;
&lt;td&gt;"Where is the auth logic?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target user&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You, in your editor&lt;/td&gt;
&lt;td&gt;Your AI coding agent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;LSP answers: &lt;em&gt;"Where is &lt;code&gt;getUserById&lt;/code&gt; defined?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;cocoindex-code answers: &lt;em&gt;"Find all code related to user authentication and session management."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They're &lt;strong&gt;complementary&lt;/strong&gt;, not competing. Use both.&lt;/p&gt;
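&lt;p&gt;The mechanical difference is easy to see in miniature. Here's a deliberately crude sketch - bag-of-words counts standing in for real neural embeddings - of how similarity ranking surfaces auth-related code that an exact-symbol lookup would never match:&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a neural embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

chunks = {
    "verify_token": "def verify_token(jwt): check signature and expiry",
    "render_page": "def render_page(template): return html output",
}
query = embed("where is the auth token check")
best = max(chunks, key=lambda name: cosine(query, embed(chunks[name])))
print(best)  # verify_token
```

&lt;p&gt;The query never mentions the symbol &lt;code&gt;verify_token&lt;/code&gt;, yet the right chunk still ranks first - that's the kind of question LSP can't answer and semantic search can.&lt;/p&gt;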




&lt;h2&gt;
  
  
  How to Get Started (Seriously, 1 Minute)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;cocoindex-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Add to your Claude Code agent (Skill integration):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add cocoindex-io/cocoindex-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's literally it. The skill teaches your agent to automatically initialize, index, and search your codebase whenever it's helpful. No manual &lt;code&gt;ccc init&lt;/code&gt; or &lt;code&gt;ccc index&lt;/code&gt; required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Or use it directly from CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ccc index                          &lt;span class="c"&gt;# build the index&lt;/span&gt;
ccc search &lt;span class="s2"&gt;"authentication logic"&lt;/span&gt;  &lt;span class="c"&gt;# semantic search&lt;/span&gt;
ccc search &lt;span class="s2"&gt;"database connection"&lt;/span&gt;   &lt;span class="c"&gt;# finds related code even without exact names&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;MCP Server&lt;/strong&gt; support is also available for Cursor, Codex, OpenCode, and any MCP-compatible agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;p&gt;A 10K+ file codebase. An agent that previously hallucinated constantly. After adding cocoindex-code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent finds relevant code &lt;strong&gt;in seconds&lt;/strong&gt; instead of scanning entire directories&lt;/li&gt;
&lt;li&gt;Token usage &lt;strong&gt;dropped by 70%&lt;/strong&gt; — meaning faster responses AND lower costs&lt;/li&gt;
&lt;li&gt;Hallucinations on codebase-specific logic went down dramatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One community user running Codex against a 10K+ file codebase shared that it "just worked."&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters for the Future of AI Coding
&lt;/h2&gt;

&lt;p&gt;We're entering an era where AI agents don't just autocomplete — they &lt;strong&gt;plan, refactor, and ship features&lt;/strong&gt;. For that to work reliably, agents need to &lt;em&gt;understand&lt;/em&gt; codebases, not just guess at them.&lt;/p&gt;

&lt;p&gt;AST-based semantic search is one of the most important missing primitives in the current AI coding stack. cocoindex-code is one of the first open-source tools to make it trivially easy to use.&lt;/p&gt;

&lt;p&gt;The CLI is having a comeback. Not because terminals are trendy — but because &lt;strong&gt;small, composable, offline-first tools are exactly what AI agents need&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;🔗 GitHub: &lt;a href="https://github.com/cocoindex-io/cocoindex-code" rel="noopener noreferrer"&gt;https://github.com/cocoindex-io/cocoindex-code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apache-2.0 license. Fully open source. Drop a ⭐ if you find it useful — it genuinely helps the project grow.&lt;/p&gt;

&lt;p&gt;Have you tried semantic code search with your AI agent? I'd love to hear what workflows people are building in the comments 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>codenewbie</category>
    </item>
    <item>
      <title>I Built a Code AST MCP That Saves 70% Tokens and Went Viral (54K+ Views)</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Wed, 11 Mar 2026 01:54:54 +0000</pubDate>
      <link>https://forem.com/badmonster0/i-built-a-code-ast-mcp-that-saves-70-tokens-and-went-viral-54k-views-4c6a</link>
      <guid>https://forem.com/badmonster0/i-built-a-code-ast-mcp-that-saves-70-tokens-and-went-viral-54k-views-4c6a</guid>
      <description>&lt;p&gt;Last week, I open-sourced a lightweight Code MCP server that uses AST (Abstract Syntax Tree) parsing to give coding agents semantic understanding of your codebase. It went viral on X with 54K+ views.&lt;/p&gt;

&lt;p&gt;Here's the tweet that started it all:&lt;/p&gt;

&lt;p&gt;

&lt;iframe class="tweet-embed" id="tweet-2031366453153157139-479" src="https://platform.twitter.com/embed/Tweet.html?id=2031366453153157139"&gt;
&lt;/iframe&gt;






&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Coding Agents Are Burning Tokens
&lt;/h2&gt;

&lt;p&gt;If you've used Claude Code, Codex, Cursor, or any AI coding agent on a real codebase, you've probably noticed: they dump entire files into context just to understand your code structure. That's expensive, slow, and wasteful.&lt;/p&gt;

&lt;p&gt;The agent doesn't need your whole file. It needs to know &lt;em&gt;what functions exist, what classes are defined, and how they relate to each other&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: AST-Based Semantic Code Search
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/cocoindex-io/cocoindex-code" rel="noopener noreferrer"&gt;cocoindex-code&lt;/a&gt; - a super lightweight embedded MCP that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parses your code into ASTs&lt;/strong&gt; using tree-sitter, extracting meaningful chunks (functions, classes, methods)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creates semantic embeddings&lt;/strong&gt; of those chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lets your coding agent search by meaning&lt;/strong&gt;, not just text matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only re-indexes changed files&lt;/strong&gt; - built on a Rust-based incremental indexing engine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result? &lt;strong&gt;70% token savings&lt;/strong&gt; and noticeably faster coding agent responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  1-Minute Setup - No Config Needed
&lt;/h2&gt;

&lt;p&gt;For Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;cocoindex-code
claude mcp add cocoindex-code &lt;span class="nt"&gt;--&lt;/span&gt; cocoindex-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Codex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex mcp add cocoindex-code &lt;span class="nt"&gt;--&lt;/span&gt; cocoindex-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No database, no API keys, no config files. It just works.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works Under the Hood
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tree-sitter parsing&lt;/strong&gt; breaks your code into semantic chunks (functions, classes, etc.) across 20+ languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local embedding model&lt;/strong&gt; (SentenceTransformers) creates vector representations - completely free, no API key needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite + vector search&lt;/strong&gt; stores everything locally and portably&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental indexing&lt;/strong&gt; via &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; (Rust engine) means only changed files get re-processed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When your agent needs to find code, it calls the &lt;code&gt;search&lt;/code&gt; MCP tool with a natural language query and gets back exactly the relevant code chunks with file paths and line numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Went Viral
&lt;/h2&gt;

&lt;p&gt;I think people resonated with a few things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Real pain point&lt;/strong&gt; - everyone using coding agents feels the token burn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero friction&lt;/strong&gt; - one pip install and one MCP add command&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No vendor lock-in&lt;/strong&gt; - works with Claude, Codex, Cursor, or any MCP-compatible agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source (Apache 2.0)&lt;/strong&gt; - you can inspect every line of code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No API keys required&lt;/strong&gt; - the default embedding model runs locally for free&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Supported Languages
&lt;/h2&gt;

&lt;p&gt;Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, Ruby, Kotlin, Swift, SQL, Shell, and more. It uses tree-sitter grammars so adding new languages is straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We're actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better embedding models optimized for code (try &lt;code&gt;nomic-ai/CodeRankEmbed&lt;/code&gt; with a GPU)&lt;/li&gt;
&lt;li&gt;Enterprise features for large codebases and shared indexing across teams&lt;/li&gt;
&lt;li&gt;More MCP tools beyond search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo is at &lt;strong&gt;&lt;a href="https://github.com/cocoindex-io/cocoindex-code" rel="noopener noreferrer"&gt;github.com/cocoindex-io/cocoindex-code&lt;/a&gt;&lt;/strong&gt; - 520+ stars and growing fast.&lt;/p&gt;

&lt;p&gt;Built with &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt;, our open-source Rust-based data indexing framework.&lt;/p&gt;

&lt;p&gt;Would love to hear your experience if you try it out. Drop a comment or open an issue on GitHub!&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Tiny MCP That Understands Your Code and Saves 70% Tokens</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Sun, 22 Feb 2026 04:11:28 +0000</pubDate>
      <link>https://forem.com/badmonster0/i-built-a-tiny-mcp-that-understands-your-code-and-saves-70-tokens-2hp4</link>
      <guid>https://forem.com/badmonster0/i-built-a-tiny-mcp-that-understands-your-code-and-saves-70-tokens-2hp4</guid>
      <description>&lt;p&gt;Every coding agent demo looks magical... until you point it at a real codebase. Then it either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chokes on context windows&lt;/li&gt;
&lt;li&gt;Hallucinates around stale code&lt;/li&gt;
&lt;li&gt;Or becomes so slow you might as well just grep&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hit this wall building AI workflows with large Rust/Python/TS repos, so I built something I actually wanted for my own stack: &lt;strong&gt;a super lightweight, AST-based embedded MCP that just works on your codebase.&lt;/strong&gt; It's called &lt;a href="https://github.com/cocoindex-io/cocoindex-code" rel="noopener noreferrer"&gt;&lt;code&gt;cocoindex-code&lt;/code&gt;&lt;/a&gt; and it's already saving me ~70% on tokens and a lot of waiting time.&lt;/p&gt;

&lt;p&gt;If you're using Claude, Codex, Cursor, or any MCP-friendly coding agent, this post is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Idea: AST + Incremental Indexing
&lt;/h2&gt;

&lt;p&gt;Most "code RAG" setups feel like infra projects: spin up a vector DB, write ETL, fight schema drift, tune chunking, maintain workers. Then you pray it all stays in sync.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cocoindex-code&lt;/code&gt; takes the opposite approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedded MCP&lt;/strong&gt;: It runs locally as an MCP server, no separate DB to run or maintain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AST-based indexing&lt;/strong&gt;: It understands code structure via Tree-sitter, so you get meaningful chunks (functions, classes, blocks) instead of random 200-line windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental updates&lt;/strong&gt;: Built on top of the Rust-based CocoIndex engine, it only re-indexes changed files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real multi-language support&lt;/strong&gt;: Python, JS/TS, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal: you ask an agent a question, it pulls precisely the code it needs, &lt;strong&gt;without blowing up your context window&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Get Out of the Box
&lt;/h2&gt;

&lt;p&gt;Here's what you get by just adding the MCP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic code search tool&lt;/strong&gt;: &lt;code&gt;search(query, limit, offset, refresh_index)&lt;/code&gt; as an MCP tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant token savings&lt;/strong&gt;: Because only relevant code chunks go into prompts, not entire files or folders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: Incremental indexing + Rust engine means updates feel near-instant on typical dev repos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No-key local embeddings by default&lt;/strong&gt;: Uses &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt; locally via SentenceTransformers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional power-ups&lt;/strong&gt;: Swap in any LiteLLM-supported embedding model (OpenAI, Gemini, Mistral, Voyage for code, Ollama, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means you can go from "plain coding agent" to "coding agent that actually knows your codebase" in about a minute.&lt;/p&gt;

&lt;h2&gt;
  
  
  1-Minute Setup for Claude, Codex, and OpenCode
&lt;/h2&gt;

&lt;p&gt;First, install &lt;code&gt;uv&lt;/code&gt; if you don't have it yet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Claude
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add cocoindex-code &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--&lt;/span&gt; uvx &lt;span class="nt"&gt;--prerelease&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;explicit &lt;span class="nt"&gt;--with&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"cocoindex&amp;gt;=1.0.0a16"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  cocoindex-code@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Codex
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex mcp add cocoindex-code &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--&lt;/span&gt; uvx &lt;span class="nt"&gt;--prerelease&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;explicit &lt;span class="nt"&gt;--with&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"cocoindex&amp;gt;=1.0.0a16"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  cocoindex-code@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OpenCode
&lt;/h3&gt;

&lt;p&gt;You can do it interactively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;opencode mcp add
&lt;span class="c"&gt;# MCP server name: cocoindex-code&lt;/span&gt;
&lt;span class="c"&gt;# type: local&lt;/span&gt;
&lt;span class="c"&gt;# command:&lt;/span&gt;
&lt;span class="c"&gt;# uvx --prerelease=explicit --with cocoindex&amp;gt;=1.0.0a16 cocoindex-code@latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Point your agent at your repo, and you now have semantic search over your codebase as an MCP tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the &lt;code&gt;search&lt;/code&gt; MCP Tool Works
&lt;/h2&gt;

&lt;p&gt;Once connected, the MCP exposes a &lt;code&gt;search&lt;/code&gt; tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# natural language or code snippet
&lt;/span&gt;  &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# 1-100
&lt;/span&gt;  &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# pagination
&lt;/span&gt;  &lt;span class="n"&gt;refresh_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# re-index before querying
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each result comes back with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File path&lt;/li&gt;
&lt;li&gt;Language&lt;/li&gt;
&lt;li&gt;Code content&lt;/li&gt;
&lt;li&gt;Start/end line numbers&lt;/li&gt;
&lt;li&gt;Similarity score&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've found three killer use cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"Where is the actual implementation of X?" - when the repo has 5 similarly named functions.&lt;/li&gt;
&lt;li&gt;"Show me all the auth-related logic touching JWT refresh."&lt;/li&gt;
&lt;li&gt;"Find the code that matches this stack trace snippet."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because the index is kept up to date incrementally, you can refactor, run tests, and immediately use the agent against the new code layout without re-running some giant offline job.&lt;/p&gt;
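&lt;p&gt;Incremental indexing is what makes that workflow pleasant. The real engine does this bookkeeping in Rust, but the core idea can be sketched in a few lines - hash every file's content, and only re-process the ones whose hash changed since the last run:&lt;/p&gt;

```python
import hashlib

def files_to_reindex(seen_hashes: dict, files: dict) -> list:
    """Toy incremental check: return only files whose content changed
    since the last run, updating the stored hashes as we go."""
    changed = []
    for path, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if seen_hashes.get(path) != digest:
            changed.append(path)
            seen_hashes[path] = digest
    return changed

state = {}
repo = {"auth.py": "def login(): ...", "db.py": "def connect(): ..."}
print(files_to_reindex(state, repo))   # first run: every file is new
repo["auth.py"] = "def login(user): ..."
print(files_to_reindex(state, repo))   # after an edit: only auth.py
```

&lt;p&gt;On a big repo, that difference - re-embedding two changed files instead of ten thousand - is why refreshing the index before a query feels near-instant.&lt;/p&gt;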

&lt;h2&gt;
  
  
  Supported Languages and Smart Defaults
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;cocoindex-code&lt;/code&gt; ships with a very practical language matrix:&lt;/p&gt;

&lt;p&gt;C, C++, C#, CSS/SCSS, Go, HTML, Java, JavaScript/TypeScript/TSX, JSON/YAML/TOML, Kotlin, Markdown/MDX, Pascal, PHP, Python, R, Ruby, Rust, Scala, Solidity, SQL, Swift, XML&lt;/p&gt;

&lt;p&gt;It also auto-excludes noisy directories like &lt;code&gt;__pycache__&lt;/code&gt;, &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;target&lt;/code&gt;, &lt;code&gt;dist&lt;/code&gt;, and vendored dependencies.&lt;/p&gt;

&lt;p&gt;The repo root is auto-discovered via &lt;code&gt;.cocoindex_code/&lt;/code&gt; or &lt;code&gt;.git/&lt;/code&gt;, falling back to the current working directory. In practice you usually don't set any env vars at all - it just finds your repo root.&lt;/p&gt;
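&lt;p&gt;That discovery logic amounts to walking up the directory tree until a marker directory is found. A minimal sketch - the marker names come from the paragraph above, but the exact lookup order here is an assumption:&lt;/p&gt;

```python
from pathlib import Path

MARKERS = (".cocoindex_code", ".git")  # assumed check order

def find_repo_root(start: Path) -> Path:
    """Walk upward from `start`; the first directory containing a
    marker wins, otherwise fall back to the starting directory."""
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in MARKERS):
            return directory
    return start
```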

&lt;h2&gt;
  
  
  Embeddings: Start Free, Scale Later
&lt;/h2&gt;

&lt;p&gt;Out of the box, the project uses a local SentenceTransformers model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default: &lt;code&gt;sbert/sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No API key, no billing surprises, completely local.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want stronger semantic understanding for code-heavy repos, you can point &lt;code&gt;COCOINDEX_CODE_EMBEDDING_MODEL&lt;/code&gt; to any LiteLLM-supported embedding model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama (local)&lt;/li&gt;
&lt;li&gt;OpenAI / Azure OpenAI&lt;/li&gt;
&lt;li&gt;Gemini&lt;/li&gt;
&lt;li&gt;Mistral&lt;/li&gt;
&lt;li&gt;Voyage (code-optimized)&lt;/li&gt;
&lt;li&gt;Cohere&lt;/li&gt;
&lt;li&gt;AWS Bedrock&lt;/li&gt;
&lt;li&gt;Nebius&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically: &lt;strong&gt;start with free local, upgrade only if/when you actually need it.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Huge / Enterprise Codebases?
&lt;/h2&gt;

&lt;p&gt;Under the hood, &lt;code&gt;cocoindex-code&lt;/code&gt; uses &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt;, a Rust-based indexing engine built for large-scale, incremental data workflows.&lt;/p&gt;

&lt;p&gt;For big org setups, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Share indexes across teammates instead of re-indexing on every machine.&lt;/li&gt;
&lt;li&gt;Take advantage of features like branch dedupe to avoid duplicate work.&lt;/li&gt;
&lt;li&gt;Run it as part of a larger data/indexing platform on top of CocoIndex.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  If You Want to Try It, Here's the Ask
&lt;/h2&gt;

&lt;p&gt;If this sounds useful, here's a small but meaningful way you can help:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Star the repo&lt;/strong&gt;: &lt;a href="https://github.com/cocoindex-io/cocoindex-code" rel="noopener noreferrer"&gt;&lt;code&gt;cocoindex-code&lt;/code&gt;&lt;/a&gt; and the underlying &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;&lt;code&gt;cocoindex&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try it on your main project&lt;/strong&gt; (the messy one, not the toy one).&lt;/li&gt;
&lt;li&gt;Drop feedback, issues, or ideas in the GitHub repo.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm especially interested in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repos where existing "code RAG" tools failed you&lt;/li&gt;
&lt;li&gt;Languages or frameworks you want better support for&lt;/li&gt;
&lt;li&gt;Workflows where you want your coding agent to feel &lt;em&gt;10x more context-aware&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you do try it, let me know in the comments what stack you used it on - I'd love to feature a few real-world examples in a follow-up post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Stopped Manually Documenting My Repos: Now They Generate Their Own Wikis</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Thu, 05 Feb 2026 06:25:34 +0000</pubDate>
      <link>https://forem.com/badmonster0/i-stopped-manually-documenting-my-repos-now-they-generate-their-own-wikis-53gd</link>
      <guid>https://forem.com/badmonster0/i-stopped-manually-documenting-my-repos-now-they-generate-their-own-wikis-53gd</guid>
      <description>&lt;p&gt;Every few months I'd open an old repo and ask the same question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Who wrote this, and why were they so angry?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Function names that lie, classes with no explanation, a &lt;code&gt;main.py&lt;/code&gt; that secretly runs the whole company. The code worked, but the &lt;strong&gt;documentation&lt;/strong&gt; never kept up. It was always "I'll document this later," and later never came.&lt;/p&gt;

&lt;p&gt;So I did the only reasonable thing: I stopped trying.&lt;/p&gt;

&lt;p&gt;Instead, I wired up a pipeline that reads my projects, calls an LLM, and spits out a one-pager wiki for each codebase—&lt;strong&gt;automatically, every time the code changes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through how to build that: a multi-codebase summarization pipeline using &lt;a href="https://cocoindex.io" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scans multiple projects in one go.&lt;/li&gt;
&lt;li&gt;Uses structured LLM outputs (Pydantic + Instructor) to extract functions, classes, and relationships.&lt;/li&gt;
&lt;li&gt;Aggregates everything into a project-level summary.&lt;/li&gt;
&lt;li&gt;Generates Markdown docs with Mermaid diagrams so you actually understand the architecture.&lt;/li&gt;
&lt;li&gt;Only re-runs the &lt;strong&gt;minimum&lt;/strong&gt; work when files change, thanks to incremental processing and memoization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever wished your monorepo came with a live-updating architecture wiki, this is for you.&lt;/p&gt;
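&lt;p&gt;Before diving in, here's the shape of the structured output the LLM is asked to produce in the Pydantic + Instructor step above. The class and field names below are hypothetical - your schema will differ - but this is the pattern Instructor validates the model's response against:&lt;/p&gt;

```python
from pydantic import BaseModel

# Hypothetical schema for illustration; pick fields that match what
# you actually want in the generated wiki.
class FunctionInfo(BaseModel):
    name: str
    summary: str

class ClassInfo(BaseModel):
    name: str
    summary: str
    methods: list[str] = []

class FileSummary(BaseModel):
    purpose: str
    functions: list[FunctionInfo] = []
    classes: list[ClassInfo] = []

doc = FileSummary(
    purpose="CLI entry point",
    functions=[FunctionInfo(name="main", summary="parses args and runs the app")],
)
print(doc.functions[0].name)  # main
```

&lt;p&gt;Because the output is a validated model rather than free-form prose, the aggregation and Markdown-generation steps later in the pipeline never have to parse the LLM's text.&lt;/p&gt;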

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhgkafvv6c5nyor2555c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhgkafvv6c5nyor2555c.png" alt="markdown" width="800" height="806"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea: A Self-Updating Wiki For Every Project
&lt;/h2&gt;

&lt;p&gt;Let's say you have a &lt;code&gt;projects/&lt;/code&gt; folder like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;projects/
  ├── my_project_1/
  │   ├── main.py
  │   └── utils.py
  ├── my_project_2/
  │   └── app.py
  └── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What we want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;each&lt;/strong&gt; subdirectory, generate a Markdown file like &lt;code&gt;output/my_project_1.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;That Markdown includes:

&lt;ul&gt;
&lt;li&gt;An overview of the project's purpose.
&lt;/li&gt;
&lt;li&gt;Key public classes and functions, with human-readable summaries.&lt;/li&gt;
&lt;li&gt;Mermaid diagrams showing how components connect.&lt;/li&gt;
&lt;li&gt;Optional per-file details if the project spans multiple files.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;And whenever you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a new project,
&lt;/li&gt;
&lt;li&gt;Modify a file, or
&lt;/li&gt;
&lt;li&gt;Change the LLM extraction logic,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex update main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CocoIndex figures out what changed and recomputes only what's necessary.&lt;/p&gt;
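&lt;p&gt;Conceptually, that is change detection plus selective recomputation: fingerprint every file, diff against the previous run, and mark only the touched projects dirty. A minimal stand-alone sketch of the idea (a hypothetical illustration, not CocoIndex's actual mechanism):&lt;/p&gt;

```python
import hashlib
import pathlib

def fingerprint(path: pathlib.Path) -> str:
    # Content hash: stable across runs, changes whenever the file changes.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_projects(root: pathlib.Path, previous: dict) -> tuple:
    # Compare current fingerprints against the previous run; only projects
    # containing a changed (or new) file need their summaries recomputed.
    dirty, current = set(), {}
    for f in sorted(root.rglob("*.py")):
        current[str(f)] = fingerprint(f)
        if previous.get(str(f)) != current[str(f)]:
            dirty.add(f.relative_to(root).parts[0])
    return dirty, current
```

&lt;p&gt;A second run over an unchanged tree yields an empty dirty set, so nothing is recomputed. (A real pipeline would also handle deleted files.)&lt;/p&gt;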




&lt;h2&gt;
  
  
  Step 1: Setup – Point the Pipeline at Your Code
&lt;/h2&gt;

&lt;p&gt;First, install CocoIndex and the supporting libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; &lt;span class="s2"&gt;"cocoindex&amp;gt;=1.0.0a6"&lt;/span&gt; instructor litellm pydantic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a project folder and enter it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;multi-codebase-summarization
&lt;span class="nb"&gt;cd &lt;/span&gt;multi-codebase-summarization
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up your LLM configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"gemini/gemini-2.5-flash"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tell CocoIndex where to keep its state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"COCOINDEX_DB=./cocoindex.db"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create your &lt;code&gt;projects/&lt;/code&gt; directory and drop in any Python projects you want summarized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: The App – Treat Each Directory as a Project
&lt;/h2&gt;

&lt;p&gt;CocoIndex has the notion of an &lt;strong&gt;App&lt;/strong&gt;: a top-level unit that defines how data flows from sources to outputs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;__future__&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;annotations&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Collection&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;instructor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;acompletion&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;coco&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cocoindex.connectors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;localfs&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cocoindex.resources.file&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FileLike&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PatternFilePathMatcher&lt;/span&gt;

&lt;span class="n"&gt;LLM_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini/gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;coco&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MultiCodebaseSummarization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;app_main&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;root_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./projects&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;app_main&lt;/code&gt; function scans subdirectories and mounts a processing component per project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@coco.function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;app_main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;root_dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;iterdir&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_dir&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;project_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
        &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;localfs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;recursive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;path_matcher&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;PatternFilePathMatcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;included_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;excluded_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__pycache__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;coco&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;coco&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;component_subpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;process_project&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Extract Structured File Info with an LLM
&lt;/h2&gt;

&lt;p&gt;Define &lt;strong&gt;Pydantic models&lt;/strong&gt; that describe exactly what we want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FunctionInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function signature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;is_coco_function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Whether decorated with @coco.function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Brief summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ClassInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Class name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Brief summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File path or project name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Brief summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;public_classes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ClassInfo&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;public_functions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FunctionInfo&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mermaid_graphs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Instructor wrapping LiteLLM, we tell the model to fill in this schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_instructor_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instructor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_litellm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acompletion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;instructor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@coco.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_file_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FileLike&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the following Python file...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_instructor_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key point: &lt;code&gt;memo=True&lt;/code&gt; caches results keyed by file content, so unchanged files skip the LLM call entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Aggregate Files into a Project-Level Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@coco.function&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;aggregate_project_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_infos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_infos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;file_infos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Single file, reuse
&lt;/span&gt;
    &lt;span class="c1"&gt;# Multi-file: LLM synthesizes
&lt;/span&gt;    &lt;span class="n"&gt;files_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;file_infos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Aggregate these files into one summary:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;files_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_instructor_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LLM_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 5: Generate Markdown (With Mermaid Diagrams)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@coco.function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CodebaseInfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## Overview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;public_classes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**Classes:**&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;public_classes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- `&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;`: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;public_functions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**Functions:**&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;public_functions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- `&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;`: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mermaid_graphs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## Pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```

mermaid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mermaid_graphs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 6: Wire It All Together
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@coco.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Collection&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;localfs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract info from each file concurrently
&lt;/span&gt;    &lt;span class="n"&gt;file_infos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;extract_file_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Aggregate into project-level summary
&lt;/span&gt;    &lt;span class="n"&gt;project_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;aggregate_project_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_infos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate and output markdown
&lt;/span&gt;    &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_info&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;localfs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;declare_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;output_dir&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;create_parent_dirs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex update main.py
&lt;span class="nb"&gt;ls &lt;/span&gt;output/
&lt;span class="c"&gt;# my_project_1.md  my_project_2.md  ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why This Pattern Is Powerful
&lt;/h2&gt;

&lt;p&gt;This pipeline demonstrates a reusable pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured LLM outputs&lt;/strong&gt;: LLMs become typed, predictable components via Pydantic + Instructor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memoized LLM calls&lt;/strong&gt;: You stop paying for the same prompt multiple times&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async concurrency&lt;/strong&gt;: the LLM becomes a parallel compute resource&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical aggregation&lt;/strong&gt;: File → project, page → document, message → conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental processing&lt;/strong&gt;: "Live" documentation without nightly rebuilds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anywhere you have "items that need LLM enrichment, plus a rolled-up view," this pattern applies.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full example: &lt;a href="https://cocoindex.io/examples-v1/multi-codebase-summarization" rel="noopener noreferrer"&gt;Multi-Codebase Summarization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;CocoIndex repo: &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;github.com/cocoindex-io/cocoindex&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cocoindex.io/examples-v1/" rel="noopener noreferrer"&gt;Step by Step Tutorial&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you end up generating auto-wikis for your own monorepo, drop a link in the comments—I'd love to see what your "self-documenting" codebase looks like.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? Give &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; a ⭐ on GitHub!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Real-Time HackerNews Trend Radar With AI (And It Runs Itself)</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Fri, 16 Jan 2026 03:11:14 +0000</pubDate>
      <link>https://forem.com/badmonster0/i-built-a-real-time-hackernews-trend-radar-with-ai-and-it-runs-itself-1apm</link>
      <guid>https://forem.com/badmonster0/i-built-a-real-time-hackernews-trend-radar-with-ai-and-it-runs-itself-1apm</guid>
      <description>&lt;p&gt;Every day, HackerNews quietly decides what the dev world will care about next.&lt;br&gt;
But unless you're doom-scrolling it all day, you're missing the real signal: &lt;strong&gt;which topics are actually taking off right now&lt;/strong&gt;, across threads and deep comment chains.&lt;/p&gt;

&lt;p&gt;So instead of manually refreshing HN, I built a &lt;strong&gt;real-time "trend radar"&lt;/strong&gt; on top of it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously ingests fresh HN stories and comments
&lt;/li&gt;
&lt;li&gt;Uses an LLM to extract structured topics (companies, tools, models, tech terms)
&lt;/li&gt;
&lt;li&gt;Streams everything into Postgres for instant querying like:

&lt;ul&gt;
&lt;li&gt;"What's trending on HN right now?"
&lt;/li&gt;
&lt;li&gt;"Which threads are driving the most hype for Claude / LangChain / Rust today?"
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this runs as a &lt;strong&gt;declarative CocoIndex flow&lt;/strong&gt; with incremental syncs, LLM-powered extraction, and simple query handlers.&lt;/p&gt;

&lt;p&gt;In this post, you'll see how it works end-to-end and how you can fork it to track any community (Reddit, X, Discord, internal Slack, etc.).&lt;/p&gt;


&lt;h2&gt;
  
  
  Why HN Is a Goldmine (If You Can Structure It)
&lt;/h2&gt;

&lt;p&gt;HackerNews is one of the strongest early signals for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New tools and frameworks devs actually try
&lt;/li&gt;
&lt;li&gt;Which AI models/products are gaining mindshare
&lt;/li&gt;
&lt;li&gt;Real sentiment and feedback in the comments
&lt;/li&gt;
&lt;li&gt;Emerging startups and obscure libraries that might be big in 6-12 months
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But raw HN has three problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Threads are noisy; comments are nested and messy
&lt;/li&gt;
&lt;li&gt;There's no notion of "topics" beyond free text
&lt;/li&gt;
&lt;li&gt;There's no built-in way to ask: &lt;em&gt;"What's trending across the whole firehose?"&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;HackerNews Trending Topics&lt;/strong&gt; example in CocoIndex is essentially: &lt;em&gt;"turn HN into a structured, continuously updating topics index that AI agents and dashboards can query in milliseconds."&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Architecture: From HN Firehose to Queryable Topics
&lt;/h2&gt;

&lt;p&gt;At a high level, the pipeline looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HackerNews API
    ↓
HackerNewsConnector (Custom Source)
    ├─ list() → thread IDs + updated_at
    ├─ get_value() → full threads + comments
    └─ provides_ordinal() → enables incremental sync
    ↓
CocoIndex Flow
    ├─ LLM topic extraction on threads + comments
    ├─ message_index collector (content)
    └─ topic_index collector (topics)
    ↓
Postgres
    ├─ hn_messages
    └─ hn_topics
    ↓
Query Handlers
    ├─ search_by_topic("Claude")
    ├─ get_trending_topics(limit=20)
    └─ get_threads_for_topic("Rust")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key idea: &lt;strong&gt;separate discovery from fetching&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;list()&lt;/code&gt; hits the HN Algolia search API to get lightweight metadata: thread IDs + &lt;code&gt;updated_at&lt;/code&gt; timestamps.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_value()&lt;/code&gt; only runs for threads whose &lt;code&gt;updated_at&lt;/code&gt; changed, fetching full content + comments from the items API.&lt;/li&gt;
&lt;li&gt;Ordinals (timestamps) let CocoIndex skip everything that hasn't changed, cutting API calls by &amp;gt;90% on subsequent syncs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what enables "live mode" with a 30-second polling interval without melting APIs or your wallet.&lt;/p&gt;
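The ordinal check itself is simple to picture. A minimal sketch of the skip logic (illustrative only; CocoIndex persists this state across runs rather than keeping it in a dict):

```python
# Last-seen ordinal per thread; only rows whose updated_at advanced
# since the previous sync are worth a full get_value() fetch.
last_seen: dict[str, int] = {}

def rows_to_fetch(listing: list[tuple[str, int]]) -> list[str]:
    """listing holds (thread_id, updated_at ordinal) pairs from list()."""
    changed = []
    for thread_id, ordinal in listing:
        if last_seen.get(thread_id, -1) < ordinal:
            changed.append(thread_id)
            last_seen[thread_id] = ordinal
    return changed

first_sync = rows_to_fetch([("t1", 100), ("t2", 100)])   # everything is new
second_sync = rows_to_fetch([("t1", 100), ("t2", 150)])  # only t2 changed
```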




&lt;h2&gt;
  
  
  Step 1: Turning HackerNews Into a First-Class Incremental Source
&lt;/h2&gt;

&lt;p&gt;First, you define the &lt;strong&gt;data model&lt;/strong&gt; for threads and comments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;_HackerNewsThreadKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NamedTuple&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@dataclasses.dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;_HackerNewsComment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="nd"&gt;@dataclasses.dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;_HackerNewsThread&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_HackerNewsComment&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you declare a &lt;strong&gt;SourceSpec&lt;/strong&gt; that configures how to query HN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HackerNewsSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SourceSpec&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Source spec for HackerNews API.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;      &lt;span class="c1"&gt;# e.g. "story"
&lt;/span&gt;    &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;      &lt;span class="c1"&gt;# hits per poll
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The custom source connector wires this spec into actual HTTP calls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;list()&lt;/code&gt; → calls &lt;code&gt;https://hn.algolia.com/api/v1/search_by_date&lt;/code&gt; with &lt;code&gt;hitsPerPage=max_results&lt;/code&gt;, yields &lt;code&gt;PartialSourceRow&lt;/code&gt; objects keyed by thread ID, with ordinals based on &lt;code&gt;updated_at&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_value()&lt;/code&gt; → calls &lt;code&gt;https://hn.algolia.com/api/v1/items/{thread_id}&lt;/code&gt; and parses the full thread + nested comments into &lt;code&gt;_HackerNewsThread&lt;/code&gt; and &lt;code&gt;_HackerNewsComment&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;provides_ordinal()&lt;/code&gt; → returns &lt;code&gt;True&lt;/code&gt; so CocoIndex can do incremental sync.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CocoIndex handles the hard part: tracking ordinals and only re-pulling changed rows on each sync.&lt;/p&gt;
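Put together, the connector contract looks roughly like this. The method names follow the article; the constructor injection and return shapes are illustrative so the sketch runs without the network:

```python
import dataclasses
from typing import Callable

@dataclasses.dataclass
class PartialRow:
    key: str      # thread ID
    ordinal: int  # updated_at timestamp

class HackerNewsConnector:
    """Discovery (list) returns cheap metadata; fetching (get_value) is the
    expensive call the engine only makes for rows whose ordinal changed."""

    def __init__(
        self,
        search: Callable[[], list[dict]],  # stands in for the search_by_date call
        items: Callable[[str], dict],      # stands in for the items/{thread_id} call
    ) -> None:
        self._search = search
        self._items = items

    def provides_ordinal(self) -> bool:
        return True  # opts in to incremental sync

    def list(self) -> list[PartialRow]:
        return [PartialRow(h["objectID"], h["updated_at"]) for h in self._search()]

    def get_value(self, key: str) -> dict:
        return self._items(key)

conn = HackerNewsConnector(
    search=lambda: [{"objectID": "t1", "updated_at": 100}],
    items=lambda tid: {"id": tid, "title": "Show HN: my thing"},
)
```

In the real connector the two lambdas become HTTP calls to the Algolia endpoints listed above.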




&lt;h2&gt;
  
  
  Step 2: Using an LLM to Extract Topics From Every Thread and Comment
&lt;/h2&gt;

&lt;p&gt;Once the source is in the flow, the fun part starts: &lt;strong&gt;semantic enrichment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You define a minimal &lt;code&gt;Topic&lt;/code&gt; type that the LLM will fill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclasses.dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A single topic extracted from text:
    - products, tools, frameworks
    - people, companies
    - domains (e.g. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fintech&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the flow, every thread gets its topics extracted with a single declarative transform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;threads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ExtractByLlm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;llm_spec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LlmSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;api_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LlmApiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Topic&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same for comments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;comment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;comment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;comment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ExtractByLlm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;llm_spec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LlmSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;api_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LlmApiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Topic&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, CocoIndex:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calls the LLM with a structured prompt and enforces &lt;code&gt;output_type=list[Topic]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Normalizes messy free text into consistent topic strings
&lt;/li&gt;
&lt;li&gt;Makes this just another column in your flow instead of a separate glue script
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what turns HN from "some text" into something an AI agent or SQL query can &lt;strong&gt;reason about&lt;/strong&gt;.&lt;/p&gt;
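To make that concrete, here is the shape the extraction yields, plus the kind of light normalization that keeps topic counts honest. This is a hand-rolled illustration, not CocoIndex's actual normalizer:

```python
import dataclasses

@dataclasses.dataclass
class Topic:
    topic: str

def normalize(topics: list[Topic]) -> list[str]:
    """Trim and lowercase so 'Rust' and ' rust ' count as one topic."""
    seen: set[str] = set()
    out: list[str] = []
    for t in topics:
        key = t.topic.strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(key)
    return out

extracted = [Topic("Rust"), Topic(" rust "), Topic("Claude")]
```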




&lt;h2&gt;
  
  
  Step 3: Indexing Into Postgres for Fast Topic Queries
&lt;/h2&gt;

&lt;p&gt;All structured data is collected into two logical indexes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;message_index&lt;/code&gt;: threads + comments with their raw text and metadata
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;topic_index&lt;/code&gt;: individual topics linked back to messages
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Collectors are declared once and then exported to Postgres:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;message_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;topic_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;message_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hn_messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;primary_key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;topic_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hn_topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;primary_key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have two tables you can poke with SQL or via CocoIndex query handlers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;hn_messages&lt;/code&gt;: full-text search, content analytics, author stats
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hn_topics&lt;/code&gt;: topic-level analytics, trend tracking, per-topic thread ranking
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 4: Query Handlers - From "Cool Pipeline" to Real Product
&lt;/h2&gt;

&lt;p&gt;Here's where it stops being just a nice ETL project and becomes something you can actually ship.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 &lt;code&gt;search_by_topic(topic)&lt;/code&gt;: "Show Me All Claude Mentions"
&lt;/h3&gt;

&lt;p&gt;This query handler lets you search HN content by topic across threads and comments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@hackernews_trending_topics_flow.query_handler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_by_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryOutput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;topic_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_target_default_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;hackernews_trending_topics_flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hn_topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;message_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_target_default_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;hackernews_trending_topics_flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hn_messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;connection_pool&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                SELECT m.id, m.thread_id, m.author, m.content_type,
                       m.text, m.created_at, t.topic
                FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; t
                JOIN &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; m ON t.message_id = m.id
                WHERE LOWER(t.topic) LIKE LOWER(%s)
                ORDER BY m.created_at DESC
                &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://news.ycombinator.com/item?id=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;QueryOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can literally run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex query main.py search_by_topic &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="s2"&gt;"Claude"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...and get a clean JSON response with URLs, authors, timestamps, and which piece of content the topic appeared in.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 &lt;code&gt;get_threads_for_topic(topic)&lt;/code&gt;: Rank Threads by Topic Score
&lt;/h3&gt;

&lt;p&gt;Not all mentions are equal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If "Rust" is in the thread title, that's a &lt;strong&gt;primary&lt;/strong&gt; discussion
&lt;/li&gt;
&lt;li&gt;If it's buried in a comment, that's more of a &lt;strong&gt;side mention&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;get_threads_for_topic&lt;/code&gt; uses a weighted scoring model to prioritize threads where the topic is central.&lt;/p&gt;
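&lt;p&gt;The idea is easy to sketch in plain Python. Everything below is illustrative, not the example's actual code: the weight values, the &lt;code&gt;in_title&lt;/code&gt; flag, and the mention shape are assumptions.&lt;/p&gt;

```python
# Illustrative weighted scoring: a title mention outweighs a comment mention.
# Weight values and field names are assumptions, not the example's real ones.
TITLE_WEIGHT = 3.0    # topic appears in the thread title: primary discussion
COMMENT_WEIGHT = 1.0  # topic appears in a comment: side mention

def score_threads(mentions):
    """mentions: list of dicts like {"thread_id": ..., "in_title": bool}."""
    scores = {}
    for m in mentions:
        weight = TITLE_WEIGHT if m["in_title"] else COMMENT_WEIGHT
        scores[m["thread_id"]] = scores.get(m["thread_id"], 0.0) + weight
    # Highest-scoring threads first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = score_threads([
    {"thread_id": "t1", "in_title": True},
    {"thread_id": "t2", "in_title": False},
    {"thread_id": "t2", "in_title": False},
])
print(ranked)  # [('t1', 3.0), ('t2', 2.0)]
```

&lt;p&gt;The real handler aggregates over the indexed tables, but the ranking principle is the same.&lt;/p&gt;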

&lt;h3&gt;
  
  
  4.3 &lt;code&gt;get_trending_topics(limit=20)&lt;/code&gt;: The Actual Trend Radar
&lt;/h3&gt;

&lt;p&gt;Finally, the endpoint that powers dashboards and agents. It surfaces a list like:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;["Claude 3.7 Sonnet", "OpenAI o4-mini", "LangChain", "Modal", ...]&lt;/code&gt; with scores and latest mention times
&lt;/li&gt;
&lt;li&gt;Each topic includes the top threads where it's being discussed right now
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can wire this into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A live dashboard showing "top 20 topics in the last N hours"
&lt;/li&gt;
&lt;li&gt;A Slack bot posting a daily "what's trending on HN" summary
&lt;/li&gt;
&lt;li&gt;An internal research agent that watches for signals relevant to your stack
&lt;/li&gt;
&lt;/ul&gt;
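&lt;p&gt;As a sketch of the Slack-bot idea: take the trending-topics output and format it into a daily digest. The field names (&lt;code&gt;topic&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt;) and the sample rows are assumptions for illustration; actually posting the string is a standard Slack incoming-webhook call.&lt;/p&gt;

```python
# Illustrative digest formatter over get_trending_topics-style rows.
# Field names and sample values are assumed, not the example's real schema.
def format_hn_digest(topics, limit=5):
    lines = ["*What's trending on HN today*"]
    for t in topics[:limit]:
        lines.append(f"- {t['topic']} (score {t['score']:.1f})")
    return "\n".join(lines)

digest = format_hn_digest([
    {"topic": "Claude 3.7 Sonnet", "score": 41.5},
    {"topic": "LangChain", "score": 17.0},
])
print(digest)
```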




&lt;h2&gt;
  
  
  Running It in Real Time
&lt;/h2&gt;

&lt;p&gt;Once the flow is defined, keeping it live is a one-liner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On-demand refresh&lt;/span&gt;
cocoindex update main

&lt;span class="c"&gt;# Live mode: keeps polling HN and updating indexes&lt;/span&gt;
cocoindex update &lt;span class="nt"&gt;-L&lt;/span&gt; main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CocoIndex handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polling HN every 30 seconds (configurable)
&lt;/li&gt;
&lt;li&gt;Incrementally syncing only changed threads
&lt;/li&gt;
&lt;li&gt;Re-running LLM extraction only where needed
&lt;/li&gt;
&lt;li&gt;Exporting into Postgres and making query handlers available
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For debugging, &lt;strong&gt;CocoInsight&lt;/strong&gt; lets you explore the flow, see lineage, and play with queries from a UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex server &lt;span class="nt"&gt;-ci&lt;/span&gt; main
&lt;span class="c"&gt;# Then open: https://cocoindex.io/cocoinsight&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What You Can Build on Top of This (Beyond "Just HN")
&lt;/h2&gt;

&lt;p&gt;Once you have this pattern, you're not limited to HackerNews.&lt;/p&gt;

&lt;p&gt;Some obvious extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cross-community trend tracking&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add Reddit subs, X lists, Discord channels, internal Slack, etc. as additional sources
&lt;/li&gt;
&lt;li&gt;Normalize topics across them to see which ideas propagate where and when
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Sentiment-aware trend analysis&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plug in an LLM-based sentiment extraction step alongside topics
&lt;/li&gt;
&lt;li&gt;Track not just &lt;em&gt;what&lt;/em&gt; is trending, but &lt;em&gt;whether devs love or hate it&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Influencer and key-contributor maps&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;code&gt;author&lt;/code&gt; field to see who starts important discussions and whose comments move the conversation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Continuous knowledge graphs&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Treat topics as nodes, threads as edges, and build a graph of tools, companies, and people linked by real discussions&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Real-time AI research agents&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Point an agent at the Postgres-backed index and let it answer questions like:
&lt;/li&gt;
&lt;li&gt;"What are the top new vector DBs people are experimenting with this week?"
&lt;/li&gt;
&lt;li&gt;"Which AI eval frameworks are getting traction?"&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
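&lt;p&gt;For the cross-community case, the main wrinkle is that one topic surfaces under many names. A minimal normalization pass might look like this; the alias table is an illustrative assumption (in practice you would curate it by hand or generate it with an LLM):&lt;/p&gt;

```python
import re

# Illustrative alias table; real deployments would curate or LLM-generate it.
ALIASES = {
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "k8s": "Kubernetes",
}

def normalize_topic(raw):
    """Collapse casing/whitespace, then map known aliases to a canonical name."""
    key = re.sub(r"\s+", " ", raw.strip()).lower()
    return ALIASES.get(key, raw.strip())

print(normalize_topic("  postgres "))  # PostgreSQL
print(normalize_topic("K8s"))          # Kubernetes
print(normalize_topic("CocoIndex"))    # CocoIndex (unchanged)
```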

&lt;p&gt;If you live in data, infra, or AI-land, this is basically &lt;strong&gt;a self-updating signal layer over HN that your tools and agents can query.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Want to Try It Yourself?
&lt;/h2&gt;

&lt;p&gt;You can find the fully working example (including flow definition, custom source, query handlers, and Postgres export) in the official &lt;a href="https://cocoindex.io/examples/hackernews-trending-topics" rel="noopener noreferrer"&gt;HackerNews Trending Topics example&lt;/a&gt; on the CocoIndex docs and &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you end up:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pointing this at a different community
&lt;/li&gt;
&lt;li&gt;Layering in embeddings, RAG, or sentiment
&lt;/li&gt;
&lt;li&gt;Wiring it into a real product or agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...definitely share it back. The coolest part of this pattern is how little code you need to go from "raw community noise" to a &lt;strong&gt;live, queryable trend radar&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Build an AI-Powered Competitive Intelligence Monitor</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Thu, 15 Jan 2026 07:21:09 +0000</pubDate>
      <link>https://forem.com/badmonster0/build-an-ai-powered-competitive-intelligence-monitor-740</link>
      <guid>https://forem.com/badmonster0/build-an-ai-powered-competitive-intelligence-monitor-740</guid>
<description>&lt;p&gt;Staying ahead of competitors requires constant vigilance: tracking product launches, funding rounds, partnerships, and strategic moves across the web. The open-source &lt;strong&gt;Competitive Intelligence Monitor&lt;/strong&gt; project demonstrates how to automate this process using CocoIndex, Tavily Search, and LLM extraction to continuously track and structure competitor news into a queryable PostgreSQL database.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Project Does
&lt;/h2&gt;

&lt;p&gt;The system automates web monitoring by using Tavily's AI-native search to pull full-text articles, then feeding them through a GPT-4o-mini–based extraction layer to detect structured "competitive events" such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product launches and feature releases&lt;/li&gt;
&lt;li&gt;Partnerships and collaborations
&lt;/li&gt;
&lt;li&gt;Funding rounds and financial news&lt;/li&gt;
&lt;li&gt;Key executive hires/departures&lt;/li&gt;
&lt;li&gt;Acquisitions and mergers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These events and their source articles are stored in PostgreSQL so teams can ask natural questions like "What has Anthropic been doing recently?" or "Which competitors are making the most news this week?"&lt;/p&gt;
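&lt;p&gt;A question like "which competitors are making the most news this week?" reduces to a simple aggregation over the events table. The sketch below does it in memory over dicts; the real query would be the equivalent SQL &lt;code&gt;GROUP BY&lt;/code&gt;, and the sample rows are made up:&lt;/p&gt;

```python
from collections import Counter

# Illustrative: rank competitors by number of extracted events.
# Sample rows are made up; field names follow the CompetitiveEvent model.
def most_active_competitors(events, top_n=3):
    counts = Counter(e["competitor"] for e in events)
    return counts.most_common(top_n)

events = [
    {"competitor": "Anthropic", "event_type": "product_launch"},
    {"competitor": "Anthropic", "event_type": "partnership"},
    {"competitor": "OpenAI", "event_type": "funding"},
]
print(most_active_competitors(events))  # [('Anthropic', 2), ('OpenAI', 1)]
```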

&lt;h2&gt;
  
  
  Core Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Tavily AI   │────▶│  CocoIndex   │────▶│  PostgreSQL  │
│    Search    │     │   Pipeline   │     │   Database   │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       ▼                    ▼                    ▼
   Articles           Extraction           Intelligence
  (web data)        (GPT-4o-mini)         (structured)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Data flows from Tavily search results into an LLM extraction step that produces &lt;code&gt;CompetitiveEvent&lt;/code&gt; objects, then into dual indexes: one table for raw articles and another for normalized events.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Model: The CompetitiveEvent Class
&lt;/h2&gt;

&lt;p&gt;At the heart of the extraction is the &lt;code&gt;CompetitiveEvent&lt;/code&gt; dataclass that defines what the LLM should extract from each article:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclasses.dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CompetitiveEvent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A competitive intelligence event extracted from text.

    Examples:
    - Product Launch: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI released GPT-5 with multimodal capabilities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    - Partnership: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anthropic partnered with Google Cloud for enterprise AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    - Funding: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mistral AI raised $400M Series B led by Andreessen Horowitz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    - Key Hire: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Former Meta AI director joined Cohere as Chief Scientist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    - Strategic Move: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Microsoft acquired AI startup Inflection for $650M&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;      &lt;span class="c1"&gt;# "product_launch", "partnership", "funding", "key_hire", "acquisition", "other"
&lt;/span&gt;    &lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;      &lt;span class="c1"&gt;# Company name (e.g., "OpenAI", "Anthropic", "Google AI")
&lt;/span&gt;    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;     &lt;span class="c1"&gt;# Brief description of the event
&lt;/span&gt;    &lt;span class="n"&gt;significance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;    &lt;span class="c1"&gt;# "high", "medium", "low" - based on market impact
&lt;/span&gt;    &lt;span class="n"&gt;related_companies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Other companies mentioned
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
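&lt;p&gt;Because the dataclass doubles as the extraction schema, the LLM's JSON output maps straight onto it. A stdlib-only sketch of hydrating and filtering events (the payload below is made up for illustration):&lt;/p&gt;

```python
import dataclasses
import json

@dataclasses.dataclass
class CompetitiveEvent:  # mirrors the article's data model
    event_type: str
    competitor: str
    description: str
    significance: str
    related_companies: list

# Made-up LLM-style payload, purely for illustration.
payload = json.loads("""[
  {"event_type": "funding", "competitor": "ExampleAI",
   "description": "Raised a Series B", "significance": "high",
   "related_companies": ["ExampleVC"]},
  {"event_type": "other", "competitor": "ExampleAI",
   "description": "Minor blog post", "significance": "low",
   "related_companies": []}
]""")

events = [CompetitiveEvent(**e) for e in payload]
high_impact = [e for e in events if e.significance == "high"]
print([e.description for e in high_impact])  # ['Raised a Series B']
```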

&lt;h2&gt;
  
  
  Custom Tavily Source Connector
&lt;/h2&gt;

&lt;p&gt;The project implements a custom CocoIndex source connector that interfaces with Tavily's AI-native search API:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TavilySearchSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SourceSpec&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetches competitive intelligence using Tavily AI Search API.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;days_back&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
    &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

&lt;span class="nd"&gt;@source_connector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;spec_cls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TavilySearchSource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;key_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_ArticleKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_Article&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TavilySearchConnector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PartialSourceRow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_ArticleKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_Article&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;List articles from Tavily search.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;search_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; AND &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(funding OR partnership OR product launch OR acquisition OR executive hire)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TavilyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;search_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;include_raw_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;PartialSourceRow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_ArticleKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;PartialSourceRowData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ordinal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ordinal&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  The CocoIndex Pipeline Definition
&lt;/h2&gt;

&lt;p&gt;The main pipeline uses CocoIndex's flow builder to orchestrate data collection and LLM extraction:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cocoindex.flow_def&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CompetitiveIntelligence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;competitive_intelligence_flow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FlowBuilder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataScope&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main pipeline for competitive intelligence monitoring.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;competitors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPETITORS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI,Anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;refresh_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFRESH_INTERVAL_SECONDS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Add Tavily search source for each competitor
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;competitor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;competitors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;TavilySearchSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;days_back&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_days_back&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;refresh_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;refresh_interval&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;articles_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;events_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Process each competitor's articles
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;competitor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;competitors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Extract competitive events using GPT-4o-mini via OpenRouter
&lt;/span&gt;            &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;article&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ExtractByLlm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;llm_spec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LlmSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;api_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LlmApiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;output_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CompetitiveEvent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract competitive intelligence events from this article. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Focus on: product launches, partnerships, funding rounds, key hires, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acquisitions, and other strategic moves.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Query Handlers for Analysis
&lt;/h2&gt;

&lt;p&gt;The project includes built-in query handlers that enable SQL-powered intelligence retrieval:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@competitive_intelligence_flow.query_handler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_by_competitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QueryOutput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Find recent competitive intelligence about a specific competitor.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;connection_pool&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                SELECT e.competitor, e.event_type, e.description, e.significance,
                       e.related_companies, a.title, a.url, a.source, a.published_at
                FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;events_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; e
                JOIN &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;articles_table&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; a ON e.article_id = a.id
                WHERE LOWER(e.competitor) LIKE LOWER(%s)
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;competitor&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; AND e.event_type = %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; ORDER BY a.published_at DESC LIMIT %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;QueryOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
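The handler above elides the final row shaping (`results=[...]`). As a hedged illustration only, here is one way the tuples from `cur.fetchall()` could be zipped with the column names from the SELECT list; the helper name and shape are assumptions, not part of the project's API:

```python
# Hypothetical helper: shape raw cursor tuples into dicts a query handler
# might return. COLUMNS mirrors the SELECT list in the handler above.
COLUMNS = [
    "competitor", "event_type", "description", "significance",
    "related_companies", "title", "url", "source", "published_at",
]

def rows_to_dicts(rows):
    """Pair each tuple row from cur.fetchall() with its column name."""
    return [dict(zip(COLUMNS, row)) for row in rows]

sample = [("OpenAI", "funding", "Raised a round", "high",
           ["Microsoft"], "A headline", "https://example.com", "blog",
           "2026-03-01")]
print(rows_to_dicts(sample)[0]["event_type"])  # funding
```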

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Configuration is controlled through environment variables:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgresql://user:password@localhost:5432/competitive_intel
&lt;span class="nv"&gt;COCOINDEX_DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgresql://user:password@localhost:5432/competitive_intel
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-...
&lt;span class="nv"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tvly-...
&lt;span class="nv"&gt;COMPETITORS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;OpenAI,Anthropic,Google AI,Meta AI,Mistral AI
&lt;span class="nv"&gt;REFRESH_INTERVAL_SECONDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3600
&lt;span class="nv"&gt;SEARCH_DAYS_BACK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
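A minimal sketch of parsing those variables in Python, assuming the project reads them with `os.environ` (the exact loading code may differ):

```python
import os

# Defaults mirror the sample configuration above; in a real deployment
# these come from the environment or a .env file.
os.environ.setdefault("COMPETITORS", "OpenAI,Anthropic,Google AI,Meta AI,Mistral AI")
os.environ.setdefault("REFRESH_INTERVAL_SECONDS", "3600")
os.environ.setdefault("SEARCH_DAYS_BACK", "7")

# Comma-separated competitor list, stripped of stray whitespace.
competitors = [c.strip() for c in os.environ["COMPETITORS"].split(",") if c.strip()]
refresh_interval = int(os.environ["REFRESH_INTERVAL_SECONDS"])
search_days_back = int(os.environ["SEARCH_DAYS_BACK"])

print(competitors[0])  # OpenAI
```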


&lt;p&gt;Run the interactive CLI for first-time setup:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 run_interactive.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Or use CocoIndex directly for automated deployments:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex update main &lt;span class="nt"&gt;-f&lt;/span&gt;          &lt;span class="c"&gt;# Initial sync&lt;/span&gt;
cocoindex update &lt;span class="nt"&gt;-L&lt;/span&gt; main.py       &lt;span class="c"&gt;# Continuous monitoring&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why This Approach Matters
&lt;/h2&gt;

&lt;p&gt;By combining AI-native search with structured LLM extraction, the monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Avoids brittle scraping&lt;/strong&gt; - Tavily handles content extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;De-duplicates work&lt;/strong&gt; - CocoIndex tracks processed articles via incremental processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turns noise into signal&lt;/strong&gt; - Structured events with significance scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enables flexible analysis&lt;/strong&gt; - Dual indexing of raw articles and extracted events supports both full-text review and structured queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project supports multiple query types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search by competitor name&lt;/li&gt;
&lt;li&gt;Filter by event type (funding, partnerships, acquisitions, etc.)&lt;/li&gt;
&lt;li&gt;Rank by significance (weighted scoring: high=3, medium=2, low=1)&lt;/li&gt;
&lt;li&gt;Trend analysis across time periods&lt;/li&gt;
&lt;/ul&gt;
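The significance weighting above (high=3, medium=2, low=1) can be sketched in a few lines. The function name and event shape here are assumptions for illustration, not the project's actual implementation:

```python
# Weights as described in the post: high=3, medium=2, low=1.
SIGNIFICANCE_WEIGHTS = {"high": 3, "medium": 2, "low": 1}

def weighted_score(events):
    """Sum significance weights across a list of extracted events."""
    return sum(SIGNIFICANCE_WEIGHTS.get(e["significance"], 0) for e in events)

events = [
    {"competitor": "OpenAI", "significance": "high"},
    {"competitor": "OpenAI", "significance": "medium"},
    {"competitor": "OpenAI", "significance": "low"},
]
print(weighted_score(events))  # 6
```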
&lt;h2&gt;
  
  
  Get the Code
&lt;/h2&gt;

&lt;p&gt;The project is MIT-licensed and available on GitHub:&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Laksh-star" rel="noopener noreferrer"&gt;
        Laksh-star
      &lt;/a&gt; / &lt;a href="https://github.com/Laksh-star/competitive-intelligence" rel="noopener noreferrer"&gt;
        competitive-intelligence
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      AI-powered competitive intelligence monitor using CocoIndex, Tavily Search, and LLM extraction
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Competitive Intelligence Monitor&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://www.python.org/downloads/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/36cf3d0f7992a33a063d3833577d62204f8934d82b69874c086390608db4947c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f707974686f6e2d332e31312b2d626c75652e737667" alt="Python 3.11+"&gt;&lt;/a&gt;
&lt;a href="https://cocoindex.io" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/4226ab38e128ef6b34aa56a15fb25ba6a221b177ba475aa3fe5ef011ba67c0ab/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f436f636f496e6465782d302e332e392b2d677265656e2e737667" alt="CocoIndex"&gt;&lt;/a&gt;
&lt;a href="https://opensource.org/licenses/MIT" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/fdf2982b9f5d7489dcf44570e714e3a15fce6253e0cc6b5aa61a075aac2ff71b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667" alt="License: MIT"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Track competitor mentions across the web using AI-powered search and LLM extraction. Automatically monitors competitors, extracts competitive intelligence events, and stores structured data in PostgreSQL for analysis.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What This Does&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;This pipeline automatically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Searches&lt;/strong&gt; the web using Tavily AI (AI-native search engine optimized for agents)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extracts&lt;/strong&gt; competitive intelligence events using DeepSeek LLM analysis
&lt;ul&gt;
&lt;li&gt;Product launches and feature releases&lt;/li&gt;
&lt;li&gt;Partnerships and collaborations&lt;/li&gt;
&lt;li&gt;Funding rounds and financial news&lt;/li&gt;
&lt;li&gt;Key executive hires/departures&lt;/li&gt;
&lt;li&gt;Acquisitions and mergers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexes&lt;/strong&gt; both raw articles and extracted events in PostgreSQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enables queries&lt;/strong&gt; like:
&lt;ul&gt;
&lt;li&gt;"What has OpenAI been doing recently?"&lt;/li&gt;
&lt;li&gt;"Which competitors are making the most news?"&lt;/li&gt;
&lt;li&gt;"Find all partnership announcements"&lt;/li&gt;
&lt;li&gt;"What are the most significant competitive moves this week?"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Prerequisites&lt;/h2&gt;

&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL Database&lt;/strong&gt; - Choose one option
&lt;ul&gt;
&lt;li&gt;Local PostgreSQL installation&lt;/li&gt;
&lt;li&gt;Cloud PostgreSQL (AWS RDS, Google Cloud SQL, Azure Database, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.11+&lt;/strong&gt; - Required for CocoIndex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Keys&lt;/strong&gt; (required)
&lt;ul&gt;
&lt;li&gt;Tavily API key from &lt;a href="https://tavily.com" rel="nofollow noopener noreferrer"&gt;tavily.com&lt;/a&gt; (free tier: 1,000…&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Laksh-star/competitive-intelligence" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;







&lt;p&gt;&lt;strong&gt;Built with:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cocoindex.io/" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; - Modern data pipeline framework&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tavily.com/" rel="noopener noreferrer"&gt;Tavily AI Search&lt;/a&gt; - AI-native search engine&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; - Multi-model API gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Have questions or want to contribute? Drop a comment below or open an issue on GitHub!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>cocoindex</category>
    </item>
    <item>
      <title>Stop Writing Fragile Prompts: Extract Structured Data from PDFs with DSPy + CocoIndex</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Tue, 13 Jan 2026 19:13:35 +0000</pubDate>
      <link>https://forem.com/badmonster0/stop-writing-fragile-prompts-extract-structured-data-from-pdfs-with-dspy-cocoindex-10ln</link>
      <guid>https://forem.com/badmonster0/stop-writing-fragile-prompts-extract-structured-data-from-pdfs-with-dspy-cocoindex-10ln</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Traditional prompt engineering is fragile—small changes break everything. This tutorial shows how to extract structured patient data from PDF intake forms using DSPy's typed Signatures + CocoIndex's incremental processing. No OCR preprocessing, no regex, just declarative code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;If you've built LLM applications, you know the pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You write a carefully crafted prompt with instructions and few-shot examples&lt;/li&gt;
&lt;li&gt;It works... until the model changes, or the data shifts slightly&lt;/li&gt;
&lt;li&gt;Your output format breaks, and you're back to debugging strings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Logic buried in strings is hard to test, compose, or version.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What if there was a better way?&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter DSPy + CocoIndex
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/stanfordnlp/dspy" rel="noopener noreferrer"&gt;DSPy&lt;/a&gt; (from Stanford) replaces prompt engineering with a &lt;em&gt;programming model&lt;/em&gt;. You define &lt;strong&gt;what&lt;/strong&gt; each LLM step should do (inputs, outputs, constraints), and the framework figures out &lt;strong&gt;how&lt;/strong&gt; to prompt the model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; is an ultra-performant data processing engine (Rust-powered) that handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File ingestion from any source&lt;/li&gt;
&lt;li&gt;Incremental processing (only reprocess changed documents)&lt;/li&gt;
&lt;li&gt;Caching and lineage tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they form a powerful combo: &lt;strong&gt;DSPy owns "how the model thinks," CocoIndex owns "how data moves and stays fresh."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;A pipeline that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads PDF patient intake forms&lt;/li&gt;
&lt;li&gt;Converts pages to images&lt;/li&gt;
&lt;li&gt;Extracts structured &lt;code&gt;Patient&lt;/code&gt; data using vision models&lt;/li&gt;
&lt;li&gt;Exports to PostgreSQL with automatic updates&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Step 1: Define Your Schema with Pydantic
&lt;/h2&gt;

&lt;p&gt;Instead of parsing unstructured text, we define exactly what we want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Address&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;street&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;zip_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Insurance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;policy_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;group_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;policyholder_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Patient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;dob&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;
    &lt;span class="n"&gt;gender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Address&lt;/span&gt;
    &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;insurance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Insurance&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;reason_for_visit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;allergies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;current_medications&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;consent_given&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This FHIR-inspired schema gives us &lt;strong&gt;validation, nested models, and type safety&lt;/strong&gt; out of the box.&lt;/p&gt;
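For comparison, the same nested shape can be sketched with stdlib dataclasses; unlike the Pydantic models above, dataclasses do no runtime validation or coercion, which is exactly the safety the article's schema buys:

```python
from dataclasses import dataclass, field

# Stdlib-only sketch of the nested shape, trimmed to a few fields.
# Dataclasses give structure but not validation: a bad zip_code or a
# missing field only fails when you touch it, not at construction.
@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str

@dataclass
class Patient:
    name: str
    address: Address
    allergies: list = field(default_factory=list)

p = Patient(name="Jane Doe",
            address=Address("1 Main St", "Springfield", "IL", "62701"))
print(p.address.city)  # Springfield
```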




&lt;h2&gt;
  
  
  Step 2: Create the DSPy Signature
&lt;/h2&gt;

&lt;p&gt;A Signature is DSPy's way of declaring the task contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PatientExtractionSignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract structured patient information from a medical intake form image.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Images of the patient intake form pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;patient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Patient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extracted patient information with all available fields filled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice: &lt;strong&gt;No prompts, no examples, no parsing logic.&lt;/strong&gt; Just a typed contract.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Build the Extractor Module
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PatientExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChainOfThought&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PatientExtractionSignature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Patient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;patient&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ChainOfThought&lt;/code&gt; handles the reasoning—DSPy translates your signature into an effective prompt automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Wire It Into CocoIndex
&lt;/h2&gt;

&lt;p&gt;Here's where incremental processing magic happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymupdf&lt;/span&gt;

&lt;span class="nd"&gt;@cocoindex.op.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;behavior_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_patient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Patient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Convert PDF to images
&lt;/span&gt;    &lt;span class="n"&gt;pdf_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pymupdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pdf_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filetype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;form_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pdf_doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_pixmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pymupdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# 2x resolution
&lt;/span&gt;        &lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tobytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="n"&gt;pdf_doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract with DSPy
&lt;/span&gt;    &lt;span class="n"&gt;extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PatientExtractor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;extractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;form_images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Passing &lt;code&gt;cache=True&lt;/code&gt; to the &lt;code&gt;@cocoindex.op.function&lt;/code&gt; decorator means &lt;strong&gt;repeated calls with the same PDF reuse cached results&lt;/strong&gt;—no wasted API calls.&lt;/p&gt;
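The idea behind that caching can be illustrated with a toy in-memory version: key the expensive work by a hash of the input bytes so identical PDFs are processed once. This is only a sketch of the concept; CocoIndex's actual cache is managed by the engine and persists across runs:

```python
import hashlib

# Toy cache keyed by a content hash of the input bytes.
_cache = {}

def cached_extract(pdf_content, do_extract):
    """Run do_extract at most once per distinct input byte string."""
    key = hashlib.sha256(pdf_content).hexdigest()
    if key not in _cache:
        _cache[key] = do_extract(pdf_content)
    return _cache[key]

calls = []
def fake_extract(data):
    calls.append(1)       # count how many times real work happens
    return len(data)

cached_extract(b"same pdf bytes", fake_extract)
cached_extract(b"same pdf bytes", fake_extract)
print(len(calls))  # 1 -- the second call hit the cache
```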




&lt;h2&gt;
  
  
  Step 5: Define the Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cocoindex.flow_def&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PatientIntakeExtraction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;patient_intake_flow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Source: local PDF files
&lt;/span&gt;    &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LocalFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/patient_forms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;binary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;patients_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patient_info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extract_patient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;patients_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;patient_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patient_info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Export to Postgres
&lt;/span&gt;    &lt;span class="n"&gt;patients_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patients&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patients_info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;primary_key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why This Approach Wins
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional Approach&lt;/th&gt;
&lt;th&gt;DSPy + CocoIndex&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fragile prompts&lt;/td&gt;
&lt;td&gt;Typed Signatures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual parsing&lt;/td&gt;
&lt;td&gt;Automatic structured output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full reprocessing&lt;/td&gt;
&lt;td&gt;Incremental updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No audit trail&lt;/td&gt;
&lt;td&gt;Built-in lineage tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;String debugging&lt;/td&gt;
&lt;td&gt;Testable modules&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Run It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cocoindex dspy-ai pydantic pymupdf
cocoindex update main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your PDFs are now structured data in Postgres, automatically kept in sync.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;DSPy replaces prompt engineering&lt;/strong&gt; with a programming model—define the contract, not the implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision models eliminate OCR&lt;/strong&gt; complexity—just pass images directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CocoIndex handles the plumbing&lt;/strong&gt;—caching, incremental updates, lineage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pydantic gives you validation&lt;/strong&gt; and nested structures for free&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neither tool tries to be the whole stack&lt;/strong&gt;—they compose beautifully&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📖 &lt;a href="https://cocoindex.io/examples/patient_form_extraction_dspy" rel="noopener noreferrer"&gt;Full Tutorial with Code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;⭐ &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🧠 &lt;a href="https://dspy-docs.vercel.app/" rel="noopener noreferrer"&gt;DSPy Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Have you tried DSPy or CocoIndex? Drop a comment with your experience—I'd love to hear how others are solving structured extraction problems!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>datascience</category>
    </item>
    <item>
      <title>We Built an Open-Source Pipeline That Turned Meeting Notes Into a Live Knowledge Graph — And It Went Viral (200K Impressions)</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Mon, 05 Jan 2026 03:22:44 +0000</pubDate>
      <link>https://forem.com/badmonster0/we-built-an-open-source-pipeline-that-turned-meeting-notes-into-a-live-knowledge-graph-and-it-4450</link>
      <guid>https://forem.com/badmonster0/we-built-an-open-source-pipeline-that-turned-meeting-notes-into-a-live-knowledge-graph-and-it-4450</guid>
      <description>&lt;h2&gt;
  
  
  🚀 The Result: 200K Social Impressions and Viral Engagement
&lt;/h2&gt;

&lt;p&gt;Our latest project just exploded on LinkedIn with 200K+ impressions, and for good reason. We built something that solves a real problem most companies face: &lt;strong&gt;their meeting notes are a goldmine of untapped knowledge&lt;/strong&gt;, but nobody has time to manually organize them.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 The Problem
&lt;/h2&gt;

&lt;p&gt;Most companies sit on an ocean of meeting notes scattered across Google Drive. Inside those documents are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Critical decisions that shape product direction&lt;/li&gt;
&lt;li&gt;Action items and task assignments&lt;/li&gt;
&lt;li&gt;Key relationships between people, projects, and initiatives&lt;/li&gt;
&lt;li&gt;Institutional knowledge that disappears when people leave&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the catch: &lt;strong&gt;these documents are constantly changing&lt;/strong&gt;. Traditional data pipelines would reprocess everything from scratch every time, wasting compute and money on unchanged files.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚡ The Solution: Incremental Processing with CocoIndex
&lt;/h2&gt;

&lt;p&gt;We built an open-source pipeline that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Connects directly to Google Drive&lt;/strong&gt; - no manual exports needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only processes what changed&lt;/strong&gt; - incremental LLM extraction means zero reprocessing of unchanged docs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Builds a live knowledge graph&lt;/strong&gt; - automatically extracts entities, relationships, and updates Neo4j in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-ready&lt;/strong&gt; - fully open-sourced under Apache 2.0 license&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  🔧 How It Works
&lt;/h2&gt;

&lt;p&gt;The pipeline continuously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors your Google Drive for changes&lt;/li&gt;
&lt;li&gt;Uses LLM to extract structured data (people, decisions, tasks, relationships)&lt;/li&gt;
&lt;li&gt;Incrementally updates the knowledge graph - only changed documents get reprocessed&lt;/li&gt;
&lt;li&gt;Serves fresh insights through Neo4j queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; - for incremental data processing&lt;/li&gt;
&lt;li&gt;Neo4j - for the knowledge graph&lt;/li&gt;
&lt;li&gt;LLM - for entity and relationship extraction&lt;/li&gt;
&lt;li&gt;Google Drive API - for document access&lt;/li&gt;
&lt;/ul&gt;
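&lt;p&gt;To make the incremental behavior concrete, here's a hypothetical pure-Python simulation of the core loop (this is &lt;em&gt;not&lt;/em&gt; the CocoIndex API—the hash tracking, fake entity extraction, and &lt;code&gt;sync&lt;/code&gt; function are all illustrative): track a content hash per document and re-extract only for documents whose hash changed since the last run.&lt;/p&gt;

```python
import hashlib

# Hypothetical sketch of the incremental idea (not the CocoIndex API):
# remember a content hash per document, and re-run "extraction" only
# for documents whose hash changed since the last sync.
seen: dict[str, str] = {}          # doc_id -> content hash
graph: dict[str, list[str]] = {}   # doc_id -> extracted "entities"

def sync(docs: dict[str, str]) -> int:
    """Return how many docs were (re)processed this run."""
    processed = 0
    for doc_id, text in docs.items():
        h = hashlib.sha256(text.encode()).hexdigest()
        if seen.get(doc_id) != h:
            # stand-in for LLM entity extraction
            graph[doc_id] = [w for w in text.split() if w.istitle()]
            seen[doc_id] = h
            processed += 1
    return processed

drive = {"standup.md": "Alice owns Billing", "plan.md": "Bob ships Search"}
print(sync(drive))  # first run processes both docs → 2
drive["plan.md"] = "Carol ships Search"
print(sync(drive))  # only the changed doc is reprocessed → 1
```

With real documents the "extraction" step is the expensive LLM call, so skipping unchanged docs is where the cost savings come from.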

&lt;h2&gt;
  
  
  📚 Full Tutorial Available
&lt;/h2&gt;

&lt;p&gt;We've published a complete step-by-step tutorial with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full source code (Apache 2.0)&lt;/li&gt;
&lt;li&gt;Architecture explanations&lt;/li&gt;
&lt;li&gt;Setup instructions&lt;/li&gt;
&lt;li&gt;Real examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔗 Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Repo: &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;https://github.com/cocoindex-io/cocoindex&lt;/a&gt; (⭐ star it if you find it useful!)&lt;/li&gt;
&lt;li&gt;Tutorial: &lt;a href="https://cocoindex.io/blogs/meeting-notes-graph" rel="noopener noreferrer"&gt;https://cocoindex.io/blogs/meeting-notes-graph&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Live Example: &lt;a href="https://cocoindex.io/examples/meeting_notes_graph" rel="noopener noreferrer"&gt;https://cocoindex.io/examples/meeting_notes_graph&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎯 Why This Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Incremental processing&lt;/strong&gt; is the key differentiator here. Most pipelines are "dumb" - they reprocess everything even if only one document changed. That's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Expensive (LLM costs add up fast)&lt;/li&gt;
&lt;li&gt;❌ Slow (unnecessary compute time)&lt;/li&gt;
&lt;li&gt;❌ Wasteful (environmental impact)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With CocoIndex's incremental approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Only pay for what changed&lt;/li&gt;
&lt;li&gt;✅ Real-time updates&lt;/li&gt;
&lt;li&gt;✅ Scales with your document library&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🌟 Built With Open Source
&lt;/h2&gt;

&lt;p&gt;This entire project is open source and production-ready. Whether you're a startup drowning in meeting notes or an enterprise looking to unlock institutional knowledge, you can deploy this today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What could you build with a live knowledge graph of your company's meeting notes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Drop your thoughts in the comments! And if you're working on similar problems, I'd love to hear about your approach. 👇&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>knowledgegraph</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>Why Your AI Agent is Living in the Past (And How to Fix It) 🚀</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Wed, 24 Dec 2025 00:22:35 +0000</pubDate>
      <link>https://forem.com/badmonster0/why-your-ai-agent-is-living-in-the-past-and-how-to-fix-it-563p</link>
      <guid>https://forem.com/badmonster0/why-your-ai-agent-is-living-in-the-past-and-how-to-fix-it-563p</guid>
      <description>&lt;h2&gt;
  
  
  The Stale Context Problem
&lt;/h2&gt;

&lt;p&gt;Imagine this: You've built a beautiful AI agent that can answer questions about your codebase. You spent weeks setting up the perfect data pipeline, carefully chunked your documents, and embedded everything into a vector database. &lt;/p&gt;

&lt;p&gt;Then someone pushes a new feature to main.&lt;/p&gt;

&lt;p&gt;Your AI agent? &lt;strong&gt;Still answering questions based on yesterday's code.&lt;/strong&gt; 😬&lt;/p&gt;

&lt;p&gt;This is the dirty secret of production AI systems: &lt;strong&gt;maintaining fresh, structured context is harder than building the AI itself.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Fresh Context Matters for AI Agents
&lt;/h2&gt;

&lt;p&gt;Here's the reality: AI agents in 2025 aren't just answering static FAQs. They're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring live codebases&lt;/strong&gt; that change dozens of times per day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing incoming emails&lt;/strong&gt; and turning them into structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyzing meeting notes&lt;/strong&gt; to build dynamic knowledge graphs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watching PDF documents&lt;/strong&gt; that get updated in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tracking customer data&lt;/strong&gt; that evolves every second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every time your source data changes, you face a painful choice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Re-index everything&lt;/strong&gt; (slow, expensive, wastes compute)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let your context go stale&lt;/strong&gt; (fast way to lose user trust)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build complex change tracking&lt;/strong&gt; (hello technical debt!)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There has to be a better way. Spoiler: there is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter CocoIndex: Context Engineering Made Simple
&lt;/h2&gt;

&lt;p&gt;CocoIndex just hit #1 on GitHub Trending for Rust, and for good reason. It's a data transformation framework built specifically for keeping AI context fresh.&lt;/p&gt;

&lt;p&gt;Here's what makes it different:&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 Incremental Processing by Default
&lt;/h3&gt;

&lt;p&gt;No more re-processing your entire dataset when one file changes. CocoIndex tracks dependencies and only recomputes what's necessary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This automatically handles incremental updates
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LocalFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you update a single document, CocoIndex:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detects exactly what changed&lt;/li&gt;
&lt;li&gt;Re-processes only affected chunks&lt;/li&gt;
&lt;li&gt;Updates your vector store with minimal operations&lt;/li&gt;
&lt;li&gt;Preserves everything else&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No index swaps. No downtime. No stale data.&lt;/p&gt;
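&lt;p&gt;A minimal sketch of that chunk-level diffing idea (an illustration, not CocoIndex's actual algorithm—the fixed-size chunker and &lt;code&gt;plan_updates&lt;/code&gt; helper are invented for this example): hash each chunk, compare against the previous run, and "re-embed" only chunks whose hash is new.&lt;/p&gt;

```python
import hashlib

# Sketch of chunk-level diffing (illustrative, not CocoIndex's actual
# algorithm): hash every chunk, and re-embed only the chunks whose
# hash was not present in the previous run.
def chunk(text: str, size: int = 16) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def plan_updates(old_chunks: list[str], new_text: str):
    old_hashes = {hashlib.md5(c.encode()).hexdigest() for c in old_chunks}
    new_chunks = chunk(new_text)
    to_embed = [c for c in new_chunks
                if hashlib.md5(c.encode()).hexdigest() not in old_hashes]
    return new_chunks, to_embed

doc_v1 = "def add(a, b): return a + b\n" * 4
chunks_v1, work = plan_updates([], doc_v1)       # first run: embed everything
doc_v2 = doc_v1 + "def sub(a, b): return a - b\n"
chunks_v2, work = plan_updates(chunks_v1, doc_v2)
print(len(work) < len(chunks_v2))  # → True: only new chunks get re-embedded
```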

&lt;h3&gt;
  
  
  🧱 Dataflow Programming Model
&lt;/h3&gt;

&lt;p&gt;Define your transformations once, and CocoIndex handles the orchestration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cocoindex.flow_def&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SmartContext&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_context_flow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Source: Read from anywhere
&lt;/span&gt;    &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LocalFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;collector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Transform: Process each document
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Split into chunks
&lt;/span&gt;        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SplitRecursively&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Embed each chunk
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SentenceTransformerEmbed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;collector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Export: Send to your vector store
&lt;/span&gt;    &lt;span class="n"&gt;collector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;vector_indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what you DON'T see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No explicit update logic&lt;/li&gt;
&lt;li&gt;No manual cache invalidation&lt;/li&gt;
&lt;li&gt;No index swap coordination&lt;/li&gt;
&lt;li&gt;No "when to re-embed" decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just pure transformation logic. CocoIndex handles the rest.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 Built for Production, Not Demos
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Ultra-performant Rust core&lt;/strong&gt;: The heavy lifting happens in Rust, giving you C-level performance with Python ergonomics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data lineage out of the box&lt;/strong&gt;: Track exactly where each piece of context came from. Debug your AI's reasoning, not just its output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plug-and-play components&lt;/strong&gt;: Switch between embedding models, vector stores, or data sources with single-line changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Here's what developers are building:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Code Search&lt;/strong&gt;: Index your entire monorepo, keep embeddings fresh as PRs merge. No more "this was refactored last week" moments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meeting Notes → Knowledge Graph&lt;/strong&gt;: Extract entities and relationships from Google Drive meeting notes, automatically update your knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart PDF Processing&lt;/strong&gt;: Parse complex PDFs (text + images), embed both modalities, and serve multimodal search that stays current.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer Context for Support AI&lt;/strong&gt;: Keep your support agent's context synchronized with live customer data, product updates, and recent tickets.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Context Engineering Paradigm Shift
&lt;/h2&gt;

&lt;p&gt;Traditional RAG: "Let's embed everything and query it"&lt;br&gt;
Context Engineering: "Let's define transformations and keep everything synchronized"&lt;/p&gt;

&lt;p&gt;The difference? Production AI systems that actually work at scale.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;CocoIndex is open source (Apache 2.0) and dead simple to get started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; cocoindex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out the examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text embedding with auto-updates&lt;/li&gt;
&lt;li&gt;PDF processing with live refresh&lt;/li&gt;
&lt;li&gt;Knowledge graph extraction&lt;/li&gt;
&lt;li&gt;Custom transformations&lt;/li&gt;
&lt;li&gt;Multi-format indexing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 GitHub: &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;github.com/cocoindex-io/cocoindex&lt;/a&gt;&lt;br&gt;
📖 Docs: &lt;a href="https://cocoindex.io/docs" rel="noopener noreferrer"&gt;cocoindex.io/docs&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;2026 is the year autonomous agents go mainstream. But they won't succeed with stale context.&lt;/p&gt;

&lt;p&gt;If you're building AI systems that need to stay synchronized with reality — not just answer questions about the past — context engineering is your unlock.&lt;/p&gt;

&lt;p&gt;And CocoIndex? It's the framework that makes it actually feasible.&lt;/p&gt;

&lt;p&gt;Give it a star if you're tired of rebuilding indexes manually ⭐&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your biggest challenge keeping AI context fresh? Drop a comment below! 👇&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>rust</category>
    </item>
    <item>
      <title>Build a Real-Time Codebase Index in 5 Minutes with CocoIndex (Rust + Tree-sitter)</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Mon, 15 Dec 2025 01:41:56 +0000</pubDate>
      <link>https://forem.com/badmonster0/build-a-real-time-codebase-index-in-5-minutes-with-cocoindex-rust-tree-sitter-eo3</link>
      <guid>https://forem.com/badmonster0/build-a-real-time-codebase-index-in-5-minutes-with-cocoindex-rust-tree-sitter-eo3</guid>
      <description>&lt;h2&gt;
  
  
  Why Another Codebase Indexing Tool?
&lt;/h2&gt;

&lt;p&gt;Let's be honest: managing code context for AI agents is a nightmare. &lt;/p&gt;

&lt;p&gt;Your AI coding assistant needs to understand your entire codebase—not just one file at a time. Whether you're building RAG systems for Claude, context for Cursor, or semantic code search, you need:&lt;/p&gt;

&lt;p&gt;✅ Fast, incremental updates (not rebuilding everything)&lt;br&gt;
✅ Proper code parsing (not just text chunking)&lt;br&gt;
✅ Vector embeddings for semantic search&lt;br&gt;
✅ Real-time sync when your code changes&lt;/p&gt;

&lt;p&gt;That's exactly what &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; delivers.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Makes CocoIndex Special?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Built-in Tree-sitter Support&lt;/strong&gt;: Unlike generic text splitters, CocoIndex uses Tree-sitter to parse your code semantically. It understands functions, classes, and code structure—not just lines of text.&lt;/p&gt;
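&lt;p&gt;To see what syntax-aware chunking buys you, here's a toy stand-in (a real Tree-sitter parser handles a full grammar; this regex only finds top-level &lt;code&gt;def&lt;/code&gt; boundaries, and &lt;code&gt;chunk_by_function&lt;/code&gt; is an invented helper): each chunk is a whole function, so an embedding never covers half of one function and half of the next.&lt;/p&gt;

```python
import re

# Toy stand-in for syntax-aware chunking (real Tree-sitter parses a
# full grammar; this regex only finds top-level "def" boundaries):
# each chunk is a complete function, never a split one.
def chunk_by_function(source: str) -> list[str]:
    starts = [m.start() for m in re.finditer(r"^def ", source, re.M)]
    bounds = starts + [len(source)]
    return [source[bounds[i]:bounds[i + 1]].rstrip()
            for i in range(len(starts))]

src = """def add(a, b):
    return a + b

def mul(a, b):
    return a * b
"""
chunks = chunk_by_function(src)
print(len(chunks))                       # → 2
print(chunks[0].startswith("def add"))   # → True
```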

&lt;p&gt;&lt;strong&gt;Incremental Processing&lt;/strong&gt;: Only reprocess what changed. No more waiting 10 minutes every time you update a single file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native Vector Search&lt;/strong&gt;: Built-in support for embedding generation and vector search with PostgreSQL + pgvector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Compatible&lt;/strong&gt;: Works seamlessly with AI editors like Cursor, Windsurf, and Claude.&lt;/p&gt;
&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;🤖 &lt;strong&gt;AI Coding Agents&lt;/strong&gt;: Give Claude, Codex, or Gemini the right code context&lt;/p&gt;

&lt;p&gt;🔍 &lt;strong&gt;Semantic Code Search&lt;/strong&gt;: Find code by meaning, not keywords&lt;/p&gt;

&lt;p&gt;📝 &lt;strong&gt;Auto Documentation&lt;/strong&gt;: Keep design docs synced with actual code&lt;/p&gt;

&lt;p&gt;🔧 &lt;strong&gt;Code Review Automation&lt;/strong&gt;: AI-powered PR analysis&lt;/p&gt;

&lt;p&gt;🚨 &lt;strong&gt;SRE Workflows&lt;/strong&gt;: Index infrastructure-as-code for incident response&lt;/p&gt;
&lt;h2&gt;
  
  
  Tutorial: Build Your Codebase Index
&lt;/h2&gt;

&lt;p&gt;Let me show you how ridiculously simple this is.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Install
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; cocoindex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You'll also need PostgreSQL with pgvector extension. &lt;a href="https://cocoindex.io/docs/getting_started/installation" rel="noopener noreferrer"&gt;Installation guide here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Define Your Flow
&lt;/h3&gt;

&lt;p&gt;Create a flow that reads your codebase, chunks it with Tree-sitter, and generates embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;

&lt;span class="nd"&gt;@cocoindex.flow_def&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CodeEmbedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;code_embedding_flow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FlowBuilder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataScope&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Load your codebase
&lt;/span&gt;    &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LocalFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;..&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;..&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;included_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.rs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.toml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;excluded_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/node_modules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;code_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Extract Language &amp;amp; Chunk Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cocoindex.op.function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract extension for Tree-sitter
&lt;/span&gt;    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extension&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extract_extension&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Chunk code semantically
&lt;/span&gt;    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SplitRecursively&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extension&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
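&lt;p&gt;To build intuition for &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt;: consecutive chunks share a band of text so no snippet is cut off without surrounding context. &lt;code&gt;SplitRecursively&lt;/code&gt; is syntax-aware and cuts on Tree-sitter boundaries, but a naive character-window sketch illustrates the overlap idea:&lt;/p&gt;

```python
def naive_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size windows that overlap by chunk_overlap characters.

    Illustration only: SplitRecursively cuts on syntax boundaries
    instead of raw character offsets.
    """
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = naive_chunks(text, chunk_size=1000, chunk_overlap=300)
# Windows start at offsets 0, 700, 1400, 2100; adjacent windows share 300 chars
assert chunks[0][700:] == chunks[1][:300]
```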



&lt;h3&gt;
  
  
  Step 4: Embed &amp;amp; Index
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cocoindex.transform_flow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;code_to_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataSlice&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataSlice&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SentenceTransformerEmbed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_to_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;code_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Export to PostgreSQL with vector index
&lt;/span&gt;&lt;span class="n"&gt;code_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;primary_key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;vector_indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorSimilarityMetric&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE_SIMILARITY&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Run It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex update main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom. Your codebase is now indexed with semantic embeddings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Query Your Index
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_target_storage_default_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;code_embedding_flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;code_to_embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                SELECT filename, code, embedding &amp;lt;=&amp;gt; %s::vector AS distance
                FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
                ORDER BY distance LIMIT %s
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can search your codebase semantically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py
&lt;span class="c"&gt;# Enter: "authentication middleware"&lt;/span&gt;
&lt;span class="c"&gt;# Returns relevant auth code across your entire codebase&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
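&lt;p&gt;A note on the &lt;code&gt;score&lt;/code&gt; field in the query above: pgvector's &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator returns cosine &lt;em&gt;distance&lt;/em&gt;, so subtracting it from 1 recovers cosine similarity. A minimal pure-Python sketch of the relationship:&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """What pgvector's cosine-distance operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction: distance 0, so score is 1.0 (a perfect match)
assert math.isclose(1.0 - cosine_distance([1.0, 0.0], [2.0, 0.0]), 1.0)
# Orthogonal: distance 1, so score is 0.0 (unrelated)
assert math.isclose(1.0 - cosine_distance([1.0, 0.0], [0.0, 3.0]), 0.0, abs_tol=1e-9)
```

&lt;p&gt;This is why the SQL orders by distance ascending: the most similar chunks come first.&lt;/p&gt;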



&lt;h2&gt;
  
  
  Language Support
&lt;/h2&gt;

&lt;p&gt;CocoIndex supports all major languages via Tree-sitter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python, JavaScript, TypeScript&lt;/li&gt;
&lt;li&gt;Rust, Go, C, C++, Java&lt;/li&gt;
&lt;li&gt;Ruby, PHP, Swift, Kotlin&lt;/li&gt;
&lt;li&gt;And 30+ more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://cocoindex.io/docs/ops/functions#supported-languages" rel="noopener noreferrer"&gt;Full language list here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualize with CocoInsight
&lt;/h2&gt;

&lt;p&gt;Want to debug your indexing flow visually?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex server &lt;span class="nt"&gt;-ci&lt;/span&gt; main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts a local CocoInsight server; open &lt;code&gt;https://cocoindex.io/cocoinsight&lt;/code&gt; in your browser to inspect your data flow step by step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Should Care
&lt;/h2&gt;

&lt;p&gt;AI coding tools are only as good as the context you give them. If you're building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents that need code awareness&lt;/li&gt;
&lt;li&gt;Semantic code search engines&lt;/li&gt;
&lt;li&gt;Automated documentation generators&lt;/li&gt;
&lt;li&gt;Code review automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you need a proper codebase index. &lt;/p&gt;

&lt;p&gt;CocoIndex makes it stupidly simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;⭐ &lt;strong&gt;Star the repo&lt;/strong&gt;: &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;github.com/cocoindex-io/cocoindex&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📖 &lt;strong&gt;Read the docs&lt;/strong&gt;: &lt;a href="https://cocoindex.io/docs" rel="noopener noreferrer"&gt;cocoindex.io/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🎥 &lt;strong&gt;Watch the tutorial&lt;/strong&gt;: &lt;a href="https://youtu.be/G3WstvhHO24" rel="noopener noreferrer"&gt;YouTube guide&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💬 &lt;strong&gt;Join Discord&lt;/strong&gt;: &lt;a href="https://discord.com/invite/zpA9S2DR7s" rel="noopener noreferrer"&gt;discord.com/invite/zpA9S2DR7s&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What are you building with AI code tools?&lt;/strong&gt; Drop a comment below—I'd love to hear your use case!&lt;/p&gt;

&lt;p&gt;If you found this useful, give CocoIndex a star on GitHub. It's open source and built by developers who actually understand the pain of managing code context for AI. 🚀&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>From Repo to Movement: Building an Open Source Project and Its Community</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Sat, 13 Dec 2025 21:59:26 +0000</pubDate>
      <link>https://forem.com/badmonster0/from-repo-to-movement-building-an-open-source-project-and-its-community-215i</link>
      <guid>https://forem.com/badmonster0/from-repo-to-movement-building-an-open-source-project-and-its-community-215i</guid>
      <description>&lt;p&gt;Publishing a public GitHub repository is easy; turning it into a healthy open source project with an active community is hard work and very intentional. The projects that endure treat both their code and their community as core product surfaces, with the README acting as the front door to everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with a sharp problem and vision
&lt;/h2&gt;

&lt;p&gt;Every great open source project starts with a real, painful problem and a specific audience, not just an experiment pushed to GitHub. Define who you are building for, what they are trying to do, and why existing tools are insufficient.&lt;/p&gt;

&lt;p&gt;Write this down as a concise vision: what success looks like, what is in scope, and what is explicitly out of scope. This vision will later anchor your roadmap, help you say "no" to distracting feature requests, and give contributors something clear to align with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set the foundations: license, structure, and tooling
&lt;/h2&gt;

&lt;p&gt;Before you ask anyone to contribute, make the project safe and predictable to use. Choose an explicit open source license that matches your goals, add it to the repository, and structure your code in a way that is easy to navigate.&lt;/p&gt;

&lt;p&gt;Set up basic automation and hygiene early: tests, linters, continuous integration, and clear directory conventions. These guardrails reduce friction for both you and future contributors, and signal that the project is maintained professionally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The README: your project's front door
&lt;/h2&gt;

&lt;p&gt;A README is the welcome mat and elevator pitch of your project: it explains what the project is, why it exists, and how to get value from it in the first five minutes. Visitors skim the README to decide whether to try the project, whether it seems maintained, and whether it looks friendly to contributions.&lt;/p&gt;

&lt;p&gt;A great README typically answers, in order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What problem does this solve, and for whom?&lt;/li&gt;
&lt;li&gt;How do I install it and run a minimal example?&lt;/li&gt;
&lt;li&gt;How does it fit into my stack or workflow?&lt;/li&gt;
&lt;li&gt;Where do I go for docs, questions, and contributions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When this information is clear, users reach "first success" much faster, which directly increases adoption and the pool of potential contributors. Well-structured READMEs are strongly associated with higher contribution rates because they reduce ambiguity, confusion, and support noise for new people.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a great README supercharges your community
&lt;/h2&gt;

&lt;p&gt;The README does more than explain the code; it actively shapes your community's growth curve. First, it acts as always-on marketing and documentation: search engines index it, blog posts link to it, and people share screenshots of it in chats and social feeds.&lt;/p&gt;

&lt;p&gt;Second, it acts as a community filter and funnel. When you include links to "good first issues," contributor guides, chat channels, and governance docs, the README quietly converts curious users into engaged participants. Clear expectations about scope, stability, and contribution norms reduce low-quality issues and misaligned requests, which keeps the environment healthier for everyone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design for contributor experience from day one
&lt;/h2&gt;

&lt;p&gt;Once the foundations and README are in place, design the project around contributors, not just maintainers. Add a CONTRIBUTING file that explains how to set up the environment, run tests, and submit issues or pull requests, and pair it with issue and PR templates.&lt;/p&gt;

&lt;p&gt;Label beginner-friendly tasks, keep a small set of "good first issues" fresh, and ensure that at least one maintainer watches and responds quickly. Early, fast feedback often matters more than the actual decision, because it tells new contributors that their time is respected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance and roles: make power explicit
&lt;/h2&gt;

&lt;p&gt;As your project grows, unclear decision-making is one of the fastest ways to frustrate contributors. Choose a governance model—strong maintainer, core team, or more formal steering committee—and write down how decisions are made, who has which permissions, and how new people can grow into those roles.&lt;/p&gt;

&lt;p&gt;Document this in a lightweight governance or project charter file and reference it from the README so it is easy to find. Pair governance with a Code of Conduct that defines expected behavior, reporting channels, and enforcement to keep the space safe and inclusive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat communication as part of the product
&lt;/h2&gt;

&lt;p&gt;Healthy communities are built on predictable, transparent communication. Use your README and docs to advertise where discussions happen (GitHub Discussions, Discord, mailing list), and keep most technical decisions and design conversations in public channels by default.&lt;/p&gt;

&lt;p&gt;Maintain a cadence of release notes, changelogs, and occasional blog posts or newsletters that explain not just what changed, but why. Publicly thanking contributors in these updates reinforces good behavior and encourages more people to step up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Balance sustainability and growth
&lt;/h2&gt;

&lt;p&gt;Maintaining an open source project is a marathon, not a sprint, and burnout is real. Use automation—bots for formatting, dependency updates, stale issue triage—to free maintainers from repetitive tasks, and be comfortable sunsetting features or saying "no" when requests drift from your stated vision.&lt;/p&gt;

&lt;p&gt;Over time, consider sustainability levers such as sponsorships, grants, or institutional backing if your project becomes critical infrastructure. Your goal is to build a project where code, community, and documentation—anchored by a great README—reinforce each other, so no single maintainer has to carry the entire weight alone.&lt;/p&gt;




&lt;p&gt;When building &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt;, the goal was to solve a very specific pain: keeping AI indexes fresh without drowning in custom data pipelines. That is why the core engine ships in Rust with a Python-first developer experience and opinionated building blocks for incremental data processing. From day one, the README doubled as a narrative and a playbook, showing real pipelines, quickstarts, and examples, so that people could go from "what is this?" to "I have it running on my data" as fast as possible; that speed turned early adopters into collaborators.&lt;/p&gt;

&lt;p&gt;To keep the Discord community warm, the focus has been on fast, friendly responses, sharing work-in-progress features, tagging good first issues from GitHub, and treating the server as an open design room rather than a support ticket queue. The result is that contributors feel like co-builders of the framework, not just users asking for help.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>community</category>
      <category>github</category>
      <category>rust</category>
    </item>
    <item>
      <title>This Week in Rust Trending: Storage, AI Agents, and Real‑World Infra</title>
      <dc:creator>Linghua Jin</dc:creator>
      <pubDate>Sat, 13 Dec 2025 02:13:37 +0000</pubDate>
      <link>https://forem.com/badmonster0/this-week-in-rust-trending-storage-ai-agents-and-real-world-infra-2af0</link>
      <guid>https://forem.com/badmonster0/this-week-in-rust-trending-storage-ai-agents-and-real-world-infra-2af0</guid>
      <description>&lt;p&gt;This week's Rust trending list isn't about toy crates or hobby side‑projects. It's about storage engines, AI agents that actually ship code, blockchains, local‑first AI, and serious infra you can run in production (or soon enough).&lt;/p&gt;

&lt;p&gt;Here's a quick tour of the top 10 trending Rust repos, what they do, and why developers are starring them so hard right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  RustFS: Rust Takes on S3 Object Storage
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/rustfs/rustfs" rel="noopener noreferrer"&gt;RustFS&lt;/a&gt; is a high‑performance, S3‑compatible distributed object storage system written in Rust. It targets data lakes, AI pipelines, and big‑data workloads with support for migration and coexistence with other S3 platforms like MinIO and Ceph. RustFS leans heavily on Rust's concurrency and memory safety story, aiming for very small 4KB object latencies and advertises up to 2.3x performance vs MinIO in its own benchmarks.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It hits a real, expensive problem: on‑prem and cloud‑agnostic S3‑compatible storage with an Apache‑2.0 license.&lt;/li&gt;
&lt;li&gt;The combination of "S3‑compatible", "Rust‑based", and "faster than MinIO" is catnip for infra engineers.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Goose: AI Agent That Actually Touches Your Codebase
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/block/goose" rel="noopener noreferrer"&gt;Goose&lt;/a&gt; is an open‑source, extensible AI agent that goes far beyond autocomplete. It installs dependencies, edits your files, runs tests, and integrates with multiple LLMs while evolving toward a Rust core and a rich tool/extension ecosystem. Goose is designed to be the glue between your repo, your terminal, and your models.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It embodies the "agents that do, not just chat" movement: people want assistants that can refactor, fix, and test real projects.&lt;/li&gt;
&lt;li&gt;A clear roadmap, active community, and focus on reproducible, scriptable workflows make it appealing for serious teams, not just tinkerers.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Polkadot SDK: Full‑Stack Blockchain Toolkit in Rust
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/paritytech/polkadot-sdk" rel="noopener noreferrer"&gt;Polkadot SDK&lt;/a&gt; is the official Rust toolkit for building on the Polkadot ecosystem. It bundles networking, consensus, Substrate primitives, and tooling so teams can build full blockchains and parachains. The project ships opinionated versioned releases and tooling like a CLI to manage SDK versions.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's one of the most mature "build your own chain in Rust" stacks, backing production networks and serious research.&lt;/li&gt;
&lt;li&gt;With a consolidated SDK instead of a dozen scattered crates, teams get a more approachable entry point into Polkadot development.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Turso: SQLite‑Compatible, Rust‑Powered Database
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tursodatabase/turso" rel="noopener noreferrer"&gt;Turso&lt;/a&gt; is an in‑process SQL database written in Rust that is intentionally compatible with SQLite at the file, C API, and SQL levels. On top of that, it layers features like async I/O, change data capture, multi‑language bindings, and vector operations to support modern, edge‑heavy and AI workloads.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers get the familiarity of SQLite with a Rust engine designed for edge, serverless, or embedded scenarios.&lt;/li&gt;
&lt;li&gt;Built‑in support for CDC and vectors lines up perfectly with the "AI plus event‑driven plus edge" stack a lot of teams are chasing right now.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  CocoIndex: Data Transformation for AI Context
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; is a data transformation framework focused on AI and context‑heavy workloads. It uses a Rust core to deliver ultra‑fast, incremental processing between sources and targets like object storage, databases, and vector stores. The design emphasizes reproducible, composable pipelines for building and maintaining AI‑ready indexes and knowledge graphs.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It speaks directly to teams building RAG, search, and analytics systems who need low‑latency indexing instead of generic batch ETL.&lt;/li&gt;
&lt;li&gt;Its "building blocks" model for sources, transforms, and sinks makes it feel like Lego for AI data infrastructure.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Rustlings: The Perennial Rust On‑Ramp
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/rust-lang/rustlings" rel="noopener noreferrer"&gt;Rustlings&lt;/a&gt; is the canonical "little exercises" repo that teaches Rust by making you fix compiler errors and complete tiny challenges. It ships with an installer, rust‑analyzer integration, quizzes, and a curated curriculum that tracks the language's evolution.&lt;/p&gt;

&lt;p&gt;Why it's trending (again):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every new wave of devs trying Rust ends up here; it's often the first repo people star when starting their Rust journey.&lt;/li&gt;
&lt;li&gt;The exercises mirror real compiler messages and tooling, which makes the learning path feel tightly aligned with day‑to‑day Rust work.&lt;/li&gt;
&lt;/ul&gt;
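
&lt;p&gt;For flavor, here's a representative Rustlings-style exercise (illustrative only, not an actual exercise from the repo): a tiny program that won't compile until you apply a one-word fix, in this case adding &lt;code&gt;mut&lt;/code&gt;.&lt;/p&gt;

```rust
// A Rustlings-style exercise (illustrative, not from the repo):
// the broken version declares `x` immutably and then reassigns it,
// so it fails to compile with error E0384. Adding `mut` fixes it.
fn main() {
    let mut x = 5; // originally `let x = 5;`, which the compiler rejects
    x += 1;
    assert_eq!(x, 6);
    println!("x = {}", x); // prints "x = 6"
}
```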




&lt;h2&gt;
  
  
  Magisk: Android Modding Meets Modern Systems Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/topjohnwu/Magisk" rel="noopener noreferrer"&gt;Magisk&lt;/a&gt; is the legendary Android "magic mask": a systemless rooting and module platform that lets power users and developers customize their devices without touching the system partition. Around this ecosystem, Rust is increasingly used for performance‑ and security‑sensitive components, reflecting a broader trend in Android tooling.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Magisk is widely used in the real world; when its ecosystem touches Rust, it showcases Rust in practical, high‑impact scenarios.&lt;/li&gt;
&lt;li&gt;It sits at the intersection of low‑level systems work, security, and hacker culture—exactly where Rust enthusiasts love to play.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Yew: Build Web Apps in Rust + WASM
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/yewstack/yew" rel="noopener noreferrer"&gt;Yew&lt;/a&gt; is a Rust/WASM framework for building client‑side web apps with a component‑based model, inspired by the ergonomics of modern JavaScript frameworks. It gives Rust developers a way to write frontends without switching to TypeScript, while still targeting the browser with WebAssembly.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yew is the go‑to answer when someone asks "Can I build a SPA in Rust?" and continues to polish its DX and performance story.&lt;/li&gt;
&lt;li&gt;It offers a unified language stack for teams heavily invested in Rust, especially in full‑stack or embedded‑plus‑web contexts.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Vibe Kanban: A Kanban Board for AI Coding Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/BloopAI/vibe-kanban" rel="noopener noreferrer"&gt;Vibe Kanban&lt;/a&gt; is a Kanban board designed specifically to orchestrate AI coding agents. Instead of traditional tickets, you manage tasks that are executed by agents, with support for running them in parallel or sequence, spinning up dev servers, and managing configuration via MCP‑style tooling.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It captures a new workflow: engineers supervising fleets of agents rather than writing every line of code themselves.&lt;/li&gt;
&lt;li&gt;The "just run it with npx and connect GitHub" story makes it easy to try on a real repo in minutes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Meeting Minutes (Meetily): Local‑First AI Meeting Assistant
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Zackriya-Solutions/meeting-minutes" rel="noopener noreferrer"&gt;meeting-minutes&lt;/a&gt; (Meetily) is an open‑source AI meeting assistant that runs locally, giving you recording, live transcription (Parakeet/Whisper), speaker diarization, and summarization via Ollama or cloud LLMs. It's built in Rust and targets macOS, Windows, and Linux with a strong privacy‑first, self‑hosted posture.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It solves a real, universal problem—meeting notes—without sending your audio to someone else's cloud by default.&lt;/li&gt;
&lt;li&gt;Combining Rust for performance with GPU‑accelerated speech models and local LLMs hits the sweet spot of "AI you actually own."&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fresh: A New TUI Editor Written in Rust
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/sinelaw/fresh" rel="noopener noreferrer"&gt;fresh&lt;/a&gt; is a terminal text editor that aims to be easy to use, powerful, and fast, all written in Rust. It targets developers who want a modern alternative to Vim/Emacs/Helix but with a gentler learning curve and sensible defaults.&lt;/p&gt;

&lt;p&gt;Why it's trending:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"New Rust TUI editor" is practically a genre, and performance‑obsessed devs love trying editors that promise speed and simplicity.&lt;/li&gt;
&lt;li&gt;It offers a familiar terminal workflow with a more approachable UX than the classic modal or Lisp‑heavy editors.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Big Picture: What This Week Says About Rust
&lt;/h2&gt;

&lt;p&gt;Across these trending projects, a few clear themes emerge:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI‑native tooling&lt;/td&gt;
&lt;td&gt;Goose, CocoIndex, Vibe Kanban, Meeting Minutes&lt;/td&gt;
&lt;td&gt;Rust is becoming a backbone for serious, stateful AI infra and agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra you can run today&lt;/td&gt;
&lt;td&gt;RustFS, Turso, Magisk ecosystem&lt;/td&gt;
&lt;td&gt;These are real workloads—storage, databases, Android systems—built in Rust.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer experience focus&lt;/td&gt;
&lt;td&gt;Rustlings, Yew, Fresh&lt;/td&gt;
&lt;td&gt;Better learning paths and tools keep lowering the barrier into Rust.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Rust is no longer just "the safe systems language with cool borrow‑checker memes." This week's trending list shows it powering object storage, blockchains, databases, AI agents, and local‑first productivity tools—all projects that developers can run today, not just read about.&lt;/p&gt;

&lt;p&gt;If you're Rust‑curious, this list is a great set of repos to star, clone, and learn from next.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>programming</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
