Forem: Moshe Simantov

Local SQLite Beats Cloud Docs for AI Coding. Our v1 Ships Today.

Moshe Simantov — Thu, 30 Apr 2026 02:55:39 +0000

A year ago, the consensus was that AI coding assistants needed a cloud documentation service to stay current. Context7 and Deepcon were the obvious choices. We disagreed, and we shipped a bet: a local SQLite file plus MCP is enough.

Today that bet ships as v1.

@neuledge/context v1.0.0 is the stable release of an open-source, local-first documentation server for AI coding assistants. No API keys. No rate limits. No network calls at query time. One install, one MCP entry, and your AI sees real, version-pinned docs in under 10ms.

This post is the short version of what we learned getting here, what's in v1, and where we're headed.

The bet, in one paragraph

Library documentation isn't streaming data. It changes per release, not per minute. SQLite handles that workload at memory speed. MCP makes the result addressable to any AI client — Claude Code, Cursor, Copilot, Windsurf, anything else that speaks the protocol. Wrap that into one CLI and you don't need a SaaS for documentation. You need a binary and a registry.

That's the whole product. Everything in v1 is a consequence of that decision.

How we got here — the 0.x story

Each minor release in 0.x answered a specific "but what about…" question. v1 is the moment that list got short enough to call the API stable.

0.1 — context add <git-repo>. The original premise: clone a docs repo, parse the Markdown, build a .db file, expose it over MCP. It worked, but every developer built every package from source the first time they used it. (Getting started walkthrough →)

0.3 — A community registry. We built api.context.neuledge.com and pre-built ~150 packages across npm, pip, and maven. context install npm/next 15 pulls a verified, current .db instead of building one. (Why we built a registry →)

0.3 — Multi-format parsing. Markdown wasn't enough. Python ecosystems live in reStructuredText. Java lives in AsciiDoc. The same release added both, which is how Django, Flask, FastAPI, and Spring Boot showed up in the registry. (Beyond Markdown →)

0.4 — HTTP server mode. context serve --http turns one machine into a team-shared MCP server. One install, every developer's editor connects to it.

0.5–0.6 — HTML parsing and Windows compatibility. Turndown for HTML pages, sql.js as a WebAssembly fallback when better-sqlite3 won't compile. Less glamorous than features, more important for adoption.

0.7–0.8 — llms.txt with link following. context add https://react-aria.adobe.com works on any site that publishes an llms.txt. Most tools stop at the index file. We follow the links and store the actual docs.

0.9 — Anything with a URL. When llms.txt isn't there, fall back to fetching the page directly with browser-like headers and HTML cleanup via defuddle. Plus context auth add lets you index docs behind a login — your own Substack, your own Medium, your own paid newsletters — without sending credentials anywhere except the source site.

The pattern: every minor version closed off a "you can't use this for X" objection. v1 says we're done closing the obvious ones.

Why this matters vs the cloud alternatives

The headline difference between Context and the cloud documentation services isn't speed (though sub-10ms is hard to beat) — it's the absence of a vendor in the loop. With a local .db file:

No rate limits. Context7's free tier dropped to 1,000 requests/month earlier this year. That's a couple of long debugging sessions. v1 has no concept of a request quota.
No outages on someone else's status page. Your AI's documentation lookup works on a plane, in a SCIF, on a flaky hotel WiFi.
No privacy tax. Your queries don't leave your machine. The model sees the docs; nobody else sees what you asked.
No paywall on your own subscriptions. With context auth add, the docs you already pay for (newsletters, paid blogs, gated developer portals) become accessible to your AI without re-routing credentials through a third party.

The cloud services optimize for "we keep the docs current so you don't have to." v1 optimizes for "the docs are a file, and files are a solved problem."

What v1 actually means

v1.0.0 is not a feature dump. It's a stability commitment on top of everything 0.x shipped:

Semver from here. The CLI surface, the MCP tool definitions, and the .db schema are now covered by semantic versioning. Breaking changes ship as v2.
Three sources, one tool. Git repositories, the community registry, and any URL with or without llms.txt. If documentation exists somewhere on the public internet, context add ingests it.
Cross-ecosystem coverage. JavaScript, Python, Java, plus any HTML/Markdown/RST/AsciiDoc source. Not the universe, but enough of it that "my stack isn't supported" is a rare answer.
A team story. HTTP server mode, session management, and structured logging mean one developer can run a Context server for the whole team instead of each engineer maintaining their own.

If you've been waiting for v1 before evaluating, this is your signal. The rough edges that justified "we'll wait for stable" have been sanded down.

What it looks like

Two commands and a config block:

# Install
npm install -g @neuledge/context

# Pull a documentation package from the registry
context install npm/next 15

Wire it into your MCP client (Claude Code shown — see /integrations for Cursor, Copilot, Windsurf, and Claude Desktop):

{
  "mcpServers": {
    "context": {
      "command": "context",
      "args": ["serve"]
    }
  }
}

That's it. Your AI coding assistant can now answer "how do I do X in Next.js 15" from a local SQLite file in under 10 milliseconds, with zero network calls, zero rate limits, and the actual API surface of the version you pinned — not whatever the model trained on a year ago.

For libraries that aren't in the registry yet:

context add https://docs.example.com   # auto-discovers llms.txt or falls back to page fetch
context add github.com/owner/docs-repo  # straight from git

What's next

A few directions we're investing in, deliberately vague on dates:

More registry coverage. Especially Python, Java, and Go ecosystems where the registry is thinner than npm.
Better discovery. Right now you have to know the package name. We want the AI to be able to answer "what docs do you have for $thing" before it answers the question itself.
Tighter editor integrations. The MCP transport layer is solved; the per-editor UX still has rough edges.

If you have an opinion about which of those matters most, the issue tracker is github.com/neuledge/context/issues.

Try v1

npm install -g @neuledge/context

Read the docs, browse the integrations, or star the repo if v1 saves you a Context7 bill.

A year ago we bet that local SQLite + MCP was enough for documentation. v1 is what "enough" looks like.

New here? Start with why local-first documentation matters for AI coding, or jump straight to the step-by-step setup guide.

MCP vs Function Calling: A Practical Decision Framework

Moshe Simantov — Mon, 27 Apr 2026 16:41:39 +0000

You're building an AI agent and you need it to call tools. You've seen two ways to do it: function calling (OpenAI tools, Claude tool_use, Gemini function declarations) and MCP (Model Context Protocol). They both let your agent invoke external capabilities. So which one do you use?

The short answer: MCP vs function calling isn't an either/or decision. They operate at different layers of your architecture and solve different problems. But the distinction isn't obvious, because from the model's perspective, both look like "here's a tool, call it when relevant."

The real question isn't which is better. It's which fits your use case — and in most production setups, the answer is both.

Function calling: tight integration, zero overhead

Function calling is the native tool-use mechanism built into model APIs. You define a function schema, send it with your prompt, and the model returns a structured call when it decides the function is relevant. Your application executes it and feeds the result back.

What makes it good:

Zero infrastructure. No server process, no protocol layer, no configuration. Define a schema, wire up a handler, done.
Tight coupling with model reasoning. The model sees your function schemas as part of its instruction set. It can reason about when to call them, chain them together, and interpret results — all within a single request/response loop.
Schema validation by the provider. OpenAI, Anthropic, and Google all validate function schemas and enforce structured output. You get type safety for free.
Perfect for application-specific logic. Creating a user, processing a payment, querying your database — actions that only your app needs and that are tightly bound to your business domain.

Where it falls short:

Locked to one application. Your carefully defined search_products function lives inside your codebase. No other tool, editor, or agent can use it without duplicating the implementation.
Locked to one provider. OpenAI function schemas and Anthropic tool_use schemas are similar but not identical. Switch providers and you're rewriting tool definitions.
No resource or context sharing. Function calling is request/response only. There's no standard way for the tool to proactively provide context, expose browsable resources, or share prompt templates.

Function calling is the right default for anything that's specific to your application. If only your agent needs to call it, function calling is simpler, faster, and sufficient.

MCP: standardized, reusable, cross-client

Model Context Protocol takes a different approach. Instead of defining tools inline with your model calls, you run a separate MCP server that exposes tools, resources, and prompts over a standardized protocol. Any MCP-compatible client — Claude Code, Cursor, Continue, Windsurf, your custom agent — can connect and use them.

What makes it good:

Write once, use everywhere. An MCP server works with any client that speaks the protocol. Build a documentation server and it works in Claude Code, Cursor, and your CI pipeline without changes.
Beyond just tools. MCP servers can expose resources (browsable data the client can pull into context) and prompts (reusable prompt templates). Function calling only does request/response tool calls.
Ecosystem scale. The MCP ecosystem has grown to 5,800+ servers and 300+ clients. Common integrations — databases, documentation, APIs, file systems — already exist as packages you can install.
Cross-vendor by design. MCP was donated to the Linux Foundation (AI & Data). It's not owned by any model provider, which means your investment in MCP tooling isn't tied to a single vendor's roadmap.
Best for reusable integrations. If multiple agents, editors, or tools need the same capability — searching docs, querying a database, accessing an API — MCP eliminates duplication.

Where it falls short:

Infrastructure overhead. You're running a server process. That means configuration, lifecycle management, and another thing that can fail.
Protocol overhead. Communication happens over stdio or HTTP with JSON-RPC. For simple, high-frequency tool calls inside a tight loop, the extra layer adds latency you might not want.
Security surface area. An MCP server is a process with access to resources. You need to think about what it can reach, who can connect to it, and how to scope permissions — problems that don't exist when your tools are just functions in your codebase.
Overkill for single-app tools. If only your application needs a tool, wrapping it in an MCP server adds complexity for no reuse benefit.

ThoughtWorks has flagged a common anti-pattern here: naive API-to-MCP conversion — taking every internal API endpoint and wrapping it in an MCP server "because MCP." This adds infrastructure overhead without the reuse benefit that justifies it. MCP makes sense when the tool genuinely serves multiple clients.

The decision framework

Here's how to decide which pattern fits each tool in your agent:

Factor	Function Calling	MCP
Scope	One application	Multiple apps/clients
Coupling	Tight — lives in your codebase	Loose — separate server
Setup	Zero — built into model APIs	Server process + config
Reusability	Low — app-specific	High — cross-client
Ecosystem	Vendor-specific schemas	Cross-vendor standard
Capabilities	Tools only	Tools + resources + prompts
Overhead	Minimal	Server process + protocol

Rules of thumb:

If only your app needs it — function calling. Your create_order handler doesn't need to be a server.
If multiple tools or editors need it — MCP. Don't reimplement the same integration in every client.
If it's a common integration (documentation, database, search, file system) — probably MCP. Someone has likely already built the server.
If it's custom business logic (domain-specific actions, internal workflows) — probably function calling. It's yours and it's not going to be reused.
If you need resources or prompt templates — MCP. Function calling doesn't support these concepts.

Using both together

Most production agents use both patterns. Here's what that looks like:

MCP servers for shared, reusable integrations:

# Documentation — accessible to every AI tool on your machine
npx @neuledge/context add react@19 typescript@5.8
npx @neuledge/context serve

# Database access — reusable across agents and editors
npx @modelcontextprotocol/server-postgres

MCP is the right fit for documentation because every coding assistant on your machine benefits from the same source. You index your docs once with @neuledge/context and Claude Code, Cursor, and your custom agents all get version-pinned, sub-10ms access. That's the reuse pattern MCP was designed for.

Function calling for application-specific actions:

const tools = [
  {
    name: "create_support_ticket",
    description: "Create a support ticket in the internal system",
    parameters: {
      type: "object",
      properties: {
        title: { type: "string" },
        priority: { enum: ["low", "medium", "high", "critical"] },
        customer_id: { type: "string" },
      },
      required: ["title", "priority", "customer_id"],
    },
  },
  {
    name: "check_inventory",
    description: "Check real-time inventory for a product SKU",
    parameters: {
      type: "object",
      properties: {
        sku: { type: "string" },
        warehouse: { type: "string" },
      },
      required: ["sku"],
    },
  },
];

These are tightly coupled to your business domain. No other tool needs them. Function calling keeps them simple.

The combined architecture:

Your Agent
├── MCP Connections (shared integrations)
│   ├── @neuledge/context → versioned docs
│   ├── postgres-server → database queries
│   └── github-server → repo access
│
└── Function Calling (app-specific tools)
    ├── create_support_ticket()
    ├── check_inventory()
    └── process_refund()

The MCP layer handles everything that's reusable across your toolchain. Function calling handles everything that's specific to this agent's job. Clean separation, no duplication.

Start with the simpler thing

If you're just getting started with AI agents, start with function calling. It's built in, requires no infrastructure, and handles the most common use case — giving your agent access to your application's capabilities.

When you start hitting the reuse problem — the same integration duplicated across tools, the same docs needed by multiple agents, a tool that should work in both your app and your IDE — that's when MCP earns its overhead.

For documentation specifically, the reuse case is immediate. Your docs are relevant to every AI tool touching your codebase. Try @neuledge/context as an MCP documentation server — set up takes two commands, and every MCP client on your machine gets access.

For everything else, use the decision framework above. The goal isn't to pick a side. It's to use each pattern where it fits and avoid the anti-pattern of wrapping every function in a server it doesn't need.

For the broader picture on choosing the right tool integration pattern for reliable agents, read our grounding architecture guide. To see what's available in the MCP ecosystem, browse the community registry of pre-built packages.

llms.txt Is Just a Table of Contents. Most AI Tools Stop There.

Moshe Simantov — Thu, 16 Apr 2026 21:34:24 +0000

If you've spent any time in the AI tooling space recently, you've probably seen llms.txt popping up everywhere. React Aria, Anthropic, Svelte, Next.js, MUI — a growing list of projects now ship an llms.txt file at their site root. The idea, proposed by Jeremy Howard and inspired by robots.txt and sitemap.xml, is simple: give AI tools a structured entry point to your documentation.

But there's a catch that most tools miss. An llms.txt file is a discovery index — a table of contents with section headers and links to the actual documentation pages. It is not the documentation itself. And most tools that claim llms.txt support stop at reading the index.

What llms.txt actually is (and isn't)

An llms.txt file is a single Markdown file at a site's root that lists documentation sections and links to detail pages. Think of it as a map — it tells you what exists and where to find it.

The structure is straightforward: headings group topics, and each topic links to one or more documentation pages. Some sites also publish an llms-full.txt that bundles everything inline — but most don't, because their documentation is too large to fit in a single file.

The important distinction: llms.txt is a discovery mechanism, not a documentation format. It points to docs. It doesn't contain them.

This matters because of how tools use it.

The "half the answer" problem

Here's what happens with most llms.txt-aware tools today: they fetch the llms.txt file, feed it to the model, and call it done. Your AI assistant gets a list of section titles and one-line descriptions — a menu, not a meal.

Take a real example. A popular framework's llms.txt is about 84 KB with 8 major sections. That sounds like a lot, but it's almost entirely links and brief descriptions. The actual documentation — the API signatures, code examples, migration guides, edge cases — lives behind those links. Without following them, your AI assistant is working with an outline.

This creates a frustrating failure mode. The model knows the API exists (it saw the link title), but it doesn't have the details. So it does what LLMs do — it fills in the gaps from training data. You get answers that sound right but reference deprecated patterns, wrong parameter names, or APIs from the wrong version.

Cloud-based tools like GitMCP do read llms.txt, but they still bounce queries through a remote service — adding latency, rate limits, and routing your codebase questions through someone else's infrastructure. The local-first approach avoids all of that.

The missing piece is simple: follow the links, fetch the real docs, and store them locally.

`context add <website>` — the new path

@neuledge/context now supports adding documentation directly from any website that publishes an llms.txt file. No git repo needed, no manual .db file construction — just point it at a URL.

Three usage patterns:

Bare domain — auto-discovers llms-full.txt, then falls back to llms.txt:

  context add https://react-aria.adobe.com

Direct file URL — skips discovery, uses the specified file:

  context add https://mui.com/material-ui/llms.txt

Custom package name — overrides the default hostname-based name:

  context add https://react-aria.adobe.com --name react-aria

Under the hood, any HTTPS URL that isn't a .db file or a git host is treated as a website source. Context tries llms-full.txt first (the complete bundle), then llms.txt (the index). If it finds the full version, you get everything in one fetch. If it finds the index, it does something most tools skip — it follows every link.

Following the links (why the index isn't enough)

When Context detects an llms.txt index (as opposed to llms-full.txt), it doesn't stop at the table of contents. It parses the Markdown links grouped by section header, then fetches each linked document concurrently.

The defaults are practical:

Concurrency: 5 parallel fetches
Timeout: 30 seconds per link
Max links: 500 (covers even massive documentation sites)
Same-origin only: links to external sites are skipped — you asked for React Aria docs, not random blog posts it happens to link to
Per-link failure tolerance: one 404 doesn't kill the whole build

The fetched documents get consolidated with the index and passed through the same package builder that handles git repos — deduplication, semantic chunking, FTS5 indexing into a portable SQLite .db file.

Before (index only): 8 sections, 84 KB — a table of contents.
After (links followed): hundreds of pages of actual documentation, deduped and indexed into a searchable local database.

Same-origin filtering matters for both signal and security. When you run context add https://react-aria.adobe.com, you want React Aria's documentation, not every external resource their docs happen to reference.

A real example, end to end

Let's walk through adding React Aria's documentation. Their site publishes an llms.txt at the root.

$ context add https://react-aria.adobe.com --name react-aria
Fetching https://react-aria.adobe.com/llms-full.txt... not found
Fetching https://react-aria.adobe.com/llms.txt... found
Detected llms.txt index with 147 linked documents
Fetching linked documents...
Fetched 139/147 documents (8 failed)
Building package "react-aria"...
Package built: .context/react-aria.db (139 documents)

Eight links returned 404s — probably outdated references in the llms.txt. That's fine. The 139 that succeeded contain the actual component APIs, hooks documentation, styling guides, and accessibility patterns.

Now wire it into your MCP client. If you're using Claude Code:

claude mcp add context -- npx @neuledge/context mcp

For Cursor or VS Code, add to your settings:

{
  "mcpServers": {
    "context": {
      "command": "npx",
      "args": ["@neuledge/context", "mcp"]
    }
  }
}

Now ask your AI assistant something specific — not "what is React Aria?" but something that requires the real docs: "How do I implement a custom calendar with React Aria's useCalendar hook, including locale support and disabled date ranges?"

Without the package, your assistant would cobble together an answer from training data — probably mixing up hook names or missing the createCalendar dependency. With the indexed docs, it searches the actual React Aria reference for useCalendar, finds the parameters, the locale configuration, and the isDateUnavailable callback. Grounded answers instead of educated guesses.

You can inspect what got indexed with context browse react-aria to see the full list of documents in the package.

Where this fits in the bigger picture

There are now three ways to get documentation into @neuledge/context:

Community registry — context install npm/react — 116+ pre-built packages ready to download.
Any llms.txt site — context add https://... — the capability covered in this article.
Any git repo with docs — context add https://github.com/... — the original path, with multi-format support for Markdown, reStructuredText, and AsciiDoc.

The llms.txt path closes an important gap. The registry covers popular libraries, but it can't cover everything. If a library publishes an llms.txt — and the list keeps growing — you can grab its docs even if nobody has added it to the registry yet.

For library authors, this creates a clear path: publish an llms.txt, and your users can instantly index your documentation into their AI tooling. No PR to any registry required. Just ship the file and let the tools follow the links.

Try it

Find a library you use that ships an llms.txt. Run context add <url>. Then ask your AI assistant the hardest question about that library — the one it usually gets wrong.

npx @neuledge/context add https://docs.anthropic.com

Product page — features, architecture, and how Context works
Documentation — quick start and editor configuration
Community registry — 116+ pre-built packages
llmstxt.org — the llms.txt specification

Your AI Coding Assistant Isn't Stupid — It's Starving for Context

Moshe Simantov — Tue, 07 Apr 2026 03:17:52 +0000

Every few months, a new model drops and developers upgrade their AI coding assistant expecting the hallucinations to finally stop. GPT-4 to GPT-5 to GPT-5.4. Claude 3.5 to 4 to Opus 4.6. Gemini 2 to 3 to 3.1. The benchmarks go up. The confident-but-wrong suggestions keep coming.

At some point you have to ask: if the model keeps getting smarter and the output keeps being wrong in the same ways, maybe the model was never the problem.

It isn't. The bottleneck in AI coding accuracy is context, not capability — and upgrading the model is the least effective lever you have.

The model upgrade treadmill

Here's the loop most teams are stuck in. The assistant suggests a deprecated API. You blame the model. A new model ships. You upgrade. The assistant suggests a different deprecated API. You blame the model again.

Look at what actually causes these failures in practice:

Wrong API signatures. Your assistant calls fetch(url, { json: true }) because it learned a pattern from 2021 Node.js libraries. The current fetch doesn't take that option. The model can reason fine — it just learned an obsolete fact and has no way to know it's obsolete.
Deprecated method suggestions. It reaches for componentWillMount or useEffect patterns from React 16. The model isn't broken. The training data is just a blur of every React version ever written.
Version-mismatched code. You're on Next.js 15, the assistant writes Next.js 13 patterns because that's where most of its training data lives. Every major version is blended together with no version labels.

None of these are reasoning failures. A human given the same inputs would make the same mistakes. These are context failures — the model is answering the question it was asked with the information it was given, and that information is wrong.

A smarter model won't fix any of this. It'll just be wrong more confidently.

What the research actually says

This isn't a hunch. The research community has been converging on the same conclusion for about a year now.

ETH Zurich's study on AGENTS.md files showed that structured, project-specific context files dramatically improved the accuracy of AI coding output — using the same underlying models. The delta came entirely from what was in the context window, not from which model read it.
The New Stack published "Context Is AI Coding's Real Bottleneck in 2026" documenting the same pattern across multiple tools and vendors. The industry is quietly realizing that "upgrade the model" is diminishing returns and "upgrade the context" is where the wins are hiding.
Hallucination rate gaps tell the story. Leading models hit 0.7–0.9% hallucination rates on well-grounded tasks. The industry average hovers around 9.2%. The gap between "best" and "average" isn't model capability — it's how well the context is curated.

Put another way: if context quality were held constant, most of the gap between GPT-5 and GPT-5.4 — or between Claude 4 and Opus 4.6 — would disappear. The gains developers attribute to new models are largely gains from better default prompts, better retrieval, and better system instructions that ship alongside them.

The three context failures

When an AI coding assistant gives you wrong code, the root cause is almost always one of three context problems:

Missing context. The model was never shown the library's docs at all. It's guessing from pattern similarity with other libraries. Confident, plausible, and wrong — because it's literally making the API up by analogy.
Stale context. The model was trained on v3 of a library, you're on v6, and nobody told it. It knows an API; it just knows the wrong one. This is the most common failure mode for anything that ships faster than model training cycles (which is most things).
Noisy context. The model has too much information, not too little. You dumped 200KB of docs into the context window and the signal for your specific question drowned. The relevant paragraph was there — buried under everything else.

Here's the uncomfortable part: all three of these get worse, not better, as context windows grow. A million-token context window doesn't fix missing docs. It doesn't un-stale training data. And it actively encourages the noisy-context failure by giving teams permission to throw everything at the model and hope.

The fix isn't a bigger pipe. It's a cleaner one.

Fixing context at the source

If context quality is the lever, the question becomes: what does a good context pipeline look like? Three properties matter, and they compound:

Version-specific, not latest-only. The assistant needs docs for your version, not the most recent release. A cloud doc service that indexes HEAD is useless if you're pinned to react@18. Versioning has to be first-class, not an afterthought.
Local-first, not network-bound. If retrieving docs takes 300ms over the network, the agent starts skipping retrievals for "simple" questions. Sub-10ms local lookups mean retrieval is always on, for every question, even the trivial ones. Latency determines behavior.
Pre-indexed, not lazily scraped. On-the-fly scraping is fragile — sites rate-limit, pages move, layouts change. Pre-built packages that ship the parsed, structured docs eliminate an entire class of flakiness.

This is the thesis behind @neuledge/context: an MCP documentation server that gives your AI assistant accurate, version-pinned library docs from a local SQLite database. It isn't magic — it's the boring answer to what a fixed context pipeline looks like. Version-specific packages, sub-10ms local retrieval, and a community registry of 116+ pre-built packages so you don't build anything from source unless you want to.

MCP (Model Context Protocol) matters here because it's the standard interface that lets any coding assistant — Claude Code, Cursor, Continue, and a growing list of others — plug into the same documentation source. Fix your context pipeline once and every tool on your machine gets the benefit. No per-editor integration, no vendor lock-in.

The point isn't "use this tool." The point is that the context problem has concrete, fixable causes, and you should use something that addresses them. Several tools in this space exist. Pick one. What you don't want to do is keep waiting for the next model release to fix a problem the model never caused.

Before and after

The difference is easier to see than to describe. Take a question almost every React developer has asked an AI assistant in the last year:

"How do I fetch data in a React Server Component with suspense?"

Without good context, a typical assistant reaches for training data. You get code that looks right at a glance — use client, useEffect, a loading state — except React Server Components don't use useEffect. That's a Client Component pattern from the pre–RSC era. The assistant mixed two React paradigms because both are in its training data and neither was labeled as "wrong for this context." The answer isn't nonsense; it's just an answer from 2022.

With version-pinned React 19 docs in context, the same model gives you a Server Component that awaits the fetch directly, wrapped in a <Suspense> boundary at the parent. No use client. No useEffect. Because the actual React 19 docs say so, and the model is no longer guessing.

Same model. Same question. Different answer — because the context was different. Set it up once:

npx @neuledge/context add react@19

Your assistant now has the React 19 reference at sub-10ms local latency. The next time you ask about Server Components, it's reading the docs, not dredging them up from a 2023 blog post.

Stop upgrading models. Start upgrading context.

The model-upgrade treadmill is a comfortable place to be — there's always a new release, the benchmarks always go up, and the problem always feels like it's about to be solved. It isn't. The hallucinations you're seeing today will still be there in the next model, because they aren't reasoning failures. They're context failures wearing a reasoning failure's clothes.

The good news is that context is the easier problem. You can fix it this afternoon:

Audit what your assistant actually has in its context window. Is it version-specific? Is it fresh? Is it the docs for the libraries you actually use?
Set up a retrieval pipeline that's local, fast, and pre-indexed. @neuledge/context is one option; use whatever fits your stack.
Pick your most frustrating AI coding scenario — the one that made you blame the model last week — and try it again with real docs in context. See if the model suddenly gets smarter.

It won't have gotten smarter. It'll just finally have the information it needed the first time.

Try it with the scenario that annoyed you most this week. If it still gets the answer wrong after that, then start blaming the model. Most people never have to.

Want the hands-on version? Read Getting Started with @neuledge/context for the setup walkthrough, or browse the docs to pin your first package.

Your AI Coding Assistant Can Finally Read Django and Spring Boot Docs

Moshe Simantov — Wed, 01 Apr 2026 02:46:15 +0000

Most AI documentation tools make a quiet assumption: your library's docs are in Markdown. If they are, great. If they aren't, you're out of luck.

That's a problem, because some of the most important frameworks in software development don't use Markdown at all. Python's ecosystem standardized on reStructuredText (.rst) — Django, Flask, and most Sphinx-based projects write their docs in it. Many Java projects, including Spring Boot, use AsciiDoc (.adoc) for their reference documentation.

If your AI documentation tool can only parse Markdown, it can't index Django. It can't index Spring Boot. It's locked out of entire ecosystems.

@neuledge/context v0.3.0 fixes this with native support for all three formats.

Three formats, zero configuration

Context now parses three documentation formats:

Markdown (.md, .mdx, .qmd, .rmd) — the existing default
reStructuredText (.rst) — Python ecosystem: Django, Flask, Sphinx-based docs
AsciiDoc (.adoc) — Java ecosystem: Spring Boot, enterprise documentation

Format detection is automatic. Context reads the file extension and selects the right parser. No configuration flags, no format declarations. Point it at a repo and it figures out the rest.

This means a single repository with mixed formats — say, Markdown README files alongside .rst API reference docs — gets parsed correctly without any extra steps. Each file is handled by its extension.

Python ecosystem: Django, FastAPI, Flask

Let's walk through indexing Django's documentation. If you're using the community registry, it's one command:

context install pip/django

That downloads a pre-built package with Django's full .rst documentation already parsed, chunked, and indexed into a searchable SQLite database.

Want to see what versions are available? The registry pulls version data from PyPI's REST API automatically:

context browse pip/django

If you'd rather build from source — maybe you're tracking a development branch or using a fork:

context add https://github.com/django/django

Context will clone the repo, detect the .rst files in Django's docs/ directory, and parse them into the same indexed format. Django's documentation is extensive — hundreds of .rst files covering models, views, middleware, forms, and the admin interface. All of it becomes searchable by your AI assistant.

The same workflow works for the rest of the Python ecosystem:

FastAPI: context install pip/fastapi
Flask: context install pip/flask
Pydantic: context install pip/pydantic

Once indexed, your AI coding assistant gets version-specific, accurate answers instead of guessing from training data. Ask about Django 5.1 middleware and you get Django 5.1 middleware docs — not a hallucinated blend of version 3, 4, and 5.

Java ecosystem: Spring Boot

The AsciiDoc parser opens up Java's documentation world. Spring Boot's reference documentation is written entirely in .adoc files, and Context handles it natively:

context install maven/spring-boot

Version discovery works through Maven Central's API, so context browse maven/spring-boot shows every published version. Install the one that matches your project.

Building from source follows the same pattern:

context add https://github.com/spring-projects/spring-boot

Context detects the .adoc files and parses Spring Boot's reference docs — configuration properties, auto-configuration, actuator endpoints, and deployment guides. Instead of your AI assistant guessing at Spring Boot configuration, it can search the actual reference documentation for your exact version.

The registry also includes:

JUnit: context install maven/junit
Micrometer: context install maven/micrometer

Build from any git repo

The multi-format support isn't limited to registry packages. Any git repo with .rst or .adoc files works. This is especially useful for teams with internal documentation.

context add https://github.com/your-org/internal-docs

Context scans the repository, auto-detects formats by extension, and parses everything it finds. If your team maintains API docs in reStructuredText and architecture docs in Markdown, both get indexed correctly from the same repo.

This also works for libraries that aren't in the registry yet. Found an open-source Python library with great .rst docs? Just point Context at the repo. No need to wait for someone to add it to the registry — though if it's a popular library, consider submitting a registry entry so others can benefit too.

Why this matters

The Markdown assumption has been a blind spot for AI documentation tooling. Python has the third-largest developer community globally. Java remains the backbone of enterprise software. Excluding these ecosystems from AI documentation tools meant excluding millions of developers.

With multi-format support, @neuledge/context is no longer a JavaScript/TypeScript documentation tool. It's a documentation tool for any ecosystem that writes docs in Markdown, reStructuredText, or AsciiDoc — which covers the vast majority of open-source projects.

Your AI assistant shouldn't be limited to libraries that happen to use Markdown. Try indexing a Python or Java library and see the difference accurate, version-specific documentation makes:

npx @neuledge/context install pip/django

Browse the full community registry for pre-built packages, check the documentation for setup instructions, or explore how local-first documentation keeps everything fast and private.

RAG vs. Fine-Tuning vs. Grounding: Which One Does Your AI Actually Need?

Moshe Simantov — Sun, 15 Mar 2026 17:02:52 +0000

I've watched three teams this year burn weeks fine-tuning models that just needed access to their own docs. One spent $12K on GPU time training a customer support model to stop hallucinating product features — features that were already documented in their help center. The fix was a retrieval pipeline that took an afternoon to set up.

The problem isn't that these teams were stupid. It's that "RAG vs. fine-tuning" is the wrong question, and most content online frames it that way because the authors are selling one or the other.

Here's the actual question: what kind of wrong is your LLM being?

The misdiagnosis that costs you weeks

When an LLM gives bad output, developers reach for one of two fixes:

RAG — stuff relevant documents into the context window before generation
Fine-tuning — retrain the model on examples of correct output

Both work. Neither works for everything. And confusing when to use which leads to the most expensive mistake in AI development: solving the right problem with the wrong tool.

Fine-tuning a model to know facts is like teaching someone a foreign language by having them memorize a dictionary. They'll learn the words, but they'll still make things up when you ask about something that wasn't in the dictionary. Fine-tuning changes how a model responds — its tone, reasoning style, output format. It does not reliably teach it what is true. A fine-tuned model will confidently produce wrong facts in exactly the style you trained it on.

RAG without good retrieval is like handing someone a library card and expecting them to write a PhD thesis. Access to information isn't the same as accessing the right information. If your retrieval returns noisy, irrelevant, or stale documents, the model will dutifully weave that garbage into a polished-sounding response.

Both techniques are tools. The goal they serve is grounding — anchoring every claim the model makes to a verifiable source.

Diagnose first, then pick your tool

Match the symptom to the fix:

"It says wrong facts about our product." Your model was trained on internet data, not your docs. It doesn't know that you renamed the API in v3, deprecated the old auth flow, or added a new pricing tier last month. This is a retrieval problem. Give it access to your documentation at query time — don't try to bake every fact into model weights.

"It responds in the wrong format." You want JSON with specific fields. Or you want the model to follow your company's support tone. Or you need it to reason through multi-step problems in a specific way. This is a behavior problem. Fine-tune on examples of the format and style you want.

"It hallucinates even when I give it context." Your retrieval is returning the wrong documents, too many documents, or documents with conflicting information. This is a retrieval quality problem. Fix your chunking, your ranking, your filtering — don't add more infrastructure.

"It doesn't follow complex instructions well." The model's instruction-following capability is the bottleneck, not its knowledge. Fine-tune for reasoning patterns, AND ground it in real data so it has something accurate to reason over.

Here's the pattern: if the problem is what the model knows → grounding. If the problem is how the model behaves → fine-tuning. Most production issues are knowledge problems.

Why fine-tuning is almost never the right first step

Fine-tuning has a seductive pitch: "make the model work exactly how you want." But the costs are real and often underestimated:

Training data curation. You need hundreds to thousands of high-quality input/output examples. Someone has to write or curate those. That's weeks of work before you even start training.
Compute costs. A single fine-tuning run on a capable model runs $500–$5,000 depending on the provider, dataset size, and model. Multiple iterations are normal.
Model lock-in. When Anthropic ships Claude 4.7 or OpenAI releases GPT-5, your fine-tuned weights don't transfer. You retrain from scratch. Every model upgrade resets you to zero.
The accuracy ceiling. After all that investment, the model still can't answer questions about facts not in the training data. Your product docs changed last Tuesday? The fine-tuned model doesn't know.

Compare that to a grounding pipeline: set up retrieval, point it at your docs, done. When the docs change, the model's answers change immediately. No retraining, no dataset curation, no compute budget.

Research backs this up. RAG-based grounding reduces hallucinations by 42–68% with no model modification at all. That's the kind of improvement that makes fine-tuning an optimization for later, not a starting point.

The right sequence: ground first, measure what's still wrong, fine-tune only if the remaining problems are behavioral.

What good grounding actually looks like

Bad grounding is "dump all the docs into the prompt." Good grounding is an architecture:

1. Right data, right time. Not all data is the same. Static docs (API references, guides, policies) change per release — index them once and search locally. Live data (prices, inventory, status) changes per minute — query it at request time. Mixing these up is how you end up quoting yesterday's prices with today's confidence.

2. Selective context. Don't send 20 documents to the model. Send the 3 most relevant ones. More context means more noise for the model to latch onto. The model doesn't need your entire knowledge base — it needs the specific answer to the specific question.

3. Source traceability. Every fact the model cites should trace back to a source document with a URL, version, and timestamp. If it can't cite a source, it should say so instead of guessing.

In practice, this means two layers. For documentation and reference material, use something that indexes docs into a local, searchable store — we built @neuledge/context for this, which packages docs as SQLite databases with sub-10ms full-text search, served as an MCP server:

{
  "mcpServers": {
    "context": {
      "command": "npx",
      "args": ["-y", "@neuledge/context"]
    }
  }
}

With the community registry, you don't even need to build packages yourself — 116+ libraries are pre-built and ready to install.

For live operational data, use a semantic data layer like @neuledge/graph that queries structured sources at request time and returns clean JSON the model can reason over.

The combination covers both failure modes: stale knowledge (retrieval from indexed docs) and stale data (live queries to operational systems).

When you actually need fine-tuning

Fine-tuning isn't useless — it's just not the first thing to reach for. There are specific situations where it's the right tool:

Consistent output format. You need every response to follow a strict JSON schema, or match a specific tone, or produce a particular reasoning structure. Prompt engineering can get you 80% there, but fine-tuning locks it in.
Domain reasoning patterns. Your use case requires the model to reason through problems in a domain-specific way — medical differential diagnosis, legal contract analysis, financial risk assessment. The model needs to think differently, not just know different facts.
Efficiency at scale. You're making millions of API calls and a fine-tuned smaller model could replace a larger one with enough quality for your use case. This is a cost optimization, not an accuracy play.

The common thread: fine-tuning changes behavior, not knowledge. If you fine-tune AND ground, you get a model that reasons the way you want about facts that are actually true. That's the combination that production systems eventually land on — but grounding comes first because it solves the bigger, more common problem.

The bottom line

Stop asking "RAG or fine-tuning?" and start asking "what's actually wrong?"

Wrong facts → ground it. Wrong behavior → fine-tune it. Wrong everything → ground first, then fine-tune, because a model that behaves perfectly while confidently lying is worse than one that's awkwardly correct.

Get started: Install @neuledge/context for documentation grounding and @neuledge/graph for live data. Both are free, open source, and work with any MCP-compatible AI agent. The getting started guide walks through the full setup.

116 Pre-Built Documentation Packages for Your AI Coding Assistant

Moshe Simantov — Sat, 07 Mar 2026 14:17:53 +0000

Every time someone set up @neuledge/context for a new project, they'd do the same thing: clone the React docs repo, find the right directory, build a package. Then do it again for Next.js. And Tailwind. And Prisma.

I kept seeing the same repos show up in GitHub traffic. Hundreds of developers, all independently building identical documentation packages for the same popular libraries. That felt like a problem worth solving.

So we built a community registry — a shared collection of pre-built documentation packages that anyone can download instead of building from source.

The problem was simple repetition

The context add workflow works great. You point it at a repo, it finds the docs, builds a searchable SQLite database. Done.

context add --name react https://github.com/reactjs/react.dev /src/content/reference

But for popular libraries, this is redundant work. You need to know the right repo URL, find the correct docs directory (which sometimes takes a few minutes of browsing), and wait for the build. Multiply that by every developer who uses React, and it's a lot of collective time spent producing the exact same .db file.

The registry just short-circuits that. Someone builds the package once, and everyone else downloads it.

What's actually in a registry package

Same thing you'd get building locally — a SQLite .db file with FTS5 full-text search, containing semantically chunked documentation. There's no difference between a package you build yourself and one from the registry. Same format, same search quality, same everything.

Right now the registry has packages for three ecosystems:

npm (109 packages): React, Next.js, Angular, Vue, Svelte, Astro, Tailwind CSS, Express, Fastify, NestJS, Prisma, Drizzle, and a lot more
pip (4 packages): Django, FastAPI, Flask, Pydantic
maven (3 packages): Spring Boot, JUnit, Micrometer

Packages get rebuilt daily through GitHub Actions. When Next.js ships a new version, the registry picks it up automatically. No one needs to do anything.

How to use it

The simplest way is context install:

context install npm/react

That downloads the pre-built package and makes it available to your AI coding assistant immediately. If you want a specific version:

context browse npm/react        # see what's available
context install npm/next 15.0   # install a specific version

Or with npx if you haven't installed @neuledge/context globally:

npx @neuledge/context install npm/react

If you're running @neuledge/context as an MCP server, your AI agent can also find and install packages on its own. It has two tools for this — search_packages to find what's available, and download_package to install it. So if it encounters a library it doesn't have docs for, it can just go grab them from the registry without you doing anything.

How the registry works

The pipeline is pretty straightforward:

Registry entries are YAML files that map a package name to a git repo and docs path. Each one says "for this library, clone this repo, look in this directory."
A daily GitHub Actions workflow checks for new library versions. When it finds one, it clones the repo, builds the documentation package, and publishes it to the registry API.
The API at api.context.neuledge.com serves search and download endpoints. Search to find packages, download to get the .db file.
The packages themselves are the same SQLite databases @neuledge/context uses locally — meta table for metadata, chunks table for the documentation sections, and chunks_fts for full-text search.

If you want to add a library that's missing, you submit a YAML file to the GitHub repo with the package mapping. The build pipeline handles everything else from there.

What's covered

A quick overview of the categories:

Frontend: React, Next.js, Angular, Vue, Svelte, SvelteKit, Astro, Solid, Remix, Nuxt, Gatsby

CSS: Tailwind CSS, Sass, PostCSS, Styled Components, Emotion

Backend: Express, Fastify, NestJS, Hono, Django, FastAPI, Flask, Spring Boot

Database/ORM: Prisma, Drizzle, TypeORM, Mongoose, Sequelize, Knex

Testing: Jest, Vitest, Playwright, Cypress, Testing Library

AI SDKs: OpenAI SDK, Anthropic SDK, LangChain, Vercel AI SDK

Build tools: Vite, Webpack, esbuild, Turbo, Bun, Deno

Infra: Docker, Kubernetes, Terraform, AWS CDK

If you're working with a typical stack — say React, Next.js, Prisma, and Tailwind — that's four install commands and your AI assistant has accurate, version-specific docs for everything.

Try it out

npx @neuledge/context install npm/react

Or set up @neuledge/context as an MCP server and let your AI agent discover packages on its own. The getting started guide walks through the full setup.

The registry is open source and free. If your favorite library isn't there yet, adding it is one YAML file.

How @neuledge/graph Gives AI Agents Access to Live Data

Moshe Simantov — Sun, 22 Feb 2026 15:03:31 +0000

Your customer asks the AI agent: "What's the current price for the Pro plan?" The agent responds with $29/month — the price from six months ago. You raised it to $39 in January. Yesterday the same agent told a prospect you have 200 units in stock. You have 12.

These aren't hallucinations in the traditional sense. The model isn't making things up from nothing — it's answering from stale training data because it has no connection to your live systems. Prices change, inventory moves, statuses update. If your AI agent can't access current data, it will confidently serve outdated facts. And outdated facts are often worse than no facts at all.

RAG solves this for documentation. But structured operational data — prices, inventory, order statuses — needs a different approach.

Why RAG alone isn't enough for live data

RAG was designed for documents. It chunks text, embeds it into vectors, and retrieves relevant passages. That works well for documentation, knowledge bases, and guides. But it breaks down with live operational data for three reasons:

Documents vs. structured data. RAG returns text fragments. When an agent needs the current price of SKU-1234, it needs a number — not a paragraph that might contain a number from last week's catalog export.
Staleness matters more. Documentation might be acceptable at a week old. Pricing data is wrong after an hour. Inventory counts are wrong after a minute. Different data types have fundamentally different freshness requirements.
Too many API tools creates a selection problem. The alternative to RAG — giving your agent direct API access — means exposing 10 or 20 tools. The LLM has to pick the right endpoint, format the right parameters, and parse the response. This is fragile, error-prone, and gets worse as you add more data sources.

You need something between "embed everything into vectors" and "give the agent raw API access." A unified data layer that handles routing, caching, and structured responses.

The Graph approach — one tool, all your data

@neuledge/graph is a semantic data layer for AI agents. Instead of giving your agent many API tools to choose from, Graph provides a single lookup() tool. The agent describes what it needs in natural language; Graph routes it to the right data source and returns structured JSON.

The core idea:

Connect your data sources — APIs, databases, or any structured data endpoint
Expose a single lookup tool — the agent calls lookup() with a natural language query
Get structured JSON back — not free text, but exact values the LLM can reason over

Responses are pre-cached and return in under 100ms. The LLM doesn't wait for upstream APIs during a conversation — Graph handles that in the background.

Setting up Graph

Install Graph and its peer dependency:

npm install @neuledge/graph zod

Initialize the client:

import { NeuledgeGraph } from "@neuledge/graph";

const graph = new NeuledgeGraph();

That's the minimal setup. Graph connects to the Neuledge knowledge graph service, which provides access to structured data sources. No API key is required for basic usage (100 requests/day).

For production workloads, sign up for a free API key:

npx @neuledge/graph sign-up your-email@example.com

Then configure it:

import "dotenv/config";

const graph = new NeuledgeGraph({
  apiKey: process.env.NEULEDGE_API_KEY, // 10,000 requests/month
});

Querying data

The lookup() method is the single interface your agent uses:

const result = await graph.lookup({ query: "cities.tokyo.weather" });
// Returns: { status: "matched", match: {...}, value: {...} }

Responses come back as structured JSON — not free text. The LLM can reason over exact values (prices, counts, statuses) instead of parsing unstructured paragraphs.

Connecting Graph to your AI agent

Graph is designed as a first-class tool for AI agent frameworks. You pass graph.lookup directly as a tool — no wrapper code needed.

Vercel AI SDK

import { anthropic } from "@ai-sdk/anthropic";
import { NeuledgeGraph } from "@neuledge/graph";
import { stepCountIs, ToolLoopAgent, tool } from "ai";

const graph = new NeuledgeGraph();

const agent = new ToolLoopAgent({
  model: anthropic("claude-sonnet-4-5"),
  tools: { lookup: tool(graph.lookup) },
  stopWhen: stepCountIs(20),
});

const { textStream } = await agent.stream({
  prompt: "What's the current weather in Tokyo?",
});

OpenAI Agents SDK

import { Agent, run, tool } from "@openai/agents";
import { NeuledgeGraph } from "@neuledge/graph";

const graph = new NeuledgeGraph();

const agent = new Agent({
  name: "Data Assistant",
  model: "gpt-4.1",
  tools: [tool(graph.lookup)],
});

const result = await run(agent, "What is the current price of Apple stock?");

LangChain

import { NeuledgeGraph } from "@neuledge/graph";
import { createAgent, tool } from "langchain";

const graph = new NeuledgeGraph();
const lookup = tool(graph.lookup, graph.lookup);

const agent = createAgent({
  model: "openai:gpt-4.1",
  tools: [lookup],
});

const result = await agent.invoke({
  messages: [
    { role: "user", content: "What is the exchange rate from USD to EUR?" },
  ],
});

The pattern is the same across frameworks: create a NeuledgeGraph instance, pass graph.lookup as a tool, and let the agent call it when it needs live data.

What the agent experience looks like

When your agent has Graph connected, conversations with live data queries look like this:

User: "What's the current weather in San Francisco?"

The agent calls graph.lookup({ query: "cities.san-francisco.weather" }) and gets back structured JSON:

{
  "status": "matched",
  "value": {
    "temperature": 62,
    "unit": "fahrenheit",
    "condition": "partly cloudy",
    "humidity": 68,
    "updated_at": "2026-02-22T14:30:00Z"
  }
}

The agent sees exact numbers, not prose. It can tell the user the temperature is 62°F with 68% humidity — not "approximately in the low 60s" based on historical averages from training data.

This structured format matters. LLMs reason more accurately over explicit values than extracted text. A JSON response with "price": 39.00 is unambiguous. A paragraph that says "the price was recently updated to around $39" leaves room for the model to hedge, round, or misinterpret.

Building a custom data server

For proprietary data sources — your product catalog, internal pricing API, inventory system — you can run your own Graph server:

import { NeuledgeGraphRouter } from "@neuledge/graph-router";
import { NeuledgeGraphMemoryRegistry } from "@neuledge/graph-memory-registry";
import { openai } from "ai";
import Fastify from "fastify";

const registry = new NeuledgeGraphMemoryRegistry({
  model: openai.embedding("text-embedding-3-small"),
});

// Register your data sources
await registry.register({
  template: "products.{sku}.price",
  resolver: async (match) => {
    const sku = match.params.sku;
    const response = await fetch(
      `https://api.internal/pricing?sku=${sku}`
    );
    return response.json();
  },
});

await registry.register({
  template: "inventory.{sku}.stock",
  resolver: async (match) => {
    const sku = match.params.sku;
    const response = await fetch(
      `https://api.internal/inventory?sku=${sku}`
    );
    return response.json();
  },
});

const router = new NeuledgeGraphRouter({ registry });
const app = Fastify();

app.post("/lookup", async (request, reply) => {
  const result = await router.lookup(request.body);
  return reply.send(result);
});

app.listen({ port: 3000 });

Then point your Graph client at the custom server:

const graph = new NeuledgeGraph({
  baseUrl: "http://localhost:3000",
});

This gives you full control over data sources, caching, and access policies while keeping the same lookup() interface for your AI agents.

Graph + Context — the full grounding stack

Graph handles live operational data. But AI agents also need static knowledge — library docs, API references, guides. That's where @neuledge/context comes in.

The two tools complement each other:

Context grounds your agent in static documentation — library docs, internal wikis, API references. Indexes into SQLite, serves via MCP, sub-10ms queries. Best for knowledge that changes with releases.
Graph grounds your agent in live data — product catalogs, pricing, inventory, system status. Pre-cached structured responses, single lookup tool. Best for data that changes continuously.

An AI coding assistant uses Context for accurate, version-specific documentation. A customer-facing agent uses Graph for current prices and availability. A sophisticated agent uses both — docs for how-to knowledge, Graph for current facts.

Together, they form the grounding architecture that eliminates the most common categories of hallucination: outdated documentation and stale operational data.

Get started

Install Graph and connect it to your agent:

npm install @neuledge/graph zod

import { NeuledgeGraph } from "@neuledge/graph";

const graph = new NeuledgeGraph();
const result = await graph.lookup({ query: "your-data-query-here" });

For production usage, sign up for a free API key to get 10,000 requests/month. For proprietary data, set up a custom server with @neuledge/graph-router.

Your AI agent should answer from your data, not from six-month-old training patterns. Ground it.

GitHub repo — source code, API reference, and examples
Documentation — quick start guide and configuration
Getting started with Context — complement Graph with documentation grounding
What is LLM grounding? — the concept behind tools like Graph and Context

What Is LLM Grounding? A Developer's Guide

Moshe Simantov — Fri, 20 Feb 2026 21:55:49 +0000

Ask an AI coding assistant to use the useAgent hook from Vercel's AI SDK. If the model was trained before v6 shipped, you'll get a confident answer referencing Experimental_Agent — an API that was renamed months ago. The code looks right. The types look right. It's wrong.

This is what happens when a language model has no connection to current reality. LLMs are powerful pattern matchers trained on internet snapshots. They have no access to your docs, your APIs, or your data. When they lack information, they fill the gap with plausible-sounding fiction. This isn't a bug — it's a fundamental limitation of how these models work. Researchers call it "hallucination," but that implies randomness. In practice, it's worse: the model generates answers that are structurally correct but factually outdated, and there's nothing in the output that tells you which parts are real.

Grounding is the architectural solution. Instead of hoping the training data is current enough, you connect the model to real data sources at the time it generates a response. The result: answers based on facts, not patterns.

What is LLM grounding?

LLM grounding is the process of connecting a language model to external data sources at inference time, so it can retrieve and reason over real information instead of relying solely on its training data.

It's not a single technique — it's an umbrella term for a category of approaches:

Retrieval-Augmented Generation (RAG) — fetching relevant documents before generation. The model searches a knowledge base, retrieves matching content, and uses it as context for its response.
Tool use / function calling — letting the model query APIs, databases, or services directly. Instead of guessing a price, it calls a pricing API.
Knowledge retrieval — structured access to specific facts through knowledge graphs, lookup tables, or semantic search indexes. Not just document chunks, but precise answers.

Grounding is the goal. RAG, tool use, and knowledge retrieval are techniques to achieve it. Most production systems combine more than one.

Types of grounding by data source

Not all grounding is the same. Different data sources have different characteristics, and the right approach depends on what kind of data your model needs.

Static documentation — library docs, API references, internal guides. Changes infrequently (per release cycle). Best approach: index locally, serve via search. Full-text search or vector embeddings work well here because the content is stable enough to pre-index. Read more about local-first documentation for a deep dive on this approach.

Live operational data — prices, inventory, system status, feature flags. Changes continuously (hours, minutes, or seconds). Best approach: query via API or database at request time with appropriate cache TTLs. RAG doesn't work well here because by the time you've embedded and indexed the data, it's already stale. A customer-facing agent quoting yesterday's prices is worse than quoting no price at all.

Structured knowledge — facts, relationships, taxonomies, entity data. Best approach: knowledge graphs or semantic lookup tools that return structured JSON rather than document fragments. When the model needs "the current price of SKU-1234," it needs a number, not a paragraph that might contain a number.

The distinction matters because mixing up approaches creates subtle failures. Embedding live pricing data into a vector database gives you yesterday's prices with today's confidence. Querying a documentation API in real-time adds latency and fragility where a local index would be instant and reliable.

The grounding architecture

When an AI agent receives a query, a grounded system follows a consistent pattern:

The agent identifies what external data it needs. Is this a question about API usage (docs), current pricing (live data), or general knowledge (training data is fine)?
The retrieval layer fetches relevant context. This could be a local doc search, an API call, a database query — or all three.
Context is injected into the prompt alongside the user's query. The model now has both the question and the facts needed to answer it.
The model generates a response grounded in retrieved facts rather than training-time patterns.
(Optional) A verification layer checks the output against the sources to catch remaining hallucinations.

This is sometimes called a "grounding pipeline" — and it's the core architecture behind AI agents that don't hallucinate. The specifics vary (what retrieval systems you use, how you compose the prompt, whether you add verification), but the pattern is consistent.

The key insight: grounding is an architectural concern, not a prompt engineering trick. You can't reliably ground a model by telling it "only use facts." You need infrastructure that provides those facts.

Notice that step 1 is the hardest. Knowing when to retrieve and what to retrieve requires understanding the query's intent. A question about "how to configure authentication" needs docs. A question about "what's the current subscription price" needs live data. A good grounding system handles this routing automatically — the agent doesn't need to know the implementation details of each data source.

Grounding vs. fine-tuning

This is a common source of confusion. Fine-tuning and grounding solve different problems:

Fine-tuning changes the model's behavior — its tone, reasoning style, domain vocabulary, output format. You're adjusting how it thinks by training on task-specific examples. But the facts it knows still come from training data. Fine-tuning a model on medical terminology doesn't keep it current on drug interactions.

Grounding changes the model's information at query time. You're giving it access to current facts without modifying the model itself. The model's behavior stays the same, but its answers reflect real data instead of training patterns.

The decision framework is straightforward:

Need factual accuracy about things that change? Use grounding. Current docs, live data, version-specific APIs — grounding handles these because it provides facts at inference time.
Need the model to behave differently? Use fine-tuning. Domain-specific output formats, specialized reasoning patterns, company tone — fine-tuning handles these because they're behavioral.
Building a production system? You probably need both. Fine-tune for behavior, ground for facts.

Fine-tuning without grounding gives you a model that sounds like a domain expert but still hallucinates about current data. Grounding without fine-tuning gives you accurate facts delivered in a generic style. The combination is where production systems land.

Grounding in practice with MCP

The Model Context Protocol (MCP) makes grounding practical by standardizing how AI agents connect to external data sources. Instead of building custom integrations for every model and every data source, MCP defines a common interface: data sources expose "tools" through MCP servers, and AI agents query them through a standard protocol.

This matters for grounding because it means you can compose multiple grounding sources without custom integration code. A coding assistant can pull library docs from one MCP server and live API data from another — same protocol, same agent, different data. And because MCP is an open standard, you're not locked into any particular vendor or model provider.

Here's what a practical grounding setup looks like with MCP:

{
  "mcpServers": {
    "context": {
      "command": "npx",
      "args": ["-y", "@neuledge/context"]
    }
  }
}

This configuration gives your AI agent access to local documentation through @neuledge/context — a tool that indexes library docs into local SQLite databases and serves them via MCP. The agent gets version-specific documentation with sub-10ms queries, no cloud dependency, and no rate limits.

For live data grounding, @neuledge/graph provides a semantic data layer that connects agents to operational data sources — pricing APIs, inventory systems, databases — through a single lookup() tool with pre-cached responses and structured JSON output.

The combination covers both grounding categories: static documentation via Context, live operational data via Graph. Both run locally, both expose tools through MCP, and both work with any AI agent that supports the protocol. Check the integrations page for setup guides with Claude Code, Cursor, Windsurf, and other editors.

Getting started

Start with the type of grounding that matches your biggest pain point:

Your AI keeps using wrong API versions → Ground it with local documentation. Index the docs for the exact versions you use, serve them to your assistant via MCP.
Your AI needs live data (prices, statuses, inventory) → Ground it with a data layer. Connect your operational APIs and let the agent query structured facts instead of guessing.
Your AI hallucinates entirely → Read the hallucination prevention architecture guide for the full four-layer approach.

Grounding isn't optional for production AI systems. Research shows that RAG-based grounding alone reduces hallucinations by 42–68%, and combining grounding with verification can push accuracy even higher. An ungrounded agent is a liability — it will confidently deliver wrong answers that look right. A grounded agent is a tool — it delivers answers based on your data, your docs, your reality.

Start grounding your LLM today:

Install @neuledge/context for documentation grounding
Install @neuledge/graph for live data grounding
Read the docs and integration guides for setup with your editor
Compare grounding tools to understand your options

Local-First Documentation: What It Is and Why Your AI Agent Needs It

Moshe Simantov — Thu, 19 Feb 2026 02:27:08 +0000

You're mid-session with your AI coding assistant. It's been writing solid code for the last twenty minutes — referencing the right framework APIs, using current patterns. Then it starts hallucinating. The cloud documentation service hit its rate limit, and your assistant fell back to its training data. Now it's confidently suggesting APIs that were deprecated two versions ago.

This is the fundamental reliability problem with cloud-based documentation for AI agents. Local-first documentation solves it.

What is local-first documentation?

Local-first documentation means indexing library docs into a local database and serving them to your AI agent without any network calls. Instead of your assistant querying a cloud API every time it needs to reference a framework, it reads from a file on your machine.

The concept borrows from the broader local-first software movement: your data lives on your device, works offline, and doesn't depend on someone else's server being up. Applied to AI documentation, it means:

Docs are stored locally — typically as a SQLite database or similar portable format
Queries never leave your machine — sub-10ms lookups instead of 100–500ms cloud round-trips
No network dependency — works on a plane, in an air-gapped environment, or when your Wi-Fi drops
You control the version — index docs for the exact library version you're using

This isn't a new idea for developer tools. DevDocs, Zeal, and Dash have offered offline documentation browsing for years. What's new is applying this architecture to AI agents — giving your coding assistant the same offline, instant, version-accurate access to docs that you'd want for yourself.

The problem with cloud documentation services

Cloud documentation services solve a real problem: AI coding assistants need access to current docs that aren't in their training data. Services like Context7 provide this by hosting documentation and serving it through an API.

But cloud-first architecture introduces its own failure modes:

Rate limits cut you off mid-session. Most services cap requests at 60 per hour. A single complex coding session can burn through that in minutes, especially with agentic workflows where the AI makes dozens of tool calls. Once you hit the limit, your assistant loses access to docs entirely.
Latency adds up. Each cloud lookup takes 100–500ms. In a session with 30+ doc queries, that's 3–15 seconds of accumulated waiting — enough to noticeably slow down an interactive coding session.
Version mismatch. Most cloud services index only the latest version of a library. If your project is pinned to Next.js 15 but the service indexed Next.js 16, every answer references the wrong API. The version lag cuts both ways — if you're on the latest and the service is behind, you still get wrong answers.
Privacy exposure. Every query goes to a third-party server. For teams working with proprietary codebases, internal APIs, or sensitive project structures, that's a non-trivial concern. The queries themselves reveal what you're building and what you're struggling with.
Cost scales with usage. Free tiers have tight limits. Paid plans charge per query or per month. For teams with multiple developers using AI assistants heavily, costs compound.

None of these are deal-breakers for casual use. If you're prototyping something quick and always-latest docs are fine, cloud services work. The problems surface when reliability and accuracy matter — production codebases, version-pinned dependencies, teams that can't afford their AI assistant going dark mid-session.

Why local-first is a better fit for AI agents

AI agents have different access patterns than human developers browsing docs. A developer might look up a few API references per hour. An AI agent in an agentic coding session might query docs 50+ times in a single task — checking types, verifying method signatures, reading examples for each file it touches.

This high-frequency access pattern is exactly where local-first shines:

No rate limits, ever. Your agent can query docs hundreds of times per session. The database is a file on disk — there's no server to throttle you.
Sub-10ms latency. SQLite queries against a local FTS5 index return in under 10 milliseconds. That's fast enough that doc lookups add zero perceptible delay to your coding session.
Version pinning. Index docs for the exact Git tag your project uses. When you're on ai@6.0.86, you get v6 docs — not a blend of every version that existed at training time, and not whatever "latest" the cloud service indexed.
Works everywhere. Airplane mode, air-gapped networks, coffee shop Wi-Fi that drops every five minutes. Once the docs are indexed locally, your AI never loses access.
Free and unlimited. No per-query pricing, no monthly subscriptions, no tier limits. Index as many libraries as you need, query as often as you want.
Private by default. Your queries stay on your machine. No third party sees what APIs you're looking up, what frameworks you're using, or what internal docs you've indexed.

How local-first documentation works

The architecture is straightforward:

Point at a source. Give the tool a Git repository URL (or a local directory). It clones the repo's docs — typically Markdown files in a /docs folder.
Pick a version. Select the exact Git tag or branch you want. This is what makes version pinning possible.
Index into a local database. The tool parses documentation into semantically chunked sections and indexes them with full-text search (FTS5 + BM25 ranking) into a portable SQLite .db file.
Serve via MCP. The tool starts a local Model Context Protocol server. Your AI coding assistant — Claude Code, Cursor, VS Code Copilot, Windsurf — connects to it and queries docs through the standard MCP protocol.

The result: your AI assistant asks "How do I create middleware in Next.js?" and gets an answer from the exact version of Next.js docs you indexed, in under 10ms, without touching the internet.

@neuledge/context implements this architecture. Three commands to set up:

npm install -g @neuledge/context
context add https://github.com/vercel/next.js --tag v16.0.0
context mcp

The .db files are portable — check them into your repo or share them on a drive. Every developer on your team gets the same indexed docs with zero setup.

Local-first vs. cloud documentation: when to use each

	Local-First	Cloud
Rate limits	None	60 req/hour typical
Latency	<10ms	100–500ms
Offline	Yes	No
Version pinning	Exact tags	Latest only
Privacy	100% local	Cloud-processed
Cost	Free	$10+/month
Setup	3 commands	API key + config
Internal docs	Yes, free	Paid or unsupported

Use local-first when:

You're working on a production codebase pinned to specific dependency versions
You're in an offline or air-gapped environment
Privacy matters — proprietary code, internal APIs, sensitive projects
Your AI workflow is agentic (high-frequency doc queries that would hit rate limits)
You want to index internal documentation alongside open-source libraries

Use cloud when:

You're prototyping and always-latest docs are acceptable
You want zero-setup, zero-install convenience
Your AI usage is light enough that rate limits don't matter

Both approaches have their place. Cloud services offer convenience for light use. Local-first offers reliability and accuracy when it counts.

Get started

If your AI coding assistant keeps hitting rate limits, suggesting deprecated APIs, or losing access to docs mid-session, local-first documentation fixes all three:

npm install -g @neuledge/context
context add https://github.com/vercel/next.js
claude mcp add context -- npx @neuledge/context mcp

Getting started tutorial — full setup walkthrough with editor integration
Product page — features, architecture, and comparison table
Compare alternatives — cloud services vs. local-first documentation
Documentation — quick start and CLI reference
GitHub repo — source, issues, and contributions

Version-Specific Documentation: Why Your AI Coding Assistant Gets It Wrong

Moshe Simantov — Wed, 18 Feb 2026 03:55:26 +0000

Your AI assistant just built you an agent. Clean code, right structure, reasonable-looking tool definitions. You run it — nothing works. An hour later, you discover that Experimental_Agent was renamed to ToolLoopAgent in the AI SDK version you're using. And system is now instructions. And parameters is now inputSchema.

The error message didn't say any of that. It just said something vague about undefined properties. So you spent an hour debugging your agent's logic when the bug was actually a renamed class.

This happens constantly. And it's not a hallucination — your assistant generated code that was correct. In AI SDK v5. You're on v6.

It's not the model. It's the data.

Your assistant doesn't know what version you're running. Here's why the suggestions are wrong:

Training data blends every version together. The AI SDK shipped three major architectural shifts from v3 to v6. All of them are in training data, mixed together with no version labels. Your assistant learned patterns from all three eras at once.
Cloud doc services index the latest version. If you're pinned to ai@5.x but the service indexed v6, every answer you get is from the wrong API. It works the other way too — if you're on v6 but the service is behind, you still get v5 answers.
Blog posts and tutorials don't say which version they're for. A 2025 post using generateObject looks identical to a 2026 post using the new generateText + output pattern. generateObject was deprecated in v6 and your assistant has no way to know that from its training data alone.

The AI SDK is a perfect case study

The agent pattern changed with every major release. Ask your assistant: "How do I build a multi-step agent that calls tools?"

v3/v4: Write a manual loop — generateText with maxSteps, manage each step yourself
v5: new Experimental_Agent({ system: "...", tools }) — note the experimental prefix
v6: new ToolLoopAgent({ instructions: "...", tools }) — stable, renamed, different param name

Three correct answers, three incompatible versions. Without knowing which you're on, your assistant picks one and gets it wrong two-thirds of the time.

The rest of the API has the same problem:

parameters / result → inputSchema / output — tool definitions changed shape; old field names fail silently with no error
generateObject deprecated — still runs, but warns; breaks in the next major version
useChat append() → sendMessage() — plus the hook now expects you to manage input state yourself

Silent failures are the cruelest kind. When a renamed field breaks your tool calls without throwing an error, you debug your agent's behavior for an hour before you even think to check the SDK changelog.

The fix: docs pinned to your actual version

Your assistant gives the right answer when it has the right docs. Without version-pinned docs, it generates this:

// what your assistant produces from its training mix
const agent = new Experimental_Agent({
  system: "You are a helpful assistant.",
  tools: { search: { parameters: z.object({ query: z.string() }), ... } },
});

With v6 docs indexed, it generates this instead:

// what it produces with AI SDK v6.0.86 docs
const agent = new ToolLoopAgent({
  instructions: "You are a helpful assistant.",
  tools: { search: tool({ inputSchema: z.object({ query: z.string() }), ... }) },
});

Same question. Right answer, because it's reading from the right docs.

@neuledge/context indexes docs from a specific Git tag and serves them to your AI assistant via MCP. Two commands:

context add https://github.com/vercel/ai --tag v6.0.86
npx @neuledge/context mcp

After that, when you ask about building an agent, your assistant reads v6 docs and generates ToolLoopAgent with instructions and inputSchema — not whatever blend of versions it was trained on.

Works for any fast-moving library. React Router. Next.js App Router. Tailwind CSS. Anything where the API today isn't the API from a year ago.

See our step-by-step tutorial for editor setup (Claude Code, Cursor, VS Code, Windsurf).

What about cloud documentation services?

They solve a real problem — zero-setup access to docs your assistant wouldn't have otherwise. But most serve only the latest version. If you're on v5 and the service indexed v6, you still get the wrong answers. The version lag cuts both ways.

For production codebases pinned to a specific version, local version-pinned docs are the cleaner solution. See our comparison page for the full breakdown.

Get started

If your AI-generated agentic code keeps using old patterns — Experimental_Agent when ToolLoopAgent exists, parameters when the field is now inputSchema — three commands fix it:

npm install -g @neuledge/context
context add https://github.com/vercel/ai --tag v6.0.86
claude mcp add context -- npx @neuledge/context mcp

Getting started tutorial — full setup walkthrough
Documentation — quick start and CLI reference
Compare alternatives — cloud services vs local version-pinned docs
GitHub repo — source, issues, and CLI reference

Getting Started with @neuledge/context

Moshe Simantov — Tue, 17 Feb 2026 03:25:54 +0000

Your AI coding assistant just suggested getServerSideProps for your Next.js 16 project. That API was deprecated two major versions ago. Yesterday it generated Tailwind classes that don't exist. Last week it used the old AI SDK callback pattern instead of the new agent loop API.

This isn't a model problem — it's a data problem. Your assistant is working from training data that's months or years out of date. When it doesn't have the right docs, it fills the gap with confident-sounding fiction.

@neuledge/context fixes this. It indexes library documentation into local SQLite files and serves them to your AI assistant via MCP (Model Context Protocol). No cloud service, no rate limits, sub-10ms queries. This tutorial walks you through setting it up from scratch with a real project.

Prerequisites

Before you start, you'll need:

Node.js 18+ — check with node --version
An AI coding assistant that supports MCP — Claude Code, Cursor, VS Code with Copilot, Windsurf, or any MCP-compatible client
A project you're working on — we'll use a Next.js + Tailwind CSS stack as our example, but Context works with any library that has Markdown docs

Installing Context

Install Context globally so it's available across all your projects:

npm install -g @neuledge/context

This installs the context CLI tool. There's no background daemon, no system service — just a command-line tool that runs when you call it.

If you prefer not to install globally, you can use npx instead. Every command in this tutorial works with the npx @neuledge/context prefix:

npx @neuledge/context --version

Adding your first library

Let's index the Next.js documentation. Point Context at the GitHub repo:

context add https://github.com/vercel/next.js

Context will:

Shallow-clone the repo — only the docs, not the full git history
Show available version tags — pick the version that matches your project (e.g., v16.0.0)
Detect the docs directory — it scans for docs/, documentation/, or doc/ automatically
Parse every Markdown file — extracting frontmatter, splitting content into semantically meaningful chunks by H2 headings (~800 tokens per chunk)
Index into SQLite — full-text search with FTS5 and BM25 ranking, stored in a single .db file

The result is a portable database file at ~/.context/packages/nextjs@16.0.0.db. That file contains every piece of Next.js 16 documentation, pre-indexed and ready for instant queries.

Want to pin a specific version without the interactive prompt? Use the --tag flag:

context add https://github.com/vercel/next.js --tag v16.0.0

Adding multiple libraries

A real project uses more than one library. Let's add Tailwind CSS:

context add https://github.com/tailwindlabs/tailwindcss

Pick the version tag that matches your project, and Context creates another .db file. Each library gets its own database — clean, isolated, and independently updatable.

To see everything you've indexed:

context list

You'll see something like:

nextjs@16.0.0          ~/.context/packages/nextjs@16.0.0.db
tailwindcss@4.0.0      ~/.context/packages/tailwindcss@4.0.0.db

Add as many libraries as your project uses. The Vercel AI SDK, React, your component library — if it has Markdown docs in a Git repo, Context can index it.

Connecting to your editor

Context serves docs via MCP — the Model Context Protocol, an open standard backed by Anthropic, OpenAI, Google, and Microsoft. Here's how to connect it to your editor.

Claude Code

One command:

claude mcp add context -- npx @neuledge/context mcp

Cursor

Create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "context": {
      "command": "npx",
      "args": ["@neuledge/context", "mcp"]
    }
  }
}

VS Code / Copilot

Add to .vscode/settings.json:

{
  "mcp": {
    "servers": {
      "context": {
        "command": "npx",
        "args": ["@neuledge/context", "mcp"]
      }
    }
  }
}

Requires VS Code 1.99+ with the GitHub Copilot extension.

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "context": {
      "command": "npx",
      "args": ["@neuledge/context", "mcp"]
    }
  }
}

For other MCP clients, the server command is npx @neuledge/context mcp using stdio transport. See our integrations page for the full list.

Using Context in practice

Once connected, your AI assistant automatically has access to the resolve tool — it can search your indexed docs whenever it needs accurate information.

Here's the difference in action. Say you ask your assistant: "How do I create a middleware in Next.js 16 that redirects unauthenticated users?"

Without Context: The assistant relies on training data. It might generate a middleware.ts file using the old NextResponse.redirect() pattern with the wrong import path, or reference a configuration option that was renamed two versions ago.

With Context: The assistant queries your indexed Next.js 16 docs, finds the current middleware documentation, and generates code that matches the exact API of the version you're using. The correct imports, the current configuration format, the right patterns.

The same applies to Tailwind. Ask about a utility class and the assistant pulls from your indexed v4 docs instead of guessing based on v3 training data.

This happens transparently — your assistant calls the resolve tool when it needs docs, gets results in under 10ms, and uses them to ground its response. No extra prompting needed.

Tips for power users

Pin exact versions

Always match the indexed version to what's in your package.json. If you're on Next.js 16.0.0, index that exact tag:

context add https://github.com/vercel/next.js --tag v16.0.0

When you upgrade, add the new version. Old .db files stay around so you can switch back if needed.

Index your own docs

Context works with any Git repo that has Markdown files — including yours:

context add ./docs --name my-project --pkg-version 1.0

Index your internal API docs, runbooks, or design system documentation. Your AI assistant gets grounded access to company knowledge, completely private, no cloud service involved.

Share `.db` files with your team

Each documentation package is a single, self-contained .db file. You can share them:

# Build and export to a specific location
context add https://github.com/your-org/design-system \
  --name design-system --pkg-version 3.1 --save ./packages/

# Teammates install the pre-built package instantly
context add ./packages/design-system@3.1.db

Commit .db files to your repo, upload them to S3, or put them on a shared drive. No build step on the receiving end — the pre-indexed database installs instantly.

Update when a new version releases

When a library you depend on releases a new version:

context add https://github.com/vercel/next.js --tag v16.1.0

The old version's .db file stays intact. You can keep multiple versions indexed simultaneously.

What's happening under the hood

If you're curious about the internals: Context uses SQLite with FTS5 (full-text search) and BM25 ranking. When your AI assistant queries for "middleware authentication," the search engine:

Tokenizes the query using Porter stemming — so "authenticating" matches "authentication"
Runs FTS5 search across all indexed chunks
Ranks results with BM25 — section titles weighted 10x, doc titles 5x over body content
Filters low-relevance results — anything below 50% of the top score gets dropped
Merges adjacent chunks — so your assistant sees coherent documentation sections, not fragments
Caps at a token budget — keeping the response focused without flooding the context window

Total latency: under 10ms. Compare that to 100-500ms for a cloud round-trip.

If you also need live data access beyond static documentation — product catalogs, pricing, inventory — check out @neuledge/graph, which provides a semantic data layer for AI agents with pre-cached, sub-100ms responses.

Get started now

Install Context, index the docs for your current project, and connect to your editor. The whole setup is three commands:

npm install -g @neuledge/context
context add https://github.com/vercel/next.js
claude mcp add context -- npx @neuledge/context mcp

Your AI coding assistant just went from hallucinating outdated APIs to having instant, offline access to the exact documentation it needs.

GitHub repo — source code, issues, and full CLI reference
Documentation — quick start guide and MCP configuration
Integrations — setup guides for every supported editor
Compare alternatives — see how Context stacks up against cloud services

Forem: Moshe Simantov

Local SQLite Beats Cloud Docs for AI Coding. Our v1 Ships Today.

The bet, in one paragraph

How we got here — the 0.x story

Why this matters vs the cloud alternatives

What v1 actually means

What it looks like

What's next

Try v1

MCP vs Function Calling: A Practical Decision Framework

Function calling: tight integration, zero overhead

MCP: standardized, reusable, cross-client

The decision framework

Using both together

Start with the simpler thing

llms.txt Is Just a Table of Contents. Most AI Tools Stop There.

What llms.txt actually is (and isn't)

The "half the answer" problem

context add <website> — the new path

Following the links (why the index isn't enough)

A real example, end to end

Where this fits in the bigger picture

Try it

Your AI Coding Assistant Isn't Stupid — It's Starving for Context

The model upgrade treadmill

What the research actually says

The three context failures

Fixing context at the source

Before and after

Stop upgrading models. Start upgrading context.

Your AI Coding Assistant Can Finally Read Django and Spring Boot Docs

Three formats, zero configuration

Python ecosystem: Django, FastAPI, Flask

Java ecosystem: Spring Boot

Build from any git repo

Why this matters

RAG vs. Fine-Tuning vs. Grounding: Which One Does Your AI Actually Need?

The misdiagnosis that costs you weeks

Diagnose first, then pick your tool

Why fine-tuning is almost never the right first step

What good grounding actually looks like

When you actually need fine-tuning

The bottom line

116 Pre-Built Documentation Packages for Your AI Coding Assistant

The problem was simple repetition

What's actually in a registry package

How to use it

How the registry works

What's covered

Try it out

How @neuledge/graph Gives AI Agents Access to Live Data

Why RAG alone isn't enough for live data

The Graph approach — one tool, all your data

Setting up Graph

Querying data

Connecting Graph to your AI agent

Vercel AI SDK

OpenAI Agents SDK

LangChain

What the agent experience looks like

Building a custom data server

Graph + Context — the full grounding stack

Get started

What Is LLM Grounding? A Developer's Guide

What is LLM grounding?

Types of grounding by data source

The grounding architecture

Grounding vs. fine-tuning

Grounding in practice with MCP

Getting started

Local-First Documentation: What It Is and Why Your AI Agent Needs It

What is local-first documentation?

The problem with cloud documentation services

Why local-first is a better fit for AI agents

How local-first documentation works

Local-first vs. cloud documentation: when to use each

Get started

Version-Specific Documentation: Why Your AI Coding Assistant Gets It Wrong

It's not the model. It's the data.

The AI SDK is a perfect case study

`context add <website>` — the new path

Share `.db` files with your team