<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Beever AI</title>
    <description>The latest articles on Forem by Beever AI (@beeverai).</description>
    <link>https://forem.com/beeverai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896513%2Fcd0981f2-139a-4b83-be53-fa2aa95a5faf.png</url>
      <title>Forem: Beever AI</title>
      <link>https://forem.com/beeverai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/beeverai"/>
    <language>en</language>
    <item>
      <title>Day 1: Your Team’s Chat Is a Wiki Waiting to Happen — A New Kind of RAG</title>
      <dc:creator>Beever AI</dc:creator>
      <pubDate>Tue, 28 Apr 2026 08:01:32 +0000</pubDate>
      <link>https://forem.com/beeverai/day-1-your-teams-chat-is-a-wiki-waiting-to-happen-a-new-kind-of-rag-3fg6</link>
      <guid>https://forem.com/beeverai/day-1-your-teams-chat-is-a-wiki-waiting-to-happen-a-new-kind-of-rag-3fg6</guid>
      <description>&lt;p&gt;Why we built Beever Atlas — and why “distill first, retrieve second” works where vanilla RAG falls apart.&lt;br&gt;
By Alan Yang &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aajkhtrme1qedaiohhl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aajkhtrme1qedaiohhl.webp" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your team already documents everything — in chat. Beever Atlas distills those conversations into a wiki that the LLM can actually reason over, across Slack, Discord, Microsoft Teams, Telegram, and Mattermost, with more platforms on the way.&lt;/p&gt;

&lt;p&gt;5-Day Beever Atlas Series — start here.&lt;/p&gt;

&lt;p&gt;This is Day 1 of a five-part deep dive into Beever Atlas — the open-source, wiki-first RAG system that turns your team’s chat into a self-writing knowledge base.&lt;/p&gt;

&lt;p&gt;Day 1: Why Beever Atlas exists — concept, comparison with other tools, real use cases&lt;/p&gt;

&lt;p&gt;If any of this lands for you, the single best thing you can do is ⭐ the GitHub repo — it makes a real difference for an OSS launch.&lt;/p&gt;

&lt;p&gt;The Karpathy Insight&lt;br&gt;
Andrej Karpathy put it plainly in a post on X:&lt;/p&gt;

&lt;p&gt;LLMs are “far better at reasoning over curated, encyclopedic content (books, docs, wikis) than over raw conversational transcripts.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43nhza6wt48st66hkuxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43nhza6wt48st66hkuxz.png" alt=" " width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chat history is noisy, redundant, temporally scattered, and packed with implicit context that only the people who were there can resolve. A wiki is the already-distilled form of that knowledge — deduplicated, structured, citation-bearing, organized by topic rather than by timestamp.&lt;/p&gt;

&lt;p&gt;Most RAG systems retrieve from the messy left-hand side. We distill into the right-hand side first. That’s the entire idea.&lt;/p&gt;

&lt;p&gt;Feed an LLM a raw Slack export and ask it to reason about your auth strategy. Then feed it a two-page wiki summarizing the same conversation. The second version produces better answers, fewer hallucinations, and traceable citations. Every time. This isn’t a subtle distinction — it’s a structural one.&lt;/p&gt;

&lt;p&gt;The quality of LLM reasoning is bounded by the quality of its input.&lt;/p&gt;

&lt;p&gt;What Beever Atlas Does&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2iqkmjjgfc7q3vrrh7ca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2iqkmjjgfc7q3vrrh7ca.png" alt=" " width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The wiki preview above is real output. That page — concept map, topic clusters, FAQ section, glossary, decision log — was produced entirely from 246 Slack messages. Nobody scheduled a documentation sprint. Nobody opened Confluence.&lt;/p&gt;

&lt;p&gt;Beever Atlas connects to your team’s chat platforms and continuously distills conversations into a structured, queryable wiki. Three things it does:&lt;/p&gt;

&lt;p&gt;Reads from Slack, Discord, Microsoft Teams, Telegram, Mattermost — all five, on a schedule or on demand (README.md:31)&lt;br&gt;
Continuously distills into a structured wiki with seven page types per channel: overview, topics, decisions, people, FAQ, glossary, activity (docs/mcp-server.md:139–149)&lt;br&gt;
Answers questions with citations — through the dashboard or via MCP directly into Claude Code and Cursor&lt;/p&gt;

&lt;p&gt;One limitation worth naming up front: very recent messages (last few minutes) may not yet appear in the wiki. The QA agent falls back to raw messages for facts that haven’t been distilled yet. You trade a small recency window for dramatically better reasoning quality on the bulk of your history.&lt;/p&gt;
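&lt;p&gt;A minimal sketch of that fallback policy, assuming a distillation watermark timestamp and dict-shaped facts and messages (the function and field names here are illustrative, not the actual Atlas API):&lt;/p&gt;

```python
# Illustrative recency fallback: facts at or before the distillation
# watermark come from the wiki; the undistilled tail comes from raw
# messages. All names here are hypothetical, not Atlas's real API.

def answer_sources(distilled_until, wiki_facts, raw_messages):
    """Split retrieval between distilled wiki facts and the raw tail."""
    def at_or_before(a, b):
        # ISO-8601 timestamps sort lexicographically; min() picks the earlier.
        return min(a, b) == a

    distilled = [f for f in wiki_facts
                 if at_or_before(f["timestamp"], distilled_until)]
    recent_tail = [m for m in raw_messages
                   if not at_or_before(m["timestamp"], distilled_until)]
    return distilled, recent_tail
```
&lt;p&gt;The QA agent answers from &lt;code&gt;distilled&lt;/code&gt; first and only reaches into &lt;code&gt;recent_tail&lt;/code&gt; for facts newer than the watermark.&lt;/p&gt;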

&lt;p&gt;How It Compares&lt;br&gt;
The tools below all touch “AI + knowledge.” None of them do the same thing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k8c2wfap9zo6ev2x17v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k8c2wfap9zo6ev2x17v.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The honest version: Mem0 is good for personal memory. Notion AI is good if your team already writes the docs themselves. Obsidian with an LLM plugin is good for individual researchers who curate their own notes. Beever Atlas is good if your team’s knowledge already lives in chat and you want it to organize itself — without anyone touching a wiki editor.&lt;/p&gt;

&lt;p&gt;What Teams Actually Do With This&lt;br&gt;
These aren’t hypothetical. They’re four questions that surfaced in our own team chat the week we started building Beever Atlas.&lt;/p&gt;

&lt;p&gt;1 The new hire who doesn’t DM Alice&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdc04flzfub8ce4cg04ys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdc04flzfub8ce4cg04ys.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s day three at a new company. Instead of pinging Alice on Slack (“hey, why are we using Stripe and not Adyen?”), the new hire opens the #payments wiki and clicks Decisions. The Stripe-vs-Adyen entry is right there — extracted from the original thread, with citations linking back to the source messages. No DM. No interruption. Onboarding scales without putting anyone on call.&lt;/p&gt;

&lt;p&gt;2 The decision archaeologist&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezxx8rexpzix30ibqcjn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezxx8rexpzix30ibqcjn.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Six months after the JWT-vs-OAuth debate, you ask: “Why did we pick JWT?” The QA agent traces the decision through the entity graph and returns a chronological timeline — who proposed each option, what was rejected and why, when the final call landed, and a link back to the original Slack thread. It’s the institutional memory your team’s chat already had, now searchable.&lt;/p&gt;

&lt;p&gt;3 The expert finder&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m6zcm4c7goj1wcjzgdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m6zcm4c7goj1wcjzgdn.png" alt=" " width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You ask: “Does anyone know about database indexing?” Most teams answer this with @channel and hope. Atlas ranks teammates by message frequency on the topic, citation count from other facts, and graph weight — quietly, without putting anyone on the spot. The person who’s been answering indexing questions for three years bubbles to the top.&lt;/p&gt;
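&lt;p&gt;As a rough sketch, that ranking could be a normalised weighted blend of the three signals (the weights and field names below are assumptions, not Atlas’s actual scoring code):&lt;/p&gt;

```python
# Illustrative expert ranking: a weighted blend of topic message
# frequency, citation count, and graph weight, each normalised to the
# top candidate. Weights and field names are assumptions.

def rank_experts(candidates, weights=(0.5, 0.3, 0.2)):
    """candidates: dicts with msg_freq, citations, graph_weight."""
    w_freq, w_cite, w_graph = weights

    def norm(key):
        top = max(c[key] for c in candidates) or 1  # guard all-zero signals
        return {id(c): c[key] / top for c in candidates}

    freq, cite, graph = norm("msg_freq"), norm("citations"), norm("graph_weight")

    def score(c):
        return w_freq * freq[id(c)] + w_cite * cite[id(c)] + w_graph * graph[id(c)]

    return sorted(candidates, key=score, reverse=True)
```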

&lt;p&gt;4 The Claude Code power-up&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpo6vdptzgczt9r4xhku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpo6vdptzgczt9r4xhku.png" alt=" " width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inside Cursor (or Claude Code), your AI assistant calls ask_channel() over MCP. The response comes back with citations from your team’s actual decisions — not from training data, not made up. The same agent that powers the dashboard is now sitting next to your code editor, reading from the same brain.&lt;/p&gt;

&lt;p&gt;Try It&lt;br&gt;
Apache-2.0. Two free keys — Google AI Studio for Gemini, Jina for embeddings. Docker Compose. Runs on a laptop. About 15 minutes including Docker pulls.&lt;/p&gt;

&lt;p&gt;Please Star Us on GitHub ⭐⭐⭐ (&lt;a href="https://github.com/Beever-AI/beever-atlas" rel="noopener noreferrer"&gt;https://github.com/Beever-AI/beever-atlas&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5hpciexs8mf0z3vn473.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5hpciexs8mf0z3vn473.png" alt=" " width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One-line installation:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git clone &amp;amp;&amp;amp; ./atlas&lt;/code&gt; — and you’re running in a few minutes.&lt;/p&gt;

&lt;p&gt;⭐ Star us on GitHub · 💬 Join the Discord · 🐦 Follow on X&lt;/p&gt;

&lt;p&gt;Next — Day 2: Why Beever Atlas Uses Two Databases — and the 6-Stage Pipeline That Feeds Them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>rag</category>
    </item>
    <item>
      <title>LLM Wiki vs RAG: a different approach to team-chat memory</title>
      <dc:creator>Beever AI</dc:creator>
      <pubDate>Fri, 24 Apr 2026 18:12:18 +0000</pubDate>
      <link>https://forem.com/beeverai/llm-wiki-vs-rag-a-different-approach-to-team-chat-memory-4m91</link>
      <guid>https://forem.com/beeverai/llm-wiki-vs-rag-a-different-approach-to-team-chat-memory-4m91</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpo5xehuqn2ppdy2z798.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpo5xehuqn2ppdy2z798.png" alt=" " width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Retrieval-augmented generation became the reflex answer for &lt;em&gt;"my LLM needs to know about my data."&lt;/em&gt; Chunk it, embed it, retrieve top-k on query, stuff it in the prompt. It works so well on documents that engineers now apply it to everything — PDFs, wikis, code, support tickets, and, increasingly, team chat.&lt;/p&gt;

&lt;p&gt;RAG is the wrong tool for chat.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://beever.ai" rel="noopener noreferrer"&gt;Beever AI&lt;/a&gt; we spent the last months building an alternative and just released it as open source. &lt;a href="https://github.com/Beever-AI/beever-atlas" rel="noopener noreferrer"&gt;Beever Atlas&lt;/a&gt; is an &lt;strong&gt;LLM Wiki&lt;/strong&gt; — not a RAG system — that ingests Slack, Discord, Microsoft Teams, Mattermost, and Telegram conversations into a structured, browsable, auto-maintained wiki and knowledge graph. This post explains what an LLM Wiki is, what it can answer that RAG cannot, and the design decisions behind our implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an LLM Wiki?
&lt;/h2&gt;

&lt;p&gt;An LLM Wiki is a structured, LLM-maintained knowledge artefact derived from a conversational corpus. It is &lt;em&gt;not&lt;/em&gt; a retrieval preprocessing step. It is the artefact itself — browsable by humans, queryable by agents, versioned, and cited back to its sources.&lt;/p&gt;

&lt;p&gt;Seven differences worth naming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary output.&lt;/strong&gt; RAG surfaces prompt-time retrieval results. An LLM Wiki produces a standing artefact (the wiki and the knowledge graph).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What breaks when retrieval breaks.&lt;/strong&gt; RAG hallucinates. An LLM Wiki is still readable as-is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumed by.&lt;/strong&gt; RAG: LLMs, mostly. LLM Wiki: humans &lt;em&gt;and&lt;/em&gt; LLMs both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freshness cost.&lt;/strong&gt; RAG re-indexes on every write. An LLM Wiki re-consolidates affected topics only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedup.&lt;/strong&gt; RAG does it per-query, ranking-dependent. An LLM Wiki does it at extraction time, structurally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citations.&lt;/strong&gt; RAG reconciles sources post-hoc and can drift. An LLM Wiki carries citations forward from extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-hop questions.&lt;/strong&gt; RAG is poor at them. An LLM Wiki makes them first-class via the graph.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Karpathy outlined the idea in a widely shared thread earlier this year. We took it seriously and built a production implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why RAG falls down on conversations
&lt;/h2&gt;

&lt;p&gt;Chat isn't documents. Four specific failure modes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Same fact, said dozens of times.&lt;/strong&gt; Someone announces &lt;em&gt;"we're moving to Postgres on March 15"&lt;/em&gt; in &lt;code&gt;#engineering&lt;/code&gt;. Twelve people ack. Four people quote it in later threads. Three retrospectives cite it. The same fact now exists as 20+ near-duplicate chunks. Chunk-based retrieval surfaces one somewhat arbitrarily, or several near-duplicates that crowd out unrelated relevant context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time-ordered, not topic-grouped.&lt;/strong&gt; In a document, related content sits adjacent. In chat, a feature decision, a bug report, and a lunch plan can be interleaved across five minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure lives outside the text.&lt;/strong&gt; Meaningful metadata is not in the message body: author, thread parent, reactions, platform, mentions, attachments. Embedding only the body throws away half the signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer instability.&lt;/strong&gt; Ask the same question a week apart. Different chunks rank highest depending on recent message volume and minor embedding perturbations. You get subtly different answers — corrosive to user trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an LLM Wiki can answer that RAG can't
&lt;/h2&gt;

&lt;p&gt;This is the part of the design that most people miss when they first see the architecture. A good vector retriever can handle &lt;em&gt;"what did we decide about onboarding?"&lt;/em&gt; just fine. The capabilities that separate an LLM Wiki from RAG show up when the question is &lt;strong&gt;relational, temporal, or multi-hop&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Who decided we're moving to Postgres, and did anything supersede that decision?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Which projects does the billing service block, and who owns them?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Show me every decision Alan announced in Q1 that was later reversed."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Which technologies are members of team A working with that are still flagged experimental?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What got decided in threads that mention the SOC2 audit?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these requires walking a relationship — decision supersession, team membership, thread mentions, status attributes — that chunk-based retrieval fundamentally can't express. In Beever Atlas the knowledge graph in Neo4j holds those relationships. The Cypher traversal used in the codebase for decision-chain queries looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;d:&lt;/span&gt;&lt;span class="n"&gt;Entity&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;type:&lt;/span&gt; &lt;span class="s1"&gt;'Decision'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;d.channel_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;$channel_id&lt;/span&gt;
   &lt;span class="ow"&gt;OR&lt;/span&gt; &lt;span class="ow"&gt;EXISTS&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:MENTIONED_IN&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;ev:&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ev.channel_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;$channel_id&lt;/span&gt;
      &lt;span class="ss"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;person:&lt;/span&gt;&lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:DECIDED&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:SUPERSEDES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;old:&lt;/span&gt;&lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;newer:&lt;/span&gt;&lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:SUPERSEDES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
     &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;person.name&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;decided_by&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
     &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;old.name&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;supersedes&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
     &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;newer.name&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;superseded_by&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decided_by&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supersedes&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;superseded_by&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;$limit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of query a Q&amp;amp;A agent's router hands off to the graph path when a question's shape demands it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of a fact and an entity
&lt;/h2&gt;

&lt;p&gt;Before showing the pipeline, the data shapes that hold everything together:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Atomic fact&lt;/strong&gt; — a single self-contained claim with a source message, author, timestamp, and thread context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The team is migrating from MySQL to Postgres on March 15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source_message_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"slack:C08TX:1712500000001100"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alan5543"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-01T14:22:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"thread_parent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MySQL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Technology"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Technology"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relationships"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Team"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MIGRATING_TO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"valid_from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-01T14:22:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"valid_until"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source_message_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"slack:C08TX:1712500000001100"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details to flag:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every relationship has temporal props&lt;/strong&gt; — &lt;code&gt;confidence&lt;/code&gt;, &lt;code&gt;valid_from&lt;/code&gt;, &lt;code&gt;valid_until&lt;/code&gt;, &lt;code&gt;source_message_id&lt;/code&gt;. This is how we answer &lt;em&gt;"as of when"&lt;/em&gt; questions and how later decisions supersede earlier ones without destroying history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity scope&lt;/strong&gt; — some entity types (Person, Technology, Project, Team) merge globally across channels. Others (Decision, Meeting, Artifact) merge only within a channel scope. Scope-aware MERGE prevents two different channels' "Q1 planning decisions" from collapsing into each other, while still letting "Alan" mean the same person everywhere.&lt;/li&gt;
&lt;/ul&gt;
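&lt;p&gt;Under those temporal props, an “as of when” question reduces to a validity-window filter. A minimal sketch over relationship dicts shaped like the JSON above (illustrative only):&lt;/p&gt;

```python
# Illustrative "as of" filtering over temporal relationship props
# (valid_from / valid_until), matching the JSON shape above.
# A null valid_until means the relationship is still current.

def active_as_of(relationships, as_of):
    """Return relationships that were valid at the ISO timestamp as_of."""
    def at_or_before(a, b):
        # ISO-8601 timestamps sort lexicographically; min() picks the earlier.
        return min(a, b) == a

    return [
        r for r in relationships
        if at_or_before(r["valid_from"], as_of)
        and (r["valid_until"] is None or at_or_before(as_of, r["valid_until"]))
    ]
```
&lt;p&gt;Superseding a decision just closes one window (&lt;code&gt;valid_until&lt;/code&gt;) and opens another, so history survives intact.&lt;/p&gt;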

&lt;p&gt;String-level dedup is done with Jaro-Winkler similarity (via APOC) inside the scope; semantic dedup is done with embedding cosine on &lt;code&gt;Entity.name_vector&lt;/code&gt;.&lt;/p&gt;
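&lt;p&gt;A minimal sketch of that two-stage check, with &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; standing in for Jaro-Winkler (Atlas runs the real thing via APOC inside Neo4j) and thresholds that are assumptions:&lt;/p&gt;

```python
# Illustrative two-stage entity dedup. difflib.SequenceMatcher stands
# in for Jaro-Winkler here (Atlas does string similarity via APOC in
# Neo4j); plain cosine covers the semantic pass on name vectors.
from difflib import SequenceMatcher
import math

def string_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def is_duplicate(name_a, vec_a, name_b, vec_b, s_thresh=0.9, c_thresh=0.92):
    """Merge candidates when either the string or the semantic pass fires."""
    def at_least(x, t):
        # True when x reaches threshold t.
        return max(x, t) == x
    return at_least(string_sim(name_a, name_b), s_thresh) or at_least(
        cosine(vec_a, vec_b), c_thresh)
```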

&lt;h2&gt;
  
  
  How the wiki gets built — six ADK stages
&lt;/h2&gt;

&lt;p&gt;Ingestion runs as a pipeline of Google ADK &lt;code&gt;LlmAgent&lt;/code&gt; steps, using Gemini 2.5 Flash for extraction. Six stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Preprocessor&lt;/strong&gt; — normalises platform-specific message shapes into a single &lt;code&gt;NormalizedMessage&lt;/code&gt; with author, timestamp, thread context, attachments, platform metadata.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fact Extractor&lt;/strong&gt; — pulls atomic facts with channel + author + timestamp attached. Bounded output via &lt;code&gt;MAX_FACTS_PER_MESSAGE&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity Extractor&lt;/strong&gt; — LLM-driven with a flexible type schema (Person / Decision / Project / Technology / Team / Meeting / Artifact / …). Type vocabulary isn't frozen; new types can be added without a migration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-batch Validator&lt;/strong&gt; — dedupes entities and relationships across batch boundaries. This is where scope-aware MERGE and Jaro-Winkler similarity live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship Graph&lt;/strong&gt; — materialises bidirectional relationships (&lt;code&gt;DECIDED ↔ DECIDED_BY&lt;/code&gt;, &lt;code&gt;BLOCKED_BY ↔ BLOCKS&lt;/code&gt;, and so on) with the temporal props above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persister&lt;/strong&gt; — transactional-outbox write to Weaviate + Neo4j + MongoDB, with media attribution preserved. The outbox pattern means a partial failure doesn't leave the stores inconsistent.&lt;/li&gt;
&lt;/ol&gt;
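&lt;p&gt;Structurally, the six stages can be sketched as a chain of functions over a shared state dict. The stage internals below are stubs; the real stages are ADK &lt;code&gt;LlmAgent&lt;/code&gt; steps:&lt;/p&gt;

```python
# Structural sketch of the six-stage ingestion pipeline as a chain of
# stage functions over a shared state dict. Stage bodies are stubbed.

def preprocess(state):
    # Normalise platform-specific shapes into NormalizedMessage dicts.
    state["messages"] = [{"text": m, "platform": "slack"} for m in state["raw"]]
    return state

def extract_facts(state):
    # One fact per message in this stub; real extraction is LLM-driven
    # and bounded by MAX_FACTS_PER_MESSAGE.
    state["facts"] = [{"text": m["text"]} for m in state["messages"]]
    return state

def extract_entities(state): return state     # LLM-driven, open type schema
def validate(state): return state             # cross-batch dedup, scoped MERGE
def build_relationships(state): return state  # bidirectional + temporal props
def persist(state): return state              # transactional outbox, 3 stores

PIPELINE = [preprocess, extract_facts, extract_entities,
            validate, build_relationships, persist]

def run(raw_messages):
    state = {"raw": raw_messages}
    for stage in PIPELINE:
        state = stage(state)
    return state
```
&lt;p&gt;Keeping every stage a pure state-in, state-out function is what makes re-running a subset of the pipeline cheap.&lt;/p&gt;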

&lt;p&gt;Periodically, a separate &lt;strong&gt;consolidation agent&lt;/strong&gt; clusters related facts and synthesises them into topic pages — the "wiki" users browse in the dashboard. Consolidation is idempotent and checkpointed: you can re-run it on a subset without rebuilding from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dual memory: why Weaviate &lt;em&gt;and&lt;/em&gt; Neo4j
&lt;/h2&gt;

&lt;p&gt;Facts live in two stores because no single store wins both query shapes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate&lt;/strong&gt; — 3-tier semantic memory (channel-level summaries, topic-level synopses, fact-level detail). Hybrid BM25 + vector retrieval. Good for &lt;em&gt;"what did we decide about X?"&lt;/em&gt; — any question whose answer is a semantic lookup over fact text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neo4j&lt;/strong&gt; — entities, decisions, episodic links, media attributions. Good for &lt;em&gt;"who decided this and why?"&lt;/em&gt; — any question that requires walking a relationship.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;query router&lt;/strong&gt; picks semantic, graph, or both per question. The router is a small LLM classifier, not a rule engine. Rule engines fail on novel phrasings; a classifier trades a few milliseconds of routing latency for resilience to queries we haven't seen before.&lt;br&gt;
&lt;/p&gt;
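
&lt;p&gt;The fail-open behaviour is the important part of the router. A sketch, with &lt;code&gt;classify&lt;/code&gt; standing in for the small LLM call and the prompt wording entirely hypothetical:&lt;/p&gt;

```python
ROUTES = ("semantic", "graph", "both")

def route(question, classify):
    """Router sketch: anything the classifier returns outside the known
    routes fails open to "both", the widest (and most expensive) path,
    rather than erroring on a novel phrasing."""
    prompt = (
        "Classify this question for retrieval routing. "
        "Reply with exactly one of: semantic, graph, both.\n"
        f"Question: {question}"
    )
    answer = classify(prompt).strip().lower()
    return answer if answer in ROUTES else "both"
```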

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    Q[Question] --&amp;gt; R[Query Router&amp;lt;br/&amp;gt;LLM classifier]
    R --&amp;gt;|semantic| W[Weaviate&amp;lt;br/&amp;gt;channel / topic / fact]
    R --&amp;gt;|graph| N[Neo4j&amp;lt;br/&amp;gt;entity + rel traversal]
    R --&amp;gt;|both| W
    R --&amp;gt;|both| N
    W --&amp;gt; M[Merge + dedup by fact_id]
    N --&amp;gt; M
    M --&amp;gt; A[Answer agent&amp;lt;br/&amp;gt;ADK SkillToolset]
    A --&amp;gt; S[SSE stream&amp;lt;br/&amp;gt;response + citations]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MongoDB holds the wiki page cache, ingestion state, and the transactional outbox. Redis holds sessions. Nothing interesting happens in either — they're operational plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The MCP surface: same memory, two audiences
&lt;/h2&gt;

&lt;p&gt;The dashboard is one surface. The &lt;strong&gt;MCP (Model Context Protocol) server&lt;/strong&gt; is the other. Sixteen tools are exposed to external agents like Claude Code, Cursor, or any MCP-capable client:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;search_by_topic&lt;/code&gt;, &lt;code&gt;search_channel_facts&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_decision_timeline&lt;/code&gt;, &lt;code&gt;find_supersessions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;graph_traverse&lt;/code&gt;, &lt;code&gt;resolve_entity&lt;/code&gt;, &lt;code&gt;find_mentions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;list_channels&lt;/code&gt;, &lt;code&gt;read_wiki_section&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;…and more for ingestion status, media lookup, and policy.&lt;/li&gt;
&lt;/ul&gt;
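
&lt;p&gt;To make the tool surface concrete, a toy in-process dispatcher; a real MCP client speaks JSON-RPC to the server rather than calling handlers directly, so everything below is illustrative, including the handler signatures:&lt;/p&gt;

```python
def call_tool(tools, name, arguments):
    """Minimal dispatch sketch of a tool surface: `tools` maps tool names
    to handlers. Unknown tools return an error object instead of raising,
    mirroring how MCP reports tool-call failures to the calling agent."""
    if name not in tools:
        return {"error": f"unknown tool: {name}"}
    return {"result": tools[name](**arguments)}
```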

&lt;p&gt;The meaningful shift here is that the team memory becomes &lt;strong&gt;reusable context&lt;/strong&gt;, not a siloed app. An IDE-resident coding agent can ask &lt;em&gt;"what did the team decide about the auth library?"&lt;/em&gt; with the same guarantees the dashboard gives a human — citations back to source messages, scope-aware resolution, permission-aware access (roadmap, below).&lt;/p&gt;

&lt;h2&gt;
  
  
  What this costs
&lt;/h2&gt;

&lt;p&gt;Honest accounting: ingestion is slower and more expensive than RAG. You pay the LLM twice — once for extraction, once for consolidation. Using Gemini Flash keeps both bounded. Consolidation runs sparsely (once per topic cluster) and amortises across many messages.&lt;/p&gt;

&lt;p&gt;The tradeoff is explicit: &lt;strong&gt;you pay more ingestion cost to get a standing artefact instead of ephemeral retrieval results.&lt;/strong&gt; For team-chat corpora with heavy repetition, high conversational noise, and relational/temporal queries, an LLM Wiki reliably wins on answer quality, citation fidelity, browsability, and the types of question it can even entertain. For already-structured documents, classic RAG is still the right tool.&lt;/p&gt;
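
&lt;p&gt;To see the shape of the "pay twice" cost, a back-of-envelope in Python; every number below is hypothetical and stands in only for the structure of the calculation, not actual Gemini pricing or real message volumes:&lt;/p&gt;

```python
# Illustrative numbers only: the point is that extraction scales with
# message count while consolidation scales with topic-cluster count,
# so the second LLM pass amortises across many messages.
msgs_per_day = 2000
tokens_per_msg = 150               # prompt + completion, per extraction call
extraction_tokens = msgs_per_day * tokens_per_msg

clusters_per_day = 40              # consolidation runs once per topic cluster
tokens_per_cluster = 4000
consolidation_tokens = clusters_per_day * tokens_per_cluster

price_per_mtok = 0.50              # illustrative blended $/1M tokens
daily_cost = (extraction_tokens + consolidation_tokens) / 1_000_000 * price_per_mtok
print(f"${daily_cost:.2f}/day")
```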

&lt;h2&gt;
  
  
  A concrete query path
&lt;/h2&gt;

&lt;p&gt;Someone asks &lt;em&gt;"who decided we're moving to Postgres, and has anything superseded that decision?"&lt;/em&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The router classifies this as &lt;strong&gt;both&lt;/strong&gt; — it needs fact retrieval &lt;em&gt;and&lt;/em&gt; graph traversal, since there's a decider and a supersession chain.&lt;/li&gt;
&lt;li&gt;Weaviate retrieves the top facts about Postgres migration from the fact tier.&lt;/li&gt;
&lt;li&gt;Neo4j runs the decision-chain traversal shown earlier, returning &lt;code&gt;decided_by&lt;/code&gt;, &lt;code&gt;supersedes&lt;/code&gt;, and &lt;code&gt;superseded_by&lt;/code&gt; collections.&lt;/li&gt;
&lt;li&gt;Results merge, dedup by &lt;code&gt;fact_id&lt;/code&gt;, and pass to the answer agent.&lt;/li&gt;
&lt;li&gt;The agent streams back via SSE:&lt;/li&gt;
&lt;/ol&gt;
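
&lt;p&gt;Step 4 is simple enough to show directly; a sketch of the merge, with the field names as assumptions:&lt;/p&gt;

```python
def merge_results(semantic_hits, graph_hits):
    """Merge the two stores' results, deduping on fact_id. First
    occurrence wins, so semantic ordering is preserved and graph-only
    facts append after."""
    seen = set()
    merged = []
    for fact in semantic_hits + graph_hits:
        if fact["fact_id"] not in seen:
            seen.add(fact["fact_id"])
            merged.append(fact)
    return merged
```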

&lt;blockquote&gt;
&lt;p&gt;Alan announced the migration from MySQL to Postgres on March 1, 2026 [1], with cutover scheduled for March 15 [1]. The decision was ratified in the March 3 engineering sync [2] and has not been superseded since.&lt;/p&gt;

&lt;p&gt;[1] alan5543 in #engineering, 2026-03-01 14:22&lt;br&gt;
[2] alan5543 in #engineering-sync, 2026-03-03 10:00&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The citations come straight from the fact records surfaced in steps 2 and 3. There's no separate "find the source" step that can drift from what the agent actually retrieved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where we sit, and what's defensible
&lt;/h2&gt;

&lt;p&gt;Most team-memory products are &lt;strong&gt;vector-only&lt;/strong&gt; — Glean, Notion AI, various OSS "LLM wiki" clones built on a single embedding store. Beever Atlas is the first OSS product we're aware of that puts a real knowledge graph under team-chat memory.&lt;/p&gt;

&lt;p&gt;The graph isn't cosmetic. It's load-bearing for a specific class of query — multi-hop, temporal, scope-aware — that vector retrieval fundamentally can't serve. It's also the substrate on which a &lt;strong&gt;permission spine&lt;/strong&gt; can be built: mirroring Slack Enterprise Grid ACLs as graph-level access rules means permissions get enforced at query time, not papered over with app-tier filters. That's on our roadmap and we think it's the right shape for enterprise deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rest of the stack
&lt;/h2&gt;

&lt;p&gt;The full system runs locally via &lt;code&gt;docker compose&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python backend — FastAPI + Google ADK agents&lt;/li&gt;
&lt;li&gt;TypeScript bot bridge — Slack / Discord / Teams / Mattermost / Telegram webhooks with platform-specific signature verification&lt;/li&gt;
&lt;li&gt;React dashboard — wiki view, interactive Cytoscape knowledge graph, streaming Q&amp;amp;A with live citations&lt;/li&gt;
&lt;li&gt;Weaviate, Neo4j, MongoDB, Redis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Platform credentials (bot tokens) are encrypted at rest with AES-256-GCM. All data stays in databases you control. The app sends no telemetry anywhere — LLM calls go through API keys you configure in your own &lt;code&gt;.env&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;make demo&lt;/code&gt; target brings up the full stack pre-loaded with a public Wikipedia corpus (Ada Lovelace + Python history, CC-BY-SA 3.0). Pre-computed fixtures ship in the repo, so seeding runs without any API keys. Asking questions via the Q&amp;amp;A agent needs a free-tier Gemini API key — Ollama support for fully-local inference is on the roadmap.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Beever-AI/beever-atlas
&lt;span class="nb"&gt;cd &lt;/span&gt;beever-atlas
make demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/api/channels/demo-wikipedia/ask &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer dev-key-change-me"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"question":"Who was Ada Lovelace?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see a streaming SSE response with six citations linking back to the source Wikipedia articles.&lt;/p&gt;
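
&lt;p&gt;If you want to consume that stream from code rather than eyeball the &lt;code&gt;curl&lt;/code&gt; output, SSE framing is plain text: events are separated by blank lines and payloads arrive on &lt;code&gt;data:&lt;/code&gt; lines. A minimal parser sketch — it only splits frames, since the endpoint's exact event schema isn't shown here:&lt;/p&gt;

```python
def parse_sse(body):
    """Parse a raw SSE response body into a list of event data payloads.
    Multi-line data fields within one event are rejoined with newlines,
    per SSE framing; event names and retry fields are ignored for brevity."""
    events = []
    for block in body.strip().split("\n\n"):
        data = [line[5:].lstrip() for line in block.splitlines()
                if line.startswith("data:")]
        if data:
            events.append("\n".join(data))
    return events
```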




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/Beever-AI/beever-atlas" rel="noopener noreferrer"&gt;github.com/Beever-AI/beever-atlas&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Documentation: &lt;a href="https://docs.beever.ai/atlas" rel="noopener noreferrer"&gt;docs.beever.ai/atlas&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Demo video: &lt;a href="https://youtu.be/VJ81Uxyjxb0" rel="noopener noreferrer"&gt;youtu.be/VJ81Uxyjxb0&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beever Atlas is developed by Beever AI Limited in Toronto, Ontario, and released under the Apache 2.0 license. The LLM Wiki concept was inspired by Andrej Karpathy's early-2026 thread on LLM Knowledge Bases; we took the idea seriously and built the implementation.&lt;/p&gt;

&lt;p&gt;Contributions welcome. Issues especially — we want to hear the edge cases where a vector-only system would have worked and our graph overhead turned out not to pay for itself.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>llmwiki</category>
    </item>
  </channel>
</rss>
