<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: TyKolt</title>
    <description>The latest articles on Forem by TyKolt (@tykolt).</description>
    <link>https://forem.com/tykolt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850481%2F933a6483-3ec3-43a8-9f38-bb777973d311.jpeg</url>
      <title>Forem: TyKolt</title>
      <link>https://forem.com/tykolt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tykolt"/>
    <language>en</language>
    <item>
      <title>I spent months trying to stop LLM hallucinations. Prompt engineering wasn't enough. So I wrote a graph engine in Rust.</title>
      <dc:creator>TyKolt</dc:creator>
      <pubDate>Mon, 30 Mar 2026 14:12:54 +0000</pubDate>
      <link>https://forem.com/tykolt/i-spent-months-trying-to-stop-llm-hallucinations-prompt-engineering-wasnt-enough-so-i-wrote-a-4872</link>
      <guid>https://forem.com/tykolt/i-spent-months-trying-to-stop-llm-hallucinations-prompt-engineering-wasnt-enough-so-i-wrote-a-4872</guid>
      <description>&lt;p&gt;I started this project after reading about &lt;a href="https://singularitynet.io/airis-ventures-into-minecraft/" rel="noopener noreferrer"&gt;AIRIS&lt;/a&gt;, a cognitive agent from SingularityNET that learns by interacting with a Minecraft world. Not because I cared about Minecraft — but because of the principle: an AI that learns by doing, in a way you can actually observe and trace.&lt;/p&gt;

&lt;p&gt;That got me thinking. If an agent can learn from a simulated physical environment, could you do something similar in text? Could you build a system that builds knowledge through direct interaction with users, step by step, and where every piece of that knowledge is inspectable?&lt;/p&gt;

&lt;p&gt;I tried. And I failed. Several times.&lt;/p&gt;

&lt;h2&gt;The purity trap&lt;/h2&gt;

&lt;p&gt;My first attempt was absurdly ambitious. I wanted to build everything from scratch — zero external libraries, zero implicit behavior, zero randomness. Every component had to be fully deterministic and transparent. No shortcuts.&lt;/p&gt;

&lt;p&gt;It sounds principled. In practice, it was a dead end. I couldn't use &lt;em&gt;any&lt;/em&gt; library that had opaque internals or non-deterministic behavior, which meant rewriting basic infrastructure from nothing. The project got slow, fragile, and impossible to maintain. Conceptual purity was killing the actual product.&lt;/p&gt;

&lt;p&gt;So I stepped back and reframed the problem: the issue wasn't external code itself, it was &lt;em&gt;what kind&lt;/em&gt; of external code. I started allowing dependencies again, but only ones that are deterministic, have no implicit intelligence, and behave predictably. That was the first real turning point.&lt;/p&gt;

&lt;h2&gt;The second problem: architecture&lt;/h2&gt;

&lt;p&gt;Even after loosening the dependency rules, the project kept growing in the wrong direction. Too many components, unclear responsibilities, a fragmented codebase that was getting harder to reason about with every commit.&lt;/p&gt;

&lt;p&gt;At some point I realized I was building the wrong thing. I was trying to make Kremis &lt;em&gt;generate&lt;/em&gt; answers. But the actual problem was never generation — LLMs are already good at that. The problem was &lt;strong&gt;verification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's when the architecture flipped. Kremis became a sidecar: it doesn't produce responses, it validates them. It sits next to an LLM and checks whether what the model says is actually grounded in real data. The separation is strict — probabilistic inference on one side, deterministic logic on the other.&lt;/p&gt;

&lt;p&gt;That restructuring is what made everything click.&lt;/p&gt;

&lt;h2&gt;What Kremis actually does&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/TyKolt/kremis" rel="noopener noreferrer"&gt;Kremis&lt;/a&gt; is a graph store written in Rust. You feed it structured data — entity, attribute, value triples — and it builds a deterministic graph. When you query it, you get back exactly what's in the graph. Nothing invented, nothing inferred.&lt;/p&gt;

&lt;p&gt;The core engine has no randomness, no floating-point arithmetic, no pre-loaded knowledge. Same input, same output, every time. That constraint is what makes everything else trustworthy.&lt;/p&gt;

&lt;h3&gt;Quick example&lt;/h3&gt;

&lt;p&gt;Say Kremis is running locally. You ingest some facts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/signals &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
       "signals": [
         {"entity_id": 1, "attribute": "name", "value": "Alice"},
         {"entity_id": 1, "attribute": "role", "value": "engineer"},
         {"entity_id": 1, "attribute": "works_on", "value": "Kremis"},
         {"entity_id": 1, "attribute": "knows", "value": "Bob"},
         {"entity_id": 2, "attribute": "name", "value": "Bob"},
         {"entity_id": 2, "attribute": "role", "value": "designer"},
         {"entity_id": 2, "attribute": "works_on", "value": "Kremis"},
         {"entity_id": 3, "attribute": "name", "value": "Kremis"},
         {"entity_id": 3, "attribute": "type", "value": "project"}
       ]
     }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now an LLM generates six claims about Alice. Kremis checks each one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[FACT]          Alice is an engineer.
[FACT]          Alice works on the Kremis project.
[FACT]          Alice knows Bob.
[NOT IN GRAPH]  Alice holds a PhD in machine learning from MIT.
[NOT IN GRAPH]  Alice previously worked at DeepMind as a research lead.
[NOT IN GRAPH]  Alice manages a cross-functional team of 8 people.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three grounded. Three fabricated. No "87% confidence" — just a binary answer.&lt;/p&gt;

&lt;p&gt;Validation works by looking up the entity node, fetching its properties, and comparing against the claims. The repo includes a &lt;a href="https://github.com/TyKolt/kremis/blob/main/examples/demo_honesty.py" rel="noopener noreferrer"&gt;demo script&lt;/a&gt; that runs this whole flow — Python, standard library only. Pass &lt;code&gt;--ollama&lt;/code&gt; to use a local model instead of mock claims.&lt;/p&gt;
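That comparison step is small enough to sketch directly. The following is a minimal, hypothetical version in Python; the `graph` dict and `verify` helper are my own illustration of the lookup-and-compare idea, not the repo's actual API:

```python
# Minimal sketch of claim verification against a deterministic property store.
# The data shapes here are illustrative; the real flow is in the demo script.

# Properties as ingested for entity 1 (Alice), mirroring the signals above.
graph = {
    1: {"name": "Alice", "role": "engineer", "works_on": "Kremis", "knows": "Bob"},
}

def verify(entity_id, attribute, value):
    """Return True only if this exact (attribute, value) pair is stored."""
    props = graph.get(entity_id, {})
    return props.get(attribute) == value

claims = [
    ("role", "engineer"),        # grounded
    ("works_on", "Kremis"),      # grounded
    ("degree", "PhD from MIT"),  # never ingested, so not in graph
]

for attribute, value in claims:
    label = "FACT" if verify(1, attribute, value) else "NOT IN GRAPH"
    print(f"[{label}] {attribute} = {value}")
```

The key property is that `verify` is pure lookup: no scoring, no similarity, no model in the loop.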

&lt;h2&gt;Why not just a SQL table?&lt;/h2&gt;

&lt;p&gt;I considered it. But I didn't want to write a new query for every possible claim an LLM might generate. A graph gives you relationship traversal without that overhead.&lt;/p&gt;

&lt;p&gt;That matters when the question isn't "what's Alice's role?" but "does Alice know someone who works on project X?" or "what connects these two entities?" Those are graph questions.&lt;/p&gt;

&lt;p&gt;The data model is EAV (Entity, Attribute, Value). Signals attach properties to entity nodes, and ordered ingestion creates edges from co-occurrence. You get a connected structure you can query for properties, traversals, paths, intersections, and related context.&lt;/p&gt;
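To make the traversal point concrete, here's a toy version of that kind of question. It assumes a simplified edge rule where a signal value naming another entity links the two nodes; Kremis's actual co-occurrence rule may differ:

```python
# Toy EAV store with an assumed edge rule: a "knows" value that names
# another entity links the two nodes. Illustrative only.

signals = [
    (1, "name", "Alice"), (1, "role", "engineer"),
    (1, "works_on", "Kremis"), (1, "knows", "Bob"),
    (2, "name", "Bob"), (2, "role", "designer"), (2, "works_on", "Kremis"),
]

props = {}    # entity_id to {attribute: value}
by_name = {}  # entity name to entity_id
for eid, attr, val in signals:
    props.setdefault(eid, {})[attr] = val
    if attr == "name":
        by_name[val] = eid

def knows_someone_on(eid, project):
    """Graph question: does `eid` know an entity that works on `project`?"""
    friend = props.get(eid, {}).get("knows")
    fid = by_name.get(friend)
    return fid is not None and props.get(fid, {}).get("works_on") == project

print(knows_someone_on(1, "Kremis"))  # Alice knows Bob, who works on Kremis
```

In a relational schema, that same question is a self-join you'd have to anticipate and write; in the graph it falls out of a one-hop traversal.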

&lt;h2&gt;MCP integration&lt;/h2&gt;

&lt;p&gt;Kremis ships with an &lt;a href="https://kremis.mintlify.app/mcp/overview" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt;. If you use Claude Desktop, Cursor, or anything that speaks Model Context Protocol, you can point it at a running Kremis instance and the assistant queries the graph directly.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"kremis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/kremis-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"KREMIS_URL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"KREMIS_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-key-here"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API auth? Omit &lt;code&gt;KREMIS_API_KEY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The assistant gets nine tools: ingest, lookup, traverse, path, intersect, status, properties, retract, and hash. Instead of hallucinating about your data, it can just look it up.&lt;/p&gt;

&lt;h2&gt;What about RAG and vector DBs?&lt;/h2&gt;

&lt;p&gt;I tried the usual stack before building this. System prompts, careful prompt engineering, vector databases. None of it solved the core issue: retrieval can be accurate and the model still invents details that aren't there.&lt;/p&gt;

&lt;p&gt;Vector DBs answer "find me documents similar to this query." That's useful for retrieval. But Kremis answers a different question: &lt;strong&gt;"is this specific fact in my data, yes or no?"&lt;/strong&gt; Those are two different problems, and I got tired of pretending they're the same one.&lt;/p&gt;

&lt;p&gt;Confidence scores didn't help either. "87% confidence" doesn't tell me whether Alice has a PhD or not. I wanted a binary answer, and that's what Kremis gives.&lt;/p&gt;
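The difference is easy to demonstrate without any vector database at all. The sketch below uses crude token overlap as a stand-in for embedding similarity, which is purely illustrative, but it shows why "closest match" and "is this fact stored" are different answers:

```python
# Similarity retrieval always returns *something*; membership returns yes/no.
# Token overlap stands in for embedding similarity here. Illustrative only.

documents = [
    "Alice is an engineer who works on Kremis.",
    "Bob is a designer who works on Kremis.",
]
facts = {("Alice", "role", "engineer"), ("Bob", "role", "designer")}

def most_similar(query):
    """Retrieval: return the closest document, however weak the match."""
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q.intersection(d.lower().split())))

def is_fact(triple):
    """Verification: exact membership. No score, no nearest neighbor."""
    return triple in facts

query = "Alice holds a PhD in machine learning"
print(most_similar(query))                  # still returns a document
print(is_fact(("Alice", "degree", "PhD")))  # False: not in the data
```

Retrieval degrades gracefully into a wrong-but-plausible answer; membership fails loudly, which is the behavior you want from a verifier.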

&lt;h2&gt;What this is not&lt;/h2&gt;

&lt;p&gt;Kremis doesn't "understand" anything. The name means "cognitive substrate", but the system is much simpler than that sounds. It stores structure from signals it has processed. No intelligence. No reasoning. Just a graph.&lt;/p&gt;

&lt;p&gt;It's also alpha software — currently &lt;code&gt;v0.17.4&lt;/code&gt;. The API works, but I'm still making breaking changes before &lt;code&gt;v1.0&lt;/code&gt;. Pin your version.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;Three members in one Rust workspace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;kremis-core&lt;/strong&gt; — pure library, no async, no network, no side effects. The graph engine. Every function is deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kremis&lt;/strong&gt; — CLI and HTTP API (axum). The binary that runs the server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;kremis-mcp&lt;/strong&gt; — MCP bridge over stdio.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage is either in-memory or backed by &lt;a href="https://github.com/cberner/redb" rel="noopener noreferrer"&gt;redb&lt;/a&gt;, which provides ACID transactions and crash safety. There's also a Docker image. Apache 2.0.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/TyKolt/kremis.git
&lt;span class="nb"&gt;cd &lt;/span&gt;kremis
cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
cargo run &lt;span class="nt"&gt;-p&lt;/span&gt; kremis &lt;span class="nt"&gt;--&lt;/span&gt; init
cargo run &lt;span class="nt"&gt;-p&lt;/span&gt; kremis &lt;span class="nt"&gt;--&lt;/span&gt; ingest &lt;span class="nt"&gt;-f&lt;/span&gt; examples/sample_signals.json &lt;span class="nt"&gt;-t&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in another terminal:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo run &lt;span class="nt"&gt;-p&lt;/span&gt; kremis &lt;span class="nt"&gt;--&lt;/span&gt; server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the demo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python examples/demo_honesty.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, if you want to use a local model through Ollama:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python examples/demo_honesty.py &lt;span class="nt"&gt;--ollama&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;The repo is at &lt;a href="https://github.com/TyKolt/kremis" rel="noopener noreferrer"&gt;github.com/TyKolt/kremis&lt;/a&gt;. Full docs at &lt;a href="https://kremis.mintlify.app" rel="noopener noreferrer"&gt;kremis.mintlify.app&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;RAG handles retrieval. Kremis handles verification. I spent months conflating the two before I realized they need separate tools.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: An initial draft of this article was generated with AI assistance. The technical content, architecture decisions, project history, and opinions are entirely my own. All code examples are from the actual repository.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
