<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rahul Dass</title>
    <description>The latest articles on Forem by Rahul Dass (@rahul_dass_097db50591fdd7).</description>
    <link>https://forem.com/rahul_dass_097db50591fdd7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2629247%2F6d0a09d1-7944-47dc-b342-c8456b8c4c67.jpeg</url>
      <title>Forem: Rahul Dass</title>
      <link>https://forem.com/rahul_dass_097db50591fdd7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rahul_dass_097db50591fdd7"/>
    <language>en</language>
    <item>
      <title>Why I Stopped Using LLMs to Verify LLMs (And Built a Deterministic Protocol Instead)</title>
      <dc:creator>Rahul Dass</dc:creator>
      <pubDate>Sun, 11 Jan 2026 11:24:14 +0000</pubDate>
      <link>https://forem.com/rahul_dass_097db50591fdd7/why-i-stopped-using-llms-to-verify-llms-and-built-a-deterministic-protocol-instead-phm</link>
      <guid>https://forem.com/rahul_dass_097db50591fdd7/why-i-stopped-using-llms-to-verify-llms-and-built-a-deterministic-protocol-instead-phm</guid>
      <description>&lt;h2&gt;The "Silent Failure" in Production&lt;/h2&gt;

&lt;p&gt;We talk a lot about hallucinations, but we rarely talk about how we catch them. The current industry standard is "LLM-as-a-Judge": asking an LLM to verify whether another LLM's answer is correct.&lt;/p&gt;

&lt;p&gt;When I was building a RAG pipeline for a critical use case, I realized a dangerous flaw in this approach: &lt;strong&gt;Probabilistic models cannot perform deterministic verification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If an LLM makes a math error or writes unsafe SQL, it’s often because it fundamentally misunderstands the logic in that context. Asking the same model (or a similar one) to "&lt;em&gt;double-check&lt;/em&gt;" often leads to the same error, just with more confidence.&lt;/p&gt;

&lt;p&gt;I realized I couldn't ship "&lt;em&gt;vibes&lt;/em&gt;" to production. I needed proofs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Shift: From Probabilistic to Deterministic&lt;/strong&gt;&lt;br&gt;
I started asking a simple question: &lt;em&gt;Why are we using AI to check math when we have Python?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We have tools that have been giving provably correct answers for decades:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SymPy&lt;/strong&gt; for Calculus/Math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Z3 Theorem Prover&lt;/strong&gt; for Logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AST Parsers&lt;/strong&gt; for Code Security (a quick sketch of the first two follows).&lt;/li&gt;
&lt;/ul&gt;
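
&lt;p&gt;To make that concrete, here is a minimal standalone sketch using SymPy and Z3 directly (no QWED involved). The claims being checked are invented examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sympy import simplify, sympify
from z3 import Bools, Implies, Not, Solver

# SymPy: exact arithmetic. Suppose an LLM claims 2/3 + 1/6 equals 3/4.
print(simplify(sympify("2/3 + 1/6 - 3/4")))  # 1/12, so the claim is off by 1/12

# Z3: can (p implies q), p, and not-q all hold at once? The solver proves they cannot.
p, q = Bools("p q")
s = Solver()
s.add(Implies(p, q), p, Not(q))
print(s.check())  # unsat
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;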

&lt;p&gt;The problem wasn't the tools; it was the lack of a simple protocol connecting them to LLM outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introducing QWED: An Infrastructure for Truth&lt;/strong&gt;&lt;br&gt;
I decided to build &lt;strong&gt;QWED&lt;/strong&gt; not as another "AI tool," but as a &lt;strong&gt;Verification Protocol&lt;/strong&gt;. It treats the LLM as an "Untrusted Translator"—it can translate natural language into code/logic, but it is never trusted to evaluate the result.&lt;/p&gt;

&lt;p&gt;Here is the difference in architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;❌ The Old Way (LLM-as-a-Judge):&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;User Query -&amp;gt; LLM Answer -&amp;gt; LLM Judge (vibes-based check) -&amp;gt; User Result: ~80% reliable, unpredictable latency.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;✅ The Zero-Trust Way (QWED):&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;User Query -&amp;gt; LLM -&amp;gt; &lt;strong&gt;Deterministic Engine (Math/Code/Logic)&lt;/strong&gt; -&amp;gt; Proof/Fail -&amp;gt; User Result: 100% mathematically proven.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
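
&lt;p&gt;In code, the zero-trust pipeline has roughly this shape. This is a sketch only: the function names are invented (not the QWED API), and the "LLM" is a hard-coded stand-in:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sympy import simplify, sympify

def llm_translate(query):
    # Untrusted translator: a real system would call a model here
    return "sqrt(81) == 8"

def deterministic_verify(claim):
    # The verdict comes from a solver, never from another model
    lhs, rhs = claim.split("==")
    diff = simplify(sympify(lhs) - sympify(rhs))
    return diff == 0, f"{lhs.strip()} - {rhs.strip()} simplifies to {diff}"

claim = llm_translate("What is sqrt(81)? Is it 8?")
verified, evidence = deterministic_verify(claim)
print(verified, "|", evidence)  # False | sqrt(81) - 8 simplifies to 1
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;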

&lt;p&gt;Here is how the &lt;strong&gt;Math Engine&lt;/strong&gt; catches a subtle hallucination that usually slips past LLM judges:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from qwed_sdk import QWEDClient

client = QWEDClient()

# Scenario: the LLM claims sqrt(81) is 8 (a common token-prediction error)
llm_output = "sqrt(81) == 8"

# QWED uses SymPy to evaluate the expression mathematically
result = client.verify_math(llm_output)

if not result["verified"]:
    print(f"Hallucination Blocked: {result['explanation']}")
    # Output: "sqrt(81) evaluates to 9, which is not equal to 8."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also built engines for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SQL Security:&lt;/strong&gt; Using AST to detect injection patterns before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logic Puzzles:&lt;/strong&gt; Using Z3 to solve boolean satisfiability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integrity:&lt;/strong&gt; Using Pandas to verify tabular claims (quick sketches follow this list).&lt;/li&gt;
&lt;/ol&gt;
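
&lt;p&gt;For a taste of the first and third, here are standalone sketches. Both are illustrative stand-ins rather than the QWED engines: sqlparse is token-based rather than a full AST, and the DataFrame and its claim are invented:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import sqlparse

# SQL safety: stacked statements are a classic injection smell; reject them
sql = "SELECT * FROM users WHERE id = 1; DROP TABLE users;"
if len(sqlparse.parse(sql)) &amp;gt; 1:
    print("SQL rejected: multiple statements detected")

# Data integrity: test the claim "EU has the highest revenue" against the data
df = pd.DataFrame({"region": ["NA", "EU", "APAC"], "revenue": [120, 90, 60]})
actual_top = df.loc[df["revenue"].idxmax(), "region"]
print("Claim verified:", actual_top == "EU")  # prints: Claim verified: False
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;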

&lt;p&gt;&lt;strong&gt;Why Open Source?&lt;/strong&gt;&lt;br&gt;
I believe "Verification" should be a standard infrastructure layer, like SSL for security. It shouldn't be a black-box API hidden behind a paywall.&lt;/p&gt;

&lt;p&gt;We released QWED under &lt;strong&gt;Apache 2.0&lt;/strong&gt;. You can audit the code, run it locally (air-gapped), or inspect the solvers yourself.&lt;/p&gt;

&lt;p&gt;If you are tired of debugging "vibes" and want to build a pipeline based on proofs, check out the repo. I’d love to hear your feedback on the architecture.&lt;/p&gt;

&lt;p&gt;🌟 &lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/QWED-AI/qwed-verification" rel="noopener noreferrer"&gt;QWED-AI/qwed-verification&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
