<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AgentOracle</title>
    <description>The latest articles on Forem by AgentOracle (@agentoracle).</description>
    <link>https://forem.com/agentoracle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3888239%2F8fe72ef7-b212-4836-8413-b2bcaa3c7241.png</url>
      <title>Forem: AgentOracle</title>
      <link>https://forem.com/agentoracle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/agentoracle"/>
    <language>en</language>
    <item>
      <title>How to add claim verification to your AI content approval workflow</title>
      <dc:creator>AgentOracle</dc:creator>
      <pubDate>Mon, 11 May 2026 22:31:01 +0000</pubDate>
      <link>https://forem.com/agentoracle/how-to-add-claim-verification-to-your-ai-content-approval-workflow-3797</link>
      <guid>https://forem.com/agentoracle/how-to-add-claim-verification-to-your-ai-content-approval-workflow-3797</guid>
      <description>&lt;p&gt;&lt;strong&gt;The 90-second version&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One wrong claim slips through. It reaches the client's customers. Maybe it's a stat your AI invented. Maybe it's a competitor comparison that's almost right, but not quite. Maybe it's a regulatory line your team has said a hundred times the right way, and this once it came out the wrong way.&lt;/p&gt;

&lt;p&gt;Now you're explaining it to your client's legal team. Or worse, to a regulator. Or worse than that, to a journalist who screenshots and shares.&lt;/p&gt;

&lt;p&gt;This is the quiet fear in every content operations team using AI in 2026. And the current solution — a human fact-checker reviewing every piece — doesn't scale past a handful of campaigns per week. The fact-checker becomes the bottleneck. Errors slip through anyway, because nobody can review 200 pieces in an afternoon.&lt;/p&gt;

&lt;p&gt;There is a better answer, and it doesn't require rebuilding your stack. It's a verification step that sits between your AI draft and your final approval, and it returns a tamper-evident receipt your legal team can audit.&lt;/p&gt;

&lt;p&gt;That's AgentOracle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it actually does in your workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Picture your existing approval flow:&lt;/p&gt;

&lt;p&gt;AI draft → human review → legal sign-off → publish&lt;/p&gt;

&lt;p&gt;AgentOracle adds one step:&lt;/p&gt;

&lt;p&gt;AI draft → AgentOracle verification → human review → legal sign-off → publish&lt;/p&gt;

&lt;p&gt;The verification step takes any factual claim from the draft and runs it against four independent sources in parallel. It returns three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A verdict: act, verify, reject, or abstain&lt;/li&gt;
&lt;li&gt;A confidence score: a number between 0 and 1&lt;/li&gt;
&lt;li&gt;A cryptographic receipt: a signed proof of what was checked, when, and against what sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your team uses the verdict to make the publish/hold decision. Anything below your confidence threshold goes to a human. Anything above publishes with the receipt attached to the campaign record.&lt;/p&gt;
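&lt;p&gt;The routing rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AgentOracle's API: the &lt;code&gt;VerificationResult&lt;/code&gt; fields, the &lt;code&gt;route&lt;/code&gt; function, and the 0.8 default threshold are all hypothetical, and treating only an &lt;code&gt;act&lt;/code&gt; verdict as publishable is just one possible policy.&lt;/p&gt;

```python
from dataclasses import dataclass


# Hypothetical result shape for illustration only; these field names are
# assumptions, not AgentOracle's documented response schema.
@dataclass
class VerificationResult:
    verdict: str       # "act", "verify", "reject", or "abstain"
    confidence: float  # number between 0 and 1
    receipt: str       # opaque signed receipt string


def route(result: VerificationResult, threshold: float = 0.8) -> dict:
    """Decide publish vs. human review for one verified claim.

    Assumed policy: publish only when the verdict is "act" AND the
    confidence meets the team's threshold; everything else goes to a human.
    The receipt travels with the decision so it can be stored on the
    campaign record either way.
    """
    publish = result.verdict == "act" and result.confidence >= threshold
    return {
        "decision": "publish" if publish else "human_review",
        "receipt": result.receipt,
    }


print(route(VerificationResult("act", 0.93, "receipt-1")))
print(route(VerificationResult("act", 0.41, "receipt-2")))
```

&lt;p&gt;Keeping the threshold as a parameter lets each team tune how much lands in the human review queue without touching the rest of the workflow.&lt;/p&gt;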

&lt;p&gt;That receipt is the part that changes everything for your legal team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What a "cryptographic receipt" means in plain English&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A receipt is a small block of text that looks like random characters. It is signed with a private key that only AgentOracle holds, and anyone can use the matching public key to verify the signature.&lt;/p&gt;

&lt;p&gt;If anyone — your legal team, a client's auditor, a regulator, a journalist — wants to confirm that you actually verified a claim before publishing, they take the receipt, fetch our public key, and run a single verification. If the receipt matches, it's authentic. If a single character was altered, the verification fails closed. There is no ambiguity, no "trust the vendor" dependency, no missing log files.&lt;/p&gt;

&lt;p&gt;You don't need to understand the cryptography to benefit from it. You just need to know that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receipts are tamper-evident&lt;/li&gt;
&lt;li&gt;Receipts are third-party verifiable without trusting AgentOracle&lt;/li&gt;
&lt;li&gt;Receipts are portable: they work even if AgentOracle disappears tomorrow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what your compliance officer has been quietly wishing for since AI content tooling went mainstream. It's the audit trail that holds up when someone asks "prove you checked this."&lt;/p&gt;
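&lt;p&gt;For the technically curious, here is a rough sketch of what inspecting such a receipt could look like, assuming it arrives as a compact-serialized JWS (three base64url segments joined by dots; this post names JWS as the format in a later section). The example token and its payload fields are made up for illustration, and the sketch only decodes the parts. It does not verify the signature, which would require a JOSE library and the published public key.&lt;/p&gt;

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWS segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def inspect_compact_jws(token: str) -> dict:
    """Split header.payload.signature and decode the two JSON parts.

    Decoding is NOT verification: anyone can read these segments, and only
    a signature check against the signer's public key proves authenticity.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    header_b64, payload_b64, signature_b64 = parts
    return {
        "header": json.loads(b64url_decode(header_b64)),
        "payload": json.loads(b64url_decode(payload_b64)),
        "signature_length": len(b64url_decode(signature_b64)),
    }


# Made-up token: placeholder signature, hypothetical header and payload.
example = ".".join([
    b64url_encode(json.dumps({"alg": "EdDSA"}).encode("utf-8")),
    b64url_encode(json.dumps({"claim": "example", "verdict": "act"}).encode("utf-8")),
    b64url_encode(b"not-a-real-signature"),
])
info = inspect_compact_jws(example)
print(info["header"], info["payload"])
```

&lt;p&gt;Because the header and payload are readable by anyone, an auditor can see what was checked before running the actual signature verification.&lt;/p&gt;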

&lt;p&gt;&lt;strong&gt;What it replaces&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of a human fact-check team costing $50K–$200K per FTE per year:&lt;br&gt;
&lt;strong&gt;Claim verification at $0.02–$0.10 per claim, returned in seconds.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of screenshot evidence in a Google Doc:&lt;br&gt;
&lt;strong&gt;A cryptographic receipt anyone can verify independently.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of email chains saying "I checked it on [site]":&lt;br&gt;
&lt;strong&gt;A signed JWS with structured source data attached to every claim.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of manual EU AI Act Article 26 record-keeping:&lt;br&gt;
&lt;strong&gt;Automatic, tamper-evident, replayable audit trail built in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of trusting the vendor's claims about their checking:&lt;br&gt;
&lt;strong&gt;Verify the receipt yourself. No trust required.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're not adding another tool to your stack. You're replacing two or three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this is not fact-checking software&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've evaluated tools like Originality.ai, Logically, or NewsGuard, note that AgentOracle sits in a different category. Those tools answer different questions:&lt;/p&gt;

&lt;p&gt;Originality.ai scores whether content looks AI-generated and runs basic plagiarism checks. Useful for detection. Doesn't verify whether specific claims are true.&lt;/p&gt;

&lt;p&gt;Logically runs human-powered misinformation review for governments and brands. Slow turnaround. No cryptographic proof.&lt;/p&gt;

&lt;p&gt;NewsGuard rates the credibility of sources (this domain is reliable, that one isn't). Doesn't tell you anything about a specific claim inside a piece.&lt;/p&gt;

&lt;p&gt;None of them return a tamper-evident receipt your legal team can hand to a regulator and say "here is the proof we verified this claim before we published it." That's the gap AgentOracle fills.&lt;/p&gt;

&lt;p&gt;We're a different layer. You can run all of them together if you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What others have said&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A contributor to Mastercard's Verifiable Intent RFC independently verified our receipt format end-to-end last month. He tested both the Node and Python verifiers, and the tamper test failed closed. His exact quote: "Strong work. The calibration.provisional field is the right discipline."&lt;/p&gt;

&lt;p&gt;This week, a Coinbase engineer publicly engaged on our x402 implementation on the canonical x402 issue thread (issue #2207 on x402-foundation/x402, May 7, 2026), diagnosed it, and tagged us directly.&lt;/p&gt;

&lt;p&gt;These are the kinds of independent technical signals that don't typically come from vendor marketing departments. They come from people stress-testing the implementation against the spec.&lt;/p&gt;

&lt;p&gt;This week, AgentOracle was indexed in Coinbase Bazaar discovery. You can verify this yourself with one curl:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;curl 'https://api.cdp.coinbase.com/platform/v2/x402/discovery/merchant?payTo=0xdF90200B0031051BbF7a66BB9387d2Ecf599e109'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That returns our resource manifest, schema, example output, and 30-day usage stats — served by Coinbase, not us. If you'd rather see raw on-chain proof, our most recent settlement on Base mainnet: &lt;a href="https://basescan.org/tx/0x01e37297fd96b9ab0476d1f4d1b2b925db9de564458fd52cf9ad9cf092b79cd5" rel="noopener noreferrer"&gt;&lt;code&gt;0x01e37297…2b79cd5&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Pilot Offer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have 3 to 5 representative claims your team is about to publish — send them to &lt;a href="mailto:joe@agentoracle.co"&gt;joe@agentoracle.co&lt;/a&gt;. We'll come back same-day with real receipts run against YOUR content, plus a pilot scope sized to your volume. No commitment, no procurement step, no scheduling a call. You see the product working on your stuff before anything else happens.&lt;/p&gt;

&lt;p&gt;If those receipts justify continuing, here's the pilot:&lt;/p&gt;

&lt;p&gt;Thirty days. $2,500. We do the integration work.&lt;/p&gt;

&lt;p&gt;Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to 50,000 claim verifications during the pilot&lt;/li&gt;
&lt;li&gt;Custom dashboard with audit log export&lt;/li&gt;
&lt;li&gt;Async Slack or email support&lt;/li&gt;
&lt;li&gt;One integration call to plug AgentOracle into your existing approval workflow (we do the technical work; your team does not need an engineer)&lt;/li&gt;
&lt;li&gt;A 30-day evaluation report from us at the end summarizing what we caught, what we missed, and what your team should do next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Money-back if you tell us by day 7 the receipts aren't usable. Keeps both sides honest. Almost never gets requested, but takes the procurement risk to zero.&lt;/p&gt;

&lt;p&gt;If after 30 days your team thinks the receipts justify continuing, we move to a monthly tier sized to your volume. If not, you keep every receipt you generated and you owe nothing more. We do not pull data we don't need. Your content stays your content.&lt;/p&gt;

&lt;p&gt;No annual contracts. No procurement gymnastics. No per-seat counting. Just a signed audit trail your legal team has been asking for.&lt;/p&gt;

&lt;p&gt;Receipt spec public at github.com/TKCollective/agentoracle-receipt-spec. Public JWKS at agentoracle.co/.well-known/jwks.json. Independently reproducible AVeriTeC + FEVER benchmark shipping May 14, 2026.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>Stop Your RAG Pipeline From Hallucinating: A 15-Line Fix</title>
      <dc:creator>AgentOracle</dc:creator>
      <pubDate>Fri, 01 May 2026 21:57:03 +0000</pubDate>
      <link>https://forem.com/agentoracle/stop-your-rag-pipeline-from-hallucinating-a-15-line-fixpublished-46df</link>
      <guid>https://forem.com/agentoracle/stop-your-rag-pipeline-from-hallucinating-a-15-line-fixpublished-46df</guid>
      <description>&lt;p&gt;Your RAG pipeline retrieves real documents — and still hallucinates. Here's the retrieve → generate → verify pattern that catches it before your agent acts, with working Python code you can run right now.&lt;/p&gt;




&lt;p&gt;Your RAG pipeline retrieves three real documents. The LLM reads them. It generates a response that cites those exact sources. Everything looks clean.&lt;/p&gt;

&lt;p&gt;And it's still wrong about 8–15% of the time.&lt;/p&gt;

&lt;p&gt;If you've deployed RAG to production, you already know this. The answer looks grounded in the retrieved chunks, but a closer read reveals the model invented a date, swapped a name, overstated a number, or fused two unrelated facts into a single plausible-sounding sentence. The citations point to real documents. The statement the citations supposedly support was not actually in those documents.&lt;/p&gt;

&lt;p&gt;This is the hardest class of hallucination to catch. It doesn't look like a hallucination. It looks like a correct answer.&lt;/p&gt;

&lt;p&gt;This tutorial shows you how to add a verification step to your RAG pipeline in about 15 lines of Python. The verifier runs independently of your retrieval stack and your generation model. It reads the final output, extracts individual claims, checks each one across four independent sources, and returns a verdict before your agent acts.&lt;/p&gt;

&lt;h2&gt;Why RAG Hallucinations Are Different&lt;/h2&gt;

&lt;p&gt;Classic LLM hallucination: the model is asked a question it doesn't know the answer to, so it invents one.&lt;/p&gt;

&lt;p&gt;RAG hallucination: the model has correct context in its window, and still produces a statement that isn't supported by that context. The three failure modes I see most in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fabrication under citation&lt;/strong&gt;. The response cites source [2], but the claim it attributes to source [2] isn't actually there. The citation exists; the grounding doesn't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fact fusion&lt;/strong&gt;. Two unrelated facts from two different retrieved chunks get combined into a single sentence. Each half is correct. The combined sentence is false.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confident extrapolation&lt;/strong&gt;. The model extrapolates from what the documents say to a related claim the documents don't support, and delivers it with the same confidence as the verified parts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All three survive retrieval-quality metrics. They survive BLEU, ROUGE, and BERTScore. They survive your "faithfulness" eval if it runs off the same LLM that generated the answer.&lt;/p&gt;

&lt;p&gt;The only reliable catch is a second, independent verification pass — different model, different evidence source, different prompt — that reads the final output and scores each claim against the open web.&lt;/p&gt;

&lt;h2&gt;The Retrieve → Generate → Verify Pattern&lt;/h2&gt;

&lt;p&gt;Standard RAG is two stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query → retrieve → generate → return
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add one stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query → retrieve → generate → verify → return
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The verify stage decomposes the generated response into individual atomic claims, checks each one, and returns a per-claim verdict plus an overall &lt;code&gt;act&lt;/code&gt; / &lt;code&gt;verify&lt;/code&gt; / &lt;code&gt;reject&lt;/code&gt; recommendation. Your application decides what to do with a &lt;code&gt;reject&lt;/code&gt;: surface the bad claims to a user, regenerate with tighter constraints, fall back to a safer response, or abort.&lt;/p&gt;
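&lt;p&gt;One way to wire up the "regenerate, then fall back" option is a bounded retry loop. The sketch below is an illustration under assumptions: &lt;code&gt;answer_with_retries&lt;/code&gt; is a hypothetical helper, the &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; callables are injected so it runs without any network access, and it assumes the verifier result is a dict with a &lt;code&gt;recommendation&lt;/code&gt; key as used elsewhere in this tutorial.&lt;/p&gt;

```python
from typing import Callable


def answer_with_retries(
    query: str,
    generate: Callable[[str, int], str],   # hypothetical: (query, attempt) to draft
    verify: Callable[[str], dict],         # returns a dict with "recommendation"
    max_attempts: int = 2,
    fallback: str = "I could not produce a verified answer.",
) -> dict:
    """Regenerate until the verifier says "act", then fall back.

    Any recommendation other than "act" counts as a failed attempt here,
    which is a deliberately conservative (fail-closed) policy.
    """
    for attempt in range(max_attempts):
        draft = generate(query, attempt)
        if verify(draft).get("recommendation") == "act":
            return {"answer": draft, "attempts": attempt + 1, "verified": True}
    return {"answer": fallback, "attempts": max_attempts, "verified": False}


# Stubbed demo: the first draft is rejected, the second is accepted.
drafts = ["bad draft", "good draft"]


def fake_generate(query: str, attempt: int) -> str:
    return drafts[attempt]


def fake_verify(text: str) -> dict:
    return {"recommendation": "act" if text == "good draft" else "reject"}


result = answer_with_retries("example query", fake_generate, fake_verify)
print(result)
```

&lt;p&gt;Capping the number of attempts matters because every regeneration costs another generation call and another verification call.&lt;/p&gt;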

&lt;h2&gt;Install&lt;/h2&gt;

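&lt;p&gt;For the simple programmatic case (the bulk of this tutorial), the verification call's only dependency is &lt;code&gt;requests&lt;/code&gt;. The toy RAG pipeline later in this post also needs &lt;code&gt;langchain-openai&lt;/code&gt; and an OpenAI API key.&lt;br&gt;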
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For full LangChain tool integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-agentoracle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API keys. No configuration. The free &lt;code&gt;/preview&lt;/code&gt; endpoint gives you 10 verifications per hour to test with; the production &lt;code&gt;/evaluate&lt;/code&gt; endpoint is $0.01 per call via x402 on Base.&lt;/p&gt;

&lt;h2&gt;A Minimal RAG Pipeline That Hallucinates&lt;/h2&gt;

&lt;p&gt;First, let's build a RAG pipeline that's deliberately vulnerable. We'll use a tiny in-memory corpus about OpenAI so the hallucinations are easy to spot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;

&lt;span class="c1"&gt;# Three real documents — our "retrieved context"
&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;OpenAI was founded in December 2015 as a non-profit
    research organization. Co-founders included Sam Altman, Elon Musk,
    Ilya Sutskever, and Greg Brockman, among others.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;ChatGPT was released by OpenAI on November 30, 2022.
    It reached 100 million monthly active users by January 2023, making
    it the fastest-growing consumer application in history at the time.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;OpenAI has received major investments from Microsoft,
    including a multi-year, multi-billion dollar commitment announced
    in January 2023.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Toy retriever — in production, use your vector DB
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer from this context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who founded OpenAI, when was ChatGPT released, and how fast did it grow?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI founding and ChatGPT growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this a few times. On some runs you'll get a clean answer. On others you'll get a response that invents a co-founder not in the documents, or claims ChatGPT reached one billion users in two months, or attributes the wrong investment figure to Microsoft. Same retrieval, same prompt — different hallucination profile per run.&lt;/p&gt;

&lt;p&gt;This is the exact scenario the verification layer is built for.&lt;/p&gt;

&lt;h2&gt;Add Verification in 15 Lines&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agentoracle.co/evaluate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
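    &lt;span class="c1"&gt;# Surface HTTP errors here instead of a confusing KeyError below
&lt;/span&gt;    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evaluation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;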

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_generate_verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;refuted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refuted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;unverifiable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unverifiable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refuted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommendation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;unverifiable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unverifiable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;unverifiable&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;overall_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_generate_verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who founded OpenAI and how fast did ChatGPT grow?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole integration. Before your agent acts on &lt;code&gt;draft&lt;/code&gt;, &lt;code&gt;verify(draft)&lt;/code&gt; extracts the atomic claims, checks each across four independent verification sources, and returns a structured verdict.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;LangChain users:&lt;/strong&gt; if you want the verifier as a tool callable from an agent loop instead of a function call, use &lt;code&gt;from langchain_agentoracle import AgentOracleEvaluateTool&lt;/code&gt; — it returns formatted text suitable for LLM consumption. The plain HTTP call above is what you want when you need the JSON for application logic (gating, branching, repair).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What a Real Verification Run Looks Like
&lt;/h2&gt;

&lt;p&gt;Here's actual output from feeding AgentOracle a deliberately hallucinated RAG response. The input text was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"OpenAI was founded in 2015 by Sam Altman, Elon Musk, and Mark Zuckerberg. The company released ChatGPT in 2022, which reached 1 billion users within 2 months."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Four of those facts are true. Two are hallucinated: Mark Zuckerberg was never an OpenAI co-founder, and ChatGPT reached 100 million users in two months, not one billion.&lt;/p&gt;

&lt;p&gt;The verifier response (trimmed for readability):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommendation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overall_confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.47&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_claims"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verified_claims"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"refuted_claims"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"claims"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI was founded in 2015"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"supported"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.83&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI was founded by Sam Altman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"supported"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI was founded by Elon Musk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"supported"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI was founded by Mark Zuckerberg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"refuted"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"No search results mention Mark Zuckerberg as a founder; founders listed include Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI released ChatGPT in 2022"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"supported"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"claim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ChatGPT reached 1 billion users within 2 months"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"refuted"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.48&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ChatGPT reached 100 million users in 2 months (Jan 2023), not 1 billion. 1 billion milestone was later."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two hallucinations caught. Four true claims confirmed. One &lt;code&gt;reject&lt;/code&gt; recommendation that short-circuits the downstream agent action.&lt;/p&gt;

&lt;p&gt;Notice what the verifier does &lt;em&gt;not&lt;/em&gt; do: it doesn't grade the answer against the retrieved documents. RAG-specific evals that grade against the retrieved documents tend to miss fabrication-under-citation and fact-fusion. Instead, the verifier treats each generated claim as a free-standing statement and checks it against the open web through four independent sources. Retrieved documents only help if the next step uses them faithfully, and the next step is the LLM, which already had them and still hallucinated.&lt;/p&gt;

&lt;h2&gt;
  
  
  When To Use Each Recommendation
&lt;/h2&gt;

&lt;p&gt;The verifier returns one of three top-level recommendations, plus a finer-grained verdict for each individual claim.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top-level recommendation:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Rough confidence band&lt;/th&gt;
&lt;th&gt;What your agent should do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;act&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;≥ 0.80&lt;/td&gt;
&lt;td&gt;Proceed. Claims are well-supported across sources.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;verify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.50 – 0.80&lt;/td&gt;
&lt;td&gt;Soft-pass. Log the claims that dragged confidence down. Consider human-in-the-loop for high-stakes actions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reject&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 0.50, OR any refuted claim&lt;/td&gt;
&lt;td&gt;Do not act on the response as-is.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Per-claim verdicts:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Recommended action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;supported&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multiple sources confirm the claim.&lt;/td&gt;
&lt;td&gt;Trust.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;refuted&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Evidence directly contradicts the claim.&lt;/td&gt;
&lt;td&gt;Always block — this is a hallucination.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;unverifiable&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Couldn't find supporting or contradicting evidence.&lt;/td&gt;
&lt;td&gt;Treat as soft-flag, not hard fail. Often means the claim is too specific, too recent, or too obscure for the open web. Not the same as "false."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A common production mistake is treating &lt;code&gt;unverifiable&lt;/code&gt; the same as &lt;code&gt;refuted&lt;/code&gt;. Don't. A draft can get a &lt;code&gt;reject&lt;/code&gt; recommendation purely on low overall confidence from several &lt;code&gt;unverifiable&lt;/code&gt; claims even when nothing is actually wrong. Check &lt;code&gt;verdict["refuted_claims"]&lt;/code&gt; separately before deciding what to do — the code above does this.&lt;/p&gt;
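&lt;p&gt;One way to keep the two cases apart is a small gating function. This is a minimal sketch, assuming the response shape shown earlier (&lt;code&gt;overall_confidence&lt;/code&gt; plus a &lt;code&gt;claims&lt;/code&gt; list with per-claim &lt;code&gt;verdict&lt;/code&gt;). The function name &lt;code&gt;decide&lt;/code&gt;, the action labels, and the sample data are made up for illustration.&lt;/p&gt;

```python
def decide(verdict, act_threshold=0.80):
    """Map a verifier response to 'proceed', 'review', or 'block'.

    Hypothetical helper: the action labels are not part of the API.
    """
    claims = verdict.get("claims", [])
    refuted = [c["claim"] for c in claims if c["verdict"] == "refuted"]
    unverifiable = [c["claim"] for c in claims if c["verdict"] == "unverifiable"]

    if refuted:
        # Evidence contradicts at least one claim: hard block.
        return {"action": "block", "claims": refuted}

    confident = verdict.get("overall_confidence", 0.0) >= act_threshold
    if unverifiable or not confident:
        # Nothing is known to be wrong, so route to a human or a log.
        return {"action": "review", "claims": unverifiable}

    return {"action": "proceed", "claims": []}


# Invented sample: low confidence, but no refuted claim.
sample = {
    "recommendation": "reject",
    "overall_confidence": 0.42,
    "claims": [
        {"claim": "Acme shipped v2 last Tuesday", "verdict": "unverifiable"},
        {"claim": "Acme is headquartered in Austin", "verdict": "supported"},
    ],
}
print(decide(sample))
```

&lt;p&gt;The sample carries a &lt;code&gt;reject&lt;/code&gt; recommendation, yet the function returns &lt;code&gt;review&lt;/code&gt; because no claim was actually refuted.&lt;/p&gt;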

&lt;h2&gt;
  
  
  Handling The Three RAG Failure Modes
&lt;/h2&gt;

&lt;p&gt;The three failure modes from the start of this post — fabrication-under-citation, fact-fusion, confident-extrapolation — all get caught by the same pattern. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fabrication under citation.&lt;/strong&gt; The verifier decomposes the response into atomic claims and checks each one against the open web. The cited source is irrelevant to the verifier; what matters is whether the claim itself is supported. If the response says "source [2] reports 47% revenue growth" and source [2] actually reports 4.7%, the 47% claim gets refuted independently of the citation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fact fusion.&lt;/strong&gt; Each atomic claim gets verified independently. If the response fuses "Apple's Q4 revenue was $120B" (true) with "announced on March 3" (true for a different product) into "Apple's $120B Q4 revenue was announced on March 3" (false), the fused claim gets checked as-is and refuted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confident extrapolation.&lt;/strong&gt; The verifier doesn't care how confident the generation model sounded. It cares what the open web says. An extrapolation that looks authoritative in context but is unsupported by any independent source returns &lt;code&gt;unverifiable&lt;/code&gt; or &lt;code&gt;refuted&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Upgrading: Per-Claim Regeneration
&lt;/h2&gt;

&lt;p&gt;Once you have &lt;code&gt;verdict["claims"]&lt;/code&gt;, you can do more than reject the whole response. You can regenerate the answer with the refuted claims explicitly excluded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_and_repair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;refuted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refuted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;draft&lt;/span&gt;

    &lt;span class="c1"&gt;# Re-generate with explicit "do not include" list
&lt;/span&gt;    &lt;span class="n"&gt;repair_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer the following using ONLY the retrieved context. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do not include these claims that were refuted: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;repaired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repair_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;repaired&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the pattern I see most in production RAG pipelines. Soft reject → named failure list → targeted regeneration. You get the speed of auto-generation with the safety of verification. Run &lt;code&gt;verify&lt;/code&gt; on the repaired draft as well before returning it, so the user never sees a hallucinated version.&lt;/p&gt;
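&lt;p&gt;A regenerated draft can hallucinate too, so it is worth re-verifying it inside a bounded loop. The sketch below is one possible shape, not the author's implementation: &lt;code&gt;repair_until_clean&lt;/code&gt;, &lt;code&gt;max_attempts&lt;/code&gt;, and the two stub functions are hypothetical, and &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; are passed in so the loop runs without network access.&lt;/p&gt;

```python
def repair_until_clean(query, docs, generate, verify, max_attempts=2):
    """Regenerate with a growing exclusion list until nothing is refuted.

    Returns answer=None if every attempt still contains a refuted claim.
    """
    excluded = []
    prompt = query
    for _ in range(max_attempts + 1):  # first draft plus max_attempts repairs
        draft = generate(prompt, docs)
        verdict = verify(draft)
        refuted = [c["claim"] for c in verdict["claims"] if c["verdict"] == "refuted"]
        if not refuted:
            return {"answer": draft, "excluded": excluded}
        excluded.extend(refuted)
        prompt = (
            "Answer the following using ONLY the retrieved context. "
            f"Do not include these claims that were refuted: {excluded}\n\n"
            f"Original query: {query}"
        )
    return {"answer": None, "excluded": excluded}


# Stubs standing in for the real LLM call and the /evaluate call.
def fake_generate(prompt, docs):
    return "clean answer" if "Do not include" in prompt else "bad answer"

def fake_verify(text):
    label = "refuted" if text == "bad answer" else "supported"
    return {"claims": [{"claim": text, "verdict": label}]}

print(repair_until_clean("q", [], fake_generate, fake_verify))
```

&lt;p&gt;Capping the attempts keeps latency and per-call cost predictable. When the cap is hit, returning no answer is safer than returning the last draft.&lt;/p&gt;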

&lt;h2&gt;
  
  
  Production Notes
&lt;/h2&gt;

&lt;p&gt;A few things I've learned from running this in real pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency.&lt;/strong&gt; &lt;code&gt;/evaluate&lt;/code&gt; typically returns in 3–6 seconds for a short paragraph with 3–6 claims. If your RAG pipeline runs hot and that's too slow, add verification only to high-stakes agent actions (writes, transactions, external messages) — not to every chat turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost.&lt;/strong&gt; The free tier (10/hour) is fine for development. For production, &lt;code&gt;/evaluate&lt;/code&gt; is pay-per-query over x402 on Base at &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;$0.01 per call&lt;/a&gt;. An agent making 100 verifications/hour costs ~$1/hour. Typically cheaper than the LLM call that generated the response you're verifying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thresholds.&lt;/strong&gt; Default is 0.80 for &lt;code&gt;act&lt;/code&gt;. Bump to 0.90 for regulated workflows (medical, legal, financial), where wrongly holding back 10% of true claims is cheaper than letting 1% of hallucinations through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure modes.&lt;/strong&gt; Sometimes &lt;code&gt;/evaluate&lt;/code&gt; returns &lt;code&gt;unverifiable&lt;/code&gt; for a claim instead of &lt;code&gt;supported&lt;/code&gt; or &lt;code&gt;refuted&lt;/code&gt;. That usually means the claim is too specific, too recent, or too obscure for the open web. Handle an &lt;code&gt;unverifiable&lt;/code&gt; claim the way you handle a &lt;code&gt;verify&lt;/code&gt; recommendation: soft-flag, don't hard-fail. The code in this tutorial separates refuted from unverifiable on purpose.&lt;/li&gt;
&lt;/ul&gt;
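&lt;p&gt;The latency and threshold notes above can be folded into two small helpers. This is a sketch under assumptions: the action labels in &lt;code&gt;HIGH_STAKES&lt;/code&gt; and the function names are invented here. Only the 0.80 and 0.90 thresholds come from the notes.&lt;/p&gt;

```python
# Assumed action labels; use whatever names your agent framework emits.
HIGH_STAKES = {"write", "transaction", "external_message"}

def should_verify(action_kind):
    """Only pay the verification latency on actions with side effects."""
    return action_kind in HIGH_STAKES

def may_act(verdict, regulated=False):
    """Apply the 0.80 default, or 0.90 for regulated workflows."""
    threshold = 0.90 if regulated else 0.80
    no_refuted = all(c["verdict"] != "refuted" for c in verdict["claims"])
    return no_refuted and verdict["overall_confidence"] >= threshold


verdict = {
    "overall_confidence": 0.85,
    "claims": [{"claim": "example", "verdict": "supported"}],
}
print(should_verify("chat_turn"), may_act(verdict), may_act(verdict, regulated=True))
# prints: False True False
```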

&lt;h2&gt;
  
  
  The Full Minimal Example
&lt;/h2&gt;

&lt;p&gt;For easy copy-paste, here's the complete working example in one block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;

&lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI was founded in December 2015 as a non-profit research organization.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ChatGPT was released by OpenAI on November 30, 2022 and reached 100 million users by January 2023.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agentoracle.co/evaluate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evaluation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag_with_verification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Retrieve
&lt;/span&gt;    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate
&lt;/span&gt;    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer only from this context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify
&lt;/span&gt;    &lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;refuted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refuted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REJECTED — hallucinated claims: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;draft&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;rag_with_verification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When was OpenAI founded and how fast did ChatGPT grow?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it. Break it on purpose by raising the temperature or narrowing the corpus. Watch what the verifier catches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Playground (no setup):&lt;/strong&gt; &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;agentoracle.co&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Packages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install langchain-agentoracle&lt;/code&gt; — &lt;a href="https://pypi.org/project/langchain-agentoracle/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install crewai-agentoracle&lt;/code&gt; — &lt;a href="https://pypi.org/project/crewai-agentoracle/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npx agentoracle-mcp&lt;/code&gt; — &lt;a href="https://www.npmjs.com/package/agentoracle-mcp" rel="noopener noreferrer"&gt;npm&lt;/a&gt; (Claude Desktop, Cursor, Windsurf)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/TKCollective/x402-research-skill" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verifiable receipts spec:&lt;/strong&gt; &lt;a href="https://github.com/TKCollective/agentoracle-receipt-spec" rel="noopener noreferrer"&gt;github.com/TKCollective/agentoracle-receipt-spec&lt;/a&gt; — every &lt;code&gt;/evaluate&lt;/code&gt; response commits to a JWS-signed receipt format you can verify offline against the public JWKS. See &lt;a href="https://github.com/TKCollective/agentoracle-receipt-spec/tree/main/examples" rel="noopener noreferrer"&gt;the /examples directory&lt;/a&gt; for verifying examples in Node and Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Earlier in this series:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/agentoracle/how-to-add-claim-verification-to-your-langchain-agent-in-5-minutes-13ai"&gt;How to Add Claim Verification to Your LangChain Agent in 5 Minutes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/agentoracle/3-agent-integration-patterns-for-claim-verification-langchain-crewai-mcp-2l3h"&gt;3 Agent Integration Patterns for Claim Verification (LangChain + CrewAI + MCP)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;RAG was supposed to solve hallucinations. It solved some — then introduced a harder class. The fix is the same fix it's always been: a verification step that runs on the output, independent of whatever pipeline produced it.&lt;/p&gt;

&lt;p&gt;Fifteen lines of Python. Free tier to try. The code above works as-is.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
    </item>
    <item>
      <title>184 MCP installs and a 93.9% adversarial signal GPT-4o can't replicate</title>
      <dc:creator>AgentOracle</dc:creator>
      <pubDate>Fri, 24 Apr 2026 14:29:29 +0000</pubDate>
      <link>https://forem.com/agentoracle/184-mcp-installs-and-a-939-adversarial-signal-gpt-4o-cant-replicate-1ch6</link>
      <guid>https://forem.com/agentoracle/184-mcp-installs-and-a-939-adversarial-signal-gpt-4o-cant-replicate-1ch6</guid>
      <description>&lt;p&gt;184 MCP installs in 72 hours after publishing &lt;code&gt;agentoracle-mcp&lt;/code&gt;. More importantly, &lt;strong&gt;the adversarial layer flagged 93.9% of the refuted claims AgentOracle correctly identified, a signal GPT-4o alone can't replicate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post is about the verification layer under that number: the benchmark methodology, the architecture that produces the adversarial signal, and why claim verification deserves its own layer in the agent stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark
&lt;/h2&gt;

&lt;p&gt;We ran &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;AgentOracle&lt;/a&gt; head-to-head against GPT-4o on 200 claims from the &lt;a href="https://fever.ai/" rel="noopener noreferrer"&gt;FEVER dataset&lt;/a&gt; — the peer-reviewed fact verification benchmark used in dozens of published papers.&lt;/p&gt;

&lt;p&gt;Stratified sample: 67 SUPPORTS, 67 REFUTES, 66 NOT ENOUGH INFO. Random seed 42, fully reproducible. Every claim 10+ words. GPT-4o baseline via OpenRouter, single-word answer prompt, temperature 0.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Response time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;AgentOracle&lt;/strong&gt; (&lt;code&gt;/evaluate&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;58.4%&lt;/strong&gt; (115/197 valid)&lt;/td&gt;
&lt;td&gt;multi-source (slower)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt; (closed-source frontier)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;57.5%&lt;/strong&gt; (115/200)&lt;/td&gt;
&lt;td&gt;single call (fast)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A statistical tie on accuracy. Within measurement noise on a 200-claim sample. That's not the headline.&lt;/p&gt;
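&lt;p&gt;To see why a 0.9-point gap counts as a tie, it helps to put a rough interval around each accuracy figure. The sketch below uses the normal-approximation (Wald) interval for a proportion. That is a simplification chosen here, not the method used in the benchmark repository.&lt;/p&gt;

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Normal-approximation 95% interval for an accuracy estimate."""
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of a proportion
    return p, p - z * se, p + z * se

for name, correct, total in [("AgentOracle", 115, 197), ("GPT-4o", 115, 200)]:
    p, lo, hi = accuracy_interval(correct, total)
    print(f"{name}: {p:.1%} (95% interval {lo:.1%} to {hi:.1%})")
```

&lt;p&gt;Each interval is roughly seven percentage points wide on either side, so the two ranges overlap almost entirely.&lt;/p&gt;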

&lt;p&gt;GPT-5 shipped this week. Accuracy benchmarks will keep moving. The adversarial architecture doesn't.&lt;/p&gt;

&lt;p&gt;Full methodology, raw results, and reproducibility scripts: &lt;a href="https://github.com/TKCollective/agentoracle-fever-benchmark" rel="noopener noreferrer"&gt;github.com/TKCollective/agentoracle-fever-benchmark&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Finding: 94% Adversarial Contribution
&lt;/h2&gt;

&lt;p&gt;AgentOracle runs 4 verification sources in parallel: Sonar, Sonar Pro, Adversarial challenge, and Gemma 4. The adversarial source is the differentiator — it's deliberately prompted to argue &lt;em&gt;against&lt;/em&gt; each claim, surfacing counter-evidence instead of affirming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Of the REFUTES claims that AgentOracle correctly identified, 93.9% were flagged by the adversarial layer specifically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the unique signal. GPT-4o alone can't replicate it. Single-model verification confirms what's there; adversarial challenge surfaces what's missing. In agent pipelines where the cost of acting on a hallucination is a wrong action, not a wrong answer, that asymmetry matters more than accuracy parity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claim in
    ↓
decompose (Gemma) → atomic claims
    ↓
parallel fan-out:
    ├─ Sonar: "is this true?"
    ├─ Sonar Pro: "is this true with extended reasoning?"
    ├─ Adversarial: "argue why this is false"
    └─ Gemma 4: "verify + calibrate"
    ↓
consensus + confidence calibration
    ↓
per-claim verdict: SUPPORTED / REFUTED / UNVERIFIABLE
+ evidence string
+ confidence 0.00–1.00
+ correction (if refuted)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The adversarial source is not a safety filter. It's a research task: find the best counter-argument, evidence included. Even when a claim is ultimately supported, the adversarial output becomes input to the calibration step — which is why AgentOracle's confidence scores are meaningful, not noise.&lt;/p&gt;
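
&lt;p&gt;The fan-out step in the diagram can be sketched with &lt;code&gt;asyncio&lt;/code&gt;. Everything here is a stand-in: the four source functions return hard-coded verdicts instead of calling models, and the consensus rule (majority vote, with confidence as the share of agreeing sources) is a toy assumption, not AgentOracle's actual calibration step.&lt;/p&gt;

```python
import asyncio
from collections import Counter

# Hypothetical stand-ins for the four sources; real ones would call model APIs.
async def sonar(claim):
    return "SUPPORTED"

async def sonar_pro(claim):
    return "SUPPORTED"

async def adversarial(claim):
    # Prompted to argue against the claim, so it reports any counter-evidence it finds.
    return "REFUTED"

async def gemma(claim):
    return "SUPPORTED"

SOURCES = [sonar, sonar_pro, adversarial, gemma]

async def verify(claim):
    # Fan out to every source concurrently and wait for all verdicts.
    verdicts = await asyncio.gather(*(source(claim) for source in SOURCES))
    # Toy consensus: majority label, confidence = share of sources that agree.
    label, votes = Counter(verdicts).most_common(1)[0]
    return {"claim": claim, "verdict": label, "confidence": votes / len(verdicts)}

result = asyncio.run(verify("Python was created by Guido van Rossum in 1991"))
print(result)
```

&lt;p&gt;Even in this toy version the dissenting adversarial vote pulls confidence below 1.0, which is the role the calibration step gives it.&lt;/p&gt;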

&lt;p&gt;Confidence calibration on the 200-claim benchmark:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average confidence on &lt;strong&gt;correct&lt;/strong&gt; predictions: 0.61&lt;/li&gt;
&lt;li&gt;Average confidence on &lt;strong&gt;incorrect&lt;/strong&gt; predictions: 0.55&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That 6-point gap is small, but it points the right way: the system is more certain when it's right and less certain when it's wrong. Agents branching on confidence thresholds get a usable signal, not just theater.&lt;/p&gt;

&lt;h2&gt;
  
  
  184 Installs Decomposed
&lt;/h2&gt;

&lt;p&gt;The install curve so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Day 1 (launch):&lt;/strong&gt; tutorial + MCP publish → organic npm discovery → 168 installs in 24h&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 2:&lt;/strong&gt; +16&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nobody is running a campaign. There's no paid distribution. The installs are developers finding &lt;code&gt;agentoracle-mcp&lt;/code&gt; through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;MCP server directories&lt;/strong&gt; — &lt;a href="https://glama.ai/mcp/servers/TKCollective/agentoracle-mcp" rel="noopener noreferrer"&gt;Glama&lt;/a&gt; auto-indexed us on publish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;x402 discovery layers&lt;/strong&gt; — &lt;a href="https://decixa.ai" rel="noopener noreferrer"&gt;Decixa&lt;/a&gt; verified our endpoints and classified us under Analyze → Verification / Data Enrichment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework ecosystems&lt;/strong&gt; — &lt;code&gt;langchain-agentoracle&lt;/code&gt; and &lt;code&gt;crewai-agentoracle&lt;/code&gt; on PyPI, found by search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content entry points&lt;/strong&gt; — the LangChain &lt;a href="https://dev.to/agentoracle/how-to-add-claim-verification-to-your-langchain-agent-in-5-minutes-13ai"&gt;tutorial post&lt;/a&gt; gets indexed by Google and Dev.to's own recommendation engine&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of those are push channels. They're pull channels that compound. One tutorial gets found over and over. One MCP directory listing surfaces to every new developer exploring MCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Compounds
&lt;/h2&gt;

&lt;p&gt;The v2.1.0 release of &lt;code&gt;agentoracle-mcp&lt;/code&gt; (shipped this week) adds a &lt;code&gt;resolve&lt;/code&gt; tool that calls Decixa's multi-provider discovery API. An agent asking "find me a verification endpoint for analyze + verify a factual claim" gets an answer that's not hardcoded to AgentOracle. It's the best-matching x402 endpoint across the ecosystem, ranked by latency, price, and tag match.&lt;/p&gt;

&lt;p&gt;Today that returns AgentOracle first because we're the only pre-action truth oracle in that category on Decixa. As more providers list, the &lt;code&gt;resolve&lt;/code&gt; tool keeps working, because it routes by intent, not by URL.&lt;/p&gt;
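
&lt;p&gt;Routing by intent can be pictured as a filter plus a sort. The sketch below is purely illustrative: the &lt;code&gt;Endpoint&lt;/code&gt; fields, the provider names, and the ordering (most matching tags first, then price, then latency) are assumptions made for the example, not Decixa's API or its real ranking.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    # Hypothetical registry record; field names are illustrative only.
    name: str
    tags: frozenset
    latency_ms: float
    price_usd: float

def resolve(intent_tags, endpoints):
    """Return endpoints sharing at least one tag with the intent, best match first."""
    wanted = frozenset(intent_tags)
    candidates = [e for e in endpoints if wanted.intersection(e.tags)]
    # Assumed ordering: more matching tags first, then cheaper, then faster.
    return sorted(
        candidates,
        key=lambda e: (-len(wanted.intersection(e.tags)), e.price_usd, e.latency_ms),
    )

registry = [
    Endpoint("provider-a", frozenset({"analyze", "verification"}), 900.0, 0.02),
    Endpoint("provider-b", frozenset({"verification"}), 300.0, 0.01),
    Endpoint("provider-c", frozenset({"translation"}), 100.0, 0.001),
]

ranked = resolve({"analyze", "verification"}, registry)
print([e.name for e in ranked])
```

&lt;p&gt;The caller never names a URL. Adding a new provider to the registry changes the ranking without changing the calling code.&lt;/p&gt;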

&lt;p&gt;The bet is this: in an agent economy with x402 payments, the distribution channel isn't paid ads or SEO. It's shared discovery infrastructure that every agent uses to find services. Ship the service, instrument the discovery properly, and installs compound without campaigns.&lt;/p&gt;

&lt;p&gt;We'll see how the install curve develops over the next few weeks. If the bet holds, showing up in the right directories and letting them do the rest produces a baseline of installs that doesn't require acquisition spend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Playground (no wallet, no signup):&lt;/strong&gt; &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;agentoracle.co&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP server&lt;/strong&gt; — plug into Claude Desktop, Cursor, Windsurf:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx agentoracle-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python SDKs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-agentoracle
pip &lt;span class="nb"&gt;install &lt;/span&gt;crewai-agentoracle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JavaScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;agentoracle-verify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benchmark + reproducibility:&lt;/strong&gt; &lt;a href="https://github.com/TKCollective/agentoracle-fever-benchmark" rel="noopener noreferrer"&gt;github.com/TKCollective/agentoracle-fever-benchmark&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;The benchmark is 200 claims. The architecture is 4 sources with adversarial challenge. The distribution is shared discovery infrastructure that compounds without campaigns. Three simple facts, and none of them requires a marketing team. That's the model we're betting on.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>benchmarking</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>3 Agent Integration Patterns for Claim Verification (LangChain + CrewAI + MCP)</title>
      <dc:creator>AgentOracle</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:38:24 +0000</pubDate>
      <link>https://forem.com/agentoracle/3-agent-integration-patterns-for-claim-verification-langchain-crewai-mcp-2l3h</link>
      <guid>https://forem.com/agentoracle/3-agent-integration-patterns-for-claim-verification-langchain-crewai-mcp-2l3h</guid>
      <description>&lt;p&gt;Your agent generates a claim. Then what?&lt;/p&gt;

&lt;p&gt;In most agent pipelines: nothing. The claim flows straight into the next action — a tool call, a database write, a message sent. If the claim is wrong, the action is wrong, and the first person to notice is usually the user.&lt;/p&gt;

&lt;p&gt;There are three patterns that fix this. Each one adds a verification step between generation and action — pre-action claim verification — and each one fits a different stage of agent maturity.&lt;/p&gt;

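&lt;p&gt;All three patterns below use &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;AgentOracle&lt;/a&gt; (free to try, no wallet, no API keys). The code runs as written once you supply your own action functions, such as &lt;code&gt;call_contract&lt;/code&gt; and &lt;code&gt;log_hallucination&lt;/code&gt;. Copy it, run it.&lt;/p&gt;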

&lt;h2&gt;
  
  
  Pattern 1: Verify-Then-Act Gate (simplest)
&lt;/h2&gt;

&lt;p&gt;Your agent has exactly one claim it's about to act on. You want a hard pass/fail before anything happens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-agentoracle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleVerifyGateTool&lt;/span&gt;

&lt;span class="n"&gt;gate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentOracleVerifyGateTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_then_act&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action_fn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gate an action behind a single claim verification.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Gate returns PASS/FAIL + confidence. Parse from formatted output.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recommendation: ACT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;action_fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Action blocked — verification failed:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="c1"&gt;# Example: agent thinks a contract exists and wants to call it
&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contract 0xabc...123 is a valid USDC contract on Base mainnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="nf"&gt;verify_then_act&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;call_contract&lt;/span&gt;&lt;span class="p"&gt;(...))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single atomic claim ("X is true, therefore do Y")&lt;/li&gt;
&lt;li&gt;Binary decisions (proceed or halt)&lt;/li&gt;
&lt;li&gt;Free — &lt;code&gt;/verify-gate&lt;/code&gt; has no cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When it's not enough:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your agent generates paragraph-length output with multiple claims&lt;/li&gt;
&lt;li&gt;You need evidence for the verdict, not just pass/fail&lt;/li&gt;
&lt;li&gt;You want per-claim granularity (accept 3 of 4 claims, flag the 4th)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Pattern 2: Decompose-and-Score (most versatile)
&lt;/h2&gt;

&lt;p&gt;Your agent outputs a paragraph. Some claims are factual, some might be hallucinated, and you don't want to throw out the whole output if only one sentence is wrong.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/evaluate&lt;/code&gt; endpoint decomposes text into atomic claims, scores each independently, and returns per-claim verdicts. You can then keep the good claims, correct the bad ones, or flag them for human review.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleEvaluateTool&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="n"&gt;evaluator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;audit_agent_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Decompose text into claims, verify each, return structured audit.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evaluator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# The tool returns a formatted string. Extract per-claim verdicts.
&lt;/span&gt;    &lt;span class="n"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\[(SUPPORTED|REFUTED|UNVERIFIABLE)\] \((\d\.\d+)\) (.+?)(?=\n\n|\Z)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;claim_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;evidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evidence:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;evidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evidence:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;claim_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;claims&lt;/span&gt;


&lt;span class="c1"&gt;# Example: your agent produced this summary
&lt;/span&gt;&lt;span class="n"&gt;agent_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
OpenAI released GPT-4 in March 2023.
Bitcoin was created by Elon Musk in 2009.
Python was created by Guido van Rossum in 1991.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;audit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;audit_agent_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;REFUTED&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;                        ↳ &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;evidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SUPPORTED      (1.00) OpenAI released GPT-4 in March 2023
REFUTED        (0.83) Bitcoin was created by Elon Musk in 2009
                        ↳ Bitcoin's creator is the pseudonymous Satoshi Nakamoto, not Elon Musk.
SUPPORTED      (1.00) Python was created by Guido van Rossum in 1991
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What you can do with this:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Keep supported, flag refuted, escalate low-confidence
&lt;/span&gt;&lt;span class="n"&gt;safe_claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUPPORTED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;need_human&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UNVERIFIABLE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;refuted&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verdict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REFUTED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Regenerate with the corrections inline, or just flag
&lt;/span&gt;    &lt;span class="nf"&gt;log_hallucination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;refuted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
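
&lt;p&gt;One thing to watch with the thresholds above: a SUPPORTED claim scoring between 0.5 and 0.8 lands in no list, and a low-confidence REFUTED claim lands in two. If you want every claim in exactly one bucket, a small triage function is one option. The function name, the bucket names, and the 0.8 cutoff are illustrative choices, and the third sample claim is made up to show the middle case.&lt;/p&gt;

```python
def triage(claims, keep_at=0.8):
    """Assign each audited claim to exactly one bucket."""
    buckets = {"keep": [], "refuted": [], "review": []}
    for c in claims:
        if c["verdict"] == "REFUTED":
            buckets["refuted"].append(c)
        elif c["verdict"] == "SUPPORTED" and c["confidence"] >= keep_at:
            buckets["keep"].append(c)
        else:
            # UNVERIFIABLE, or SUPPORTED without enough confidence.
            buckets["review"].append(c)
    return buckets

# Same shape as the dicts returned by audit_agent_output().
sample_audit = [
    {"claim": "OpenAI released GPT-4 in March 2023", "verdict": "SUPPORTED", "confidence": 1.00},
    {"claim": "Bitcoin was created by Elon Musk in 2009", "verdict": "REFUTED", "confidence": 0.83},
    {"claim": "A hypothetical mid-confidence claim", "verdict": "SUPPORTED", "confidence": 0.60},
]

buckets = triage(sample_audit)
for name, items in buckets.items():
    print(name, [c["claim"] for c in items])
```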



&lt;p&gt;&lt;strong&gt;When to use this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-claim agent output (summaries, research, plans)&lt;/li&gt;
&lt;li&gt;You need evidence, not just a verdict&lt;/li&gt;
&lt;li&gt;You want to selectively keep/reject claims&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Pattern 3: Multi-Agent Supervisor (most advanced)
&lt;/h2&gt;

&lt;p&gt;Now you have a CrewAI crew with a researcher agent and a writer agent. The writer is about to publish. You want a supervisor agent that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Discovers the right verification provider (via Decixa's multi-provider registry, falling back to local)&lt;/li&gt;
&lt;li&gt;Calls that provider to audit the writer's draft&lt;/li&gt;
&lt;li&gt;Only passes the draft through if verification clears a threshold&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is where &lt;a href="https://www.npmjs.com/package/agentoracle-mcp" rel="noopener noreferrer"&gt;AgentOracle's MCP server&lt;/a&gt; shines. It exposes both the &lt;code&gt;resolve&lt;/code&gt; tool (discovery) and the verification tools in a single MCP server that any agent can call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# No install — runs via npx&lt;/span&gt;
npx agentoracle-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hook it into Claude Desktop or Cursor or any MCP-compatible runtime. Then in your agent framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleEvaluateTool&lt;/span&gt;

&lt;span class="c1"&gt;# Writer agent — generates content, might hallucinate
&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Technical Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft a factual summary of recent AI regulation news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Experienced technical writer. Optimizes for readability.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Supervisor agent — audits the writer's output using AgentOracle
&lt;/span&gt;&lt;span class="n"&gt;supervisor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fact-Check Supervisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Catch hallucinations before the writer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s draft ships&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pedantic editor. Refuses to pass content with unverified claims.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write 3 sentences about recent AI regulation news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A 3-sentence factual summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evaluate the writer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s draft using the AgentOracle tool. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If any claim is REFUTED, return &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BLOCKED: &amp;lt;reason&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If overall confidence is below 0.7, return &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NEEDS_HUMAN_REVIEW&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Otherwise return &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;APPROVED&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; plus the cleaned draft.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;supervisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APPROVED | BLOCKED | NEEDS_HUMAN_REVIEW + reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supervisor&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this pattern matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separation of concerns: one agent writes, one agent verifies&lt;/li&gt;
&lt;li&gt;The supervisor can be a smaller, cheaper model — it just needs to call the tool and apply logic&lt;/li&gt;
&lt;li&gt;Works in CrewAI, AutoGen, LangGraph — any framework that supports agent-as-tool-user&lt;/li&gt;
&lt;/ul&gt;
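&lt;p&gt;If you would rather not leave the final decision to the supervisor's LLM, the three rules from the review task can also be applied deterministically. The following is a minimal sketch, assuming you have already parsed the tool output into per-claim verdicts; the &lt;code&gt;Claim&lt;/code&gt; dataclass and &lt;code&gt;supervisor_decision&lt;/code&gt; function are illustrative names, not part of any AgentOracle package.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    """Hypothetical container for one parsed claim (not an AgentOracle type)."""
    text: str
    verdict: str       # assumed labels: "SUPPORTED", "REFUTED", "UNVERIFIABLE"
    confidence: float


def supervisor_decision(claims: List[Claim], overall_confidence: float,
                        min_confidence: float = 0.7) -> str:
    """Apply the review task's three rules in the order they are written."""
    refuted = [c for c in claims if c.verdict == "REFUTED"]
    if refuted:
        # Rule 1: any refuted claim blocks the draft outright.
        return "BLOCKED: refuted claim: " + refuted[0].text
    if min_confidence > overall_confidence:
        # Rule 2: overall confidence below the threshold goes to a human.
        return "NEEDS_HUMAN_REVIEW"
    # Rule 3: everything else is approved.
    return "APPROVED"


# Illustrative input only; these values are made up for the example.
claims = [
    Claim("The EU AI Act entered into force in 2024", "SUPPORTED", 0.95),
    Claim("The act bans all open-source models", "REFUTED", 0.88),
]
print(supervisor_decision(claims, overall_confidence=0.61))
```

&lt;p&gt;Keeping the rules in plain code means the supervisor model only has to call the tool, which fits the point above about using a smaller, cheaper model.&lt;/p&gt;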

&lt;p&gt;&lt;strong&gt;The discovery angle:&lt;/strong&gt; If you want the supervisor to choose verification providers dynamically (not hardcode AgentOracle), use the &lt;code&gt;resolve&lt;/code&gt; tool (v2.1.0 of agentoracle-mcp, via &lt;a href="https://decixa.ai" rel="noopener noreferrer"&gt;Decixa&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;capability&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verify a factual claim before acting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The call returns the best-matching x402 verification endpoint across the ecosystem, ranked by latency, price, and tag match. AgentOracle is currently the only pre-action truth oracle classified under "Analyze → Verification" on Decixa, so it comes back first today. As more providers list, your supervisor can pick the best one for each query automatically.&lt;/p&gt;
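&lt;p&gt;To make "ranked by latency, price, and tag match" concrete, here is a rough sketch of what such a ranking could look like on the client side. Everything in it is an assumption for illustration: the &lt;code&gt;Provider&lt;/code&gt; record, the field names, the sample providers, and the priority order (tag overlap first, then price, then latency). It is not how Decixa or &lt;code&gt;resolve&lt;/code&gt; actually scores results.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Provider:
    """Hypothetical provider listing; field names are assumptions."""
    name: str
    latency_ms: float
    price_usd: float
    tags: Set[str]


def rank_providers(providers: List[Provider], wanted_tags: Set[str]) -> List[Provider]:
    """Sort by tag overlap (more is better), then price, then latency."""
    def sort_key(p: Provider):
        tag_matches = len(p.tags.intersection(wanted_tags))
        # Negate the match count so that more matches sort first.
        return (-tag_matches, p.price_usd, p.latency_ms)
    return sorted(providers, key=sort_key)


# Made-up listings, purely to exercise the sort.
providers = [
    Provider("slow-checker", 900.0, 0.00, {"analyze"}),
    Provider("fast-checker", 200.0, 0.01, {"analyze", "verification"}),
    Provider("summarizer", 100.0, 0.00, {"summarize"}),
]
ranked = rank_providers(providers, {"analyze", "verification"})
print([p.name for p in ranked])
```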




&lt;h2&gt;
  
  
  Which Pattern To Pick
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your agent setup&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single binary decision&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1. Verify-then-act gate&lt;/strong&gt; (free, &lt;code&gt;/verify-gate&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paragraph output, need per-claim scoring&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2. Decompose-and-score&lt;/strong&gt; (free during beta, &lt;code&gt;/evaluate&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent pipeline, supervisor pattern&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;3. Multi-agent supervisor&lt;/strong&gt; (CrewAI + MCP)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three work together. Start with Pattern 1 while you're prototyping. Graduate to Pattern 2 when your agent produces paragraph-length output and you need a score for each claim. Move to Pattern 3 when you have a real pipeline with distinct agent roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Playground (no setup):&lt;/strong&gt; &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;agentoracle.co&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Packages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install langchain-agentoracle&lt;/code&gt; — &lt;a href="https://pypi.org/project/langchain-agentoracle/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install crewai-agentoracle&lt;/code&gt; — &lt;a href="https://pypi.org/project/crewai-agentoracle/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npx agentoracle-mcp&lt;/code&gt; — &lt;a href="https://www.npmjs.com/package/agentoracle-mcp" rel="noopener noreferrer"&gt;npm&lt;/a&gt; (Claude Desktop, Cursor, Windsurf)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install agentoracle-verify&lt;/code&gt; — &lt;a href="https://www.npmjs.com/package/agentoracle-verify" rel="noopener noreferrer"&gt;npm&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/TKCollective/x402-research-skill" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark:&lt;/strong&gt; We ran AgentOracle head-to-head against GPT-4o on 200 claims from the peer-reviewed FEVER dataset. &lt;a href="https://github.com/TKCollective/agentoracle-fever-benchmark" rel="noopener noreferrer"&gt;Results + methodology.&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Hallucinations aren't a bug to patch. They're a property of large language models that doesn't go away with bigger training runs or better prompts. The only reliable fix is to add a verification step your agent can't bypass.&lt;/p&gt;

&lt;p&gt;These three patterns are what that step looks like in production code.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>ai</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>24 hours of organic discovery: what we learned from our first external users</title>
      <dc:creator>AgentOracle</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:14:56 +0000</pubDate>
      <link>https://forem.com/agentoracle/24-hours-of-organic-discovery-what-we-learned-from-our-first-external-users-135i</link>
      <guid>https://forem.com/agentoracle/24-hours-of-organic-discovery-what-we-learned-from-our-first-external-users-135i</guid>
      <description>&lt;p&gt;Yesterday we published a tutorial. No list. No paid promotion. No cold outreach.&lt;/p&gt;

&lt;p&gt;By Tuesday morning, five developers and two autonomous agents had found AgentOracle and run real evaluations.&lt;/p&gt;

&lt;p&gt;Who showed up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starlink: Albuquerque, NM&lt;/li&gt;
&lt;li&gt;Comcast: Rockville, MD&lt;/li&gt;
&lt;li&gt;Charter/Spectrum: Missoula, MT&lt;/li&gt;
&lt;li&gt;Azure cloud agent: Des Moines, IA&lt;/li&gt;
&lt;li&gt;Azure cloud agent: Chicago, IL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same path every time: tutorial → playground → &lt;code&gt;/evaluate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The Azure IPs are the most interesting signal. Those aren't humans clicking through a tutorial. They are autonomous agents running on cloud infrastructure that found the playground and ran evaluations on their own. That's exactly the use case we built for.&lt;/p&gt;

&lt;p&gt;The MCP server shipped tonight. Your agent can find it the same way they did.&lt;/p&gt;

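&lt;p&gt;&lt;code&gt;npx agentoracle-mcp&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Or hit the playground directly: &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;agentoracle.co&lt;/a&gt;&lt;/p&gt;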

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Add Claim Verification to Your LangChain Agent in 5 Minutes</title>
      <dc:creator>AgentOracle</dc:creator>
      <pubDate>Mon, 20 Apr 2026 16:01:18 +0000</pubDate>
      <link>https://forem.com/agentoracle/how-to-add-claim-verification-to-your-langchain-agent-in-5-minutes-13ai</link>
      <guid>https://forem.com/agentoracle/how-to-add-claim-verification-to-your-langchain-agent-in-5-minutes-13ai</guid>
      <description>&lt;p&gt;Your LangChain agent is wrong about 10% of the time. Not occasionally — consistently, confidently, and silently.&lt;/p&gt;

&lt;p&gt;The problem isn't the model. It's that your agent has no way to know when it's wrong. It receives information, formats a response, and acts. No second opinion. No fact-check. No circuit breaker.&lt;/p&gt;

&lt;p&gt;This tutorial shows you how to add a verification layer in 5 minutes that catches hallucinations before your agent acts on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM hallucination rates in 2026 range from 3% to 20% depending on the task. On a summarization benchmark, GPT-4 looks great. On open-ended factual questions — the kind your agent asks constantly — it's a different story.&lt;/p&gt;

&lt;p&gt;The deeper problem: reasoning models can hallucinate more on factual tasks, not less. The longer a model "thinks through" an answer, the more chances it has to fill gaps with plausible-sounding fiction.&lt;/p&gt;

&lt;p&gt;In a simple chatbot, a hallucination is embarrassing. In an autonomous agent pipeline, it's a wrong action. A refunded order, a bad recommendation, a compliance violation, a message sent to the wrong person.&lt;/p&gt;

&lt;p&gt;The standard fix is human review. But human review defeats the purpose of an autonomous agent.&lt;/p&gt;

&lt;p&gt;The real fix is a verification layer that runs before your agent acts — independently of the model that generated the claim.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-agentoracle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No API keys. No configuration. The free tier gives you 20 preview verifications per hour to test with.&lt;/p&gt;
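&lt;p&gt;If you script against the free tier, it can help to track the 20-per-hour budget on your side so a test loop stops before the service starts refusing calls. Below is a minimal sketch of a sliding-window counter. The &lt;code&gt;HourlyBudget&lt;/code&gt; class is a hypothetical helper, and whether the service itself uses a sliding or fixed window is an assumption on our part.&lt;/p&gt;

```python
import time
from collections import deque


class HourlyBudget:
    """Client-side sliding window: at most `limit` calls per `window_s` seconds."""

    def __init__(self, limit=20, window_s=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock          # injectable so the window can be tested
        self.calls = deque()        # timestamps of calls still inside the window

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False            # budget spent: skip or queue the call
        self.calls.append(now)
        return True


budget = HourlyBudget()
allowed = sum(budget.try_acquire() for _ in range(25))
print(allowed)  # 20: the remaining 5 attempts are refused locally
```

&lt;p&gt;Call &lt;code&gt;try_acquire()&lt;/code&gt; before each preview verification and fall back to queuing or skipping when it returns &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;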

&lt;h2&gt;
  
  
  Quick Start: Verify Before Your Agent Acts
&lt;/h2&gt;

&lt;p&gt;The simplest integration — verify a piece of text and get per-claim verdicts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleEvaluateTool&lt;/span&gt;

&lt;span class="n"&gt;verifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Your agent just generated this text — is it true?
&lt;/span&gt;&lt;span class="n"&gt;agent_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
OpenAI released GPT-4 in March 2023.
Bitcoin was created by Elon Musk.
The Python programming language was created by Guido van Rossum.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;verifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what comes back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EVALUATION RESULT
Overall confidence: 0.61
Recommendation: ACT
Claims found: 3 | Supported: 2 | Refuted: 1 | Unverifiable: 0
Sources used: sonar, sonar-pro, adversarial, gemma-4

CLAIMS:
  ✓ [SUPPORTED] (1.00) OpenAI released GPT-4 in March 2023
    Evidence: Widely documented historical fact; GPT-4 was announced
    and released on March 14, 2023.

  ✗ [REFUTED] (0.83) Bitcoin was created by Elon Musk
    Evidence: Bitcoin's creator is the pseudonymous Satoshi Nakamoto.
    Correction: Bitcoin was created by Satoshi Nakamoto, not Elon Musk.

  ✓ [SUPPORTED] (1.00) Python was created by Guido van Rossum
    Evidence: Confirmed in official Python documentation and
    Van Rossum's own statements.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three claims went in. Two came back supported with evidence. One came back &lt;strong&gt;refuted with a correction&lt;/strong&gt;. Your agent now knows claim #2 is wrong before it acts on it.&lt;/p&gt;
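&lt;p&gt;Because the tool returns a formatted string, your agent needs to pull the verdicts out before it can branch on them. Here is a small sketch that parses the claim lines in the format shown above. The &lt;code&gt;parse_claims&lt;/code&gt; helper is our own illustration, and the &lt;code&gt;UNVERIFIABLE&lt;/code&gt; bracket label is an assumption based on the summary line; adjust the pattern if the real output differs.&lt;/p&gt;

```python
import re

# Matches lines such as "  ✗ [REFUTED] (0.83) Bitcoin was created by Elon Musk".
CLAIM_LINE = re.compile(r"\[(SUPPORTED|REFUTED|UNVERIFIABLE)\]\s+\(([0-9.]+)\)\s+(.+)")


def parse_claims(report: str):
    """Return a list of (verdict, score, claim_text) tuples found in the report."""
    claims = []
    for line in report.splitlines():
        match = CLAIM_LINE.search(line)
        if match:
            verdict, score, text = match.groups()
            claims.append((verdict, float(score), text.strip()))
    return claims


sample = """
CLAIMS:
  ✓ [SUPPORTED] (1.00) OpenAI released GPT-4 in March 2023
  ✗ [REFUTED] (0.83) Bitcoin was created by Elon Musk
"""
refuted = [text for verdict, _, text in parse_claims(sample) if verdict == "REFUTED"]
print(refuted)  # ['Bitcoin was created by Elon Musk']
```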

&lt;h2&gt;
  
  
  Add It to Your Agent's Toolbelt
&lt;/h2&gt;

&lt;p&gt;Want your agent to verify claims on its own? Add the tools directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_agentoracle_tools&lt;/span&gt;

&lt;span class="c1"&gt;# Returns all 6 AgentOracle tools ready for your agent
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_agentoracle_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Or pick specific ones:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Per-claim verification ($0.01)
&lt;/span&gt;    &lt;span class="n"&gt;AgentOracleVerifyGateTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Quick pass/fail gate (free)
&lt;/span&gt;    &lt;span class="n"&gt;AgentOraclePreviewTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Research preview (free, 20/hr)
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tools follow LangChain's &lt;code&gt;BaseTool&lt;/code&gt; interface, so they plug into any agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentType&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentOraclePreviewTool&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nc"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nc"&gt;AgentOraclePreviewTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENAI_FUNCTIONS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The agent can now verify claims before acting
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check if this is true: Tesla&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s market cap exceeded $2 trillion in 2024&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Verify-Then-Act Pattern
&lt;/h2&gt;

&lt;p&gt;The most useful pattern: gate your agent's actions on verification confidence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleEvaluateTool&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;verifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentOracleEvaluateTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_then_act&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Only act if verification confidence exceeds threshold.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;verifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse the confidence from the result
&lt;/span&gt;    &lt;span class="c1"&gt;# The tool returns a formatted string with overall confidence
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Overall confidence:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;conf_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Overall confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conf_line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;confidence_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ VERIFIED (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) — safe to act&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ LOW CONFIDENCE (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) — hold for review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="c1"&gt;# In your agent pipeline:
&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The Federal Reserve raised interest rates in March 2024&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;verify_then_act&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# proceed with the action
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# flag for human review or use a fallback
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Free Quick Check: The Verify Gate
&lt;/h2&gt;

&lt;p&gt;Don't need per-claim breakdowns? The verify gate gives you a fast pass/fail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_agentoracle&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentOracleVerifyGateTool&lt;/span&gt;

&lt;span class="n"&gt;gate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentOracleVerifyGateTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Quick binary check — free, no payment needed
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The speed of light is approximately 300,000 km per second&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# VERIFY GATE: FAIL
# Confidence: 1.00
# Recommendation: ACT
# ("FAIL" = gate found no issues — content is safe to act on)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
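&lt;p&gt;Since the gate's &lt;code&gt;FAIL&lt;/code&gt; label means "no issues found", branching on that word is easy to get backwards. A safer option is to key off the &lt;code&gt;Recommendation&lt;/code&gt; line instead. The sketch below assumes the output keeps the three-line shape shown above; &lt;code&gt;gate_allows_action&lt;/code&gt; is a hypothetical helper, not part of the package.&lt;/p&gt;

```python
def gate_allows_action(gate_output: str) -> bool:
    """Return True only when the Recommendation line says ACT."""
    for line in gate_output.splitlines():
        cleaned = line.strip()
        if cleaned.startswith("Recommendation:"):
            return cleaned.split(":", 1)[1].strip() == "ACT"
    # No recommendation found: fail closed and do not act.
    return False


sample = "VERIFY GATE: FAIL\nConfidence: 1.00\nRecommendation: ACT"
print(gate_allows_action(sample))  # True
```

&lt;p&gt;Failing closed when the line is missing keeps a malformed or truncated response from being read as approval.&lt;/p&gt;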



&lt;h2&gt;
  
  
  Why AgentOracle
&lt;/h2&gt;

&lt;p&gt;Most hallucination detection tools are built for humans — dashboards, observability platforms, monitoring UIs. They tell you what went wrong after the fact.&lt;/p&gt;

&lt;p&gt;AgentOracle is built for agents. It sits in the pipeline, takes any text, runs it through 4 independent verification sources in parallel, and returns a machine-readable verdict before your agent acts.&lt;/p&gt;

&lt;p&gt;No dashboards. No subscriptions. No API keys to configure. Your agent calls &lt;code&gt;/evaluate&lt;/code&gt;, gets &lt;code&gt;ACT / VERIFY / REJECT&lt;/code&gt; with a confidence score and evidence, and decides what to do next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's under the hood:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 independent sources: Sonar, Sonar Pro, Adversarial challenge, and Gemma 4&lt;/li&gt;
&lt;li&gt;Per-claim decomposition — complex text gets broken into individual verifiable claims&lt;/li&gt;
&lt;li&gt;Confidence calibration across sources&lt;/li&gt;
&lt;li&gt;Evidence and corrections for every verdict&lt;/li&gt;
&lt;li&gt;1,900+ claim fingerprints in the database and growing daily&lt;/li&gt;
&lt;/ul&gt;
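&lt;p&gt;For intuition, here is a rough sketch of the general fan-out idea: query several independent checkers in parallel, combine their scores, and map the result to &lt;code&gt;ACT / VERIFY / REJECT&lt;/code&gt;. This is not AgentOracle's implementation. The stand-in sources, the plain average, and both thresholds are assumptions chosen only to illustrate the pattern.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fan_out(claim, sources):
    """Run every source on the claim in parallel and collect confidence scores."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        return list(pool.map(lambda check: check(claim), sources))


def recommend(scores, act_at=0.8, reject_below=0.4):
    """Map averaged confidence to a label; thresholds are illustrative guesses."""
    confidence = mean(scores)
    if confidence >= act_at:
        return "ACT"
    if reject_below > confidence:
        return "REJECT"
    return "VERIFY"


# Hypothetical stand-ins for four independent sources. Each one takes a claim
# and returns a confidence between 0.0 and 1.0.
sources = [lambda c: 0.9, lambda c: 0.8, lambda c: 0.7, lambda c: 1.0]

scores = fan_out("Python was created by Guido van Rossum", sources)
print(recommend(scores))  # ACT
```

&lt;p&gt;In a real pipeline each stand-in would be a network call, which is why the thread pool matters: total latency is bounded by the slowest source, not the sum of all four.&lt;/p&gt;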

&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Playground&lt;/strong&gt; — no setup, no payment: &lt;a href="https://agentoracle.co" rel="noopener noreferrer"&gt;agentoracle.co&lt;/a&gt;&lt;br&gt;
Paste any text and see per-claim verdicts in under 15 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Packages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install langchain-agentoracle&lt;/code&gt; — &lt;a href="https://pypi.org/project/langchain-agentoracle/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install crewai-agentoracle&lt;/code&gt; — &lt;a href="https://pypi.org/project/crewai-agentoracle/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install agentoracle-verify&lt;/code&gt; — &lt;a href="https://www.npmjs.com/package/agentoracle-verify" rel="noopener noreferrer"&gt;npm&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/TKCollective/x402-research-skill" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Hallucinations aren't going away. The models are getting better, but "better" still means wrong 3-10% of the time on the tasks your agents actually run.&lt;/p&gt;

&lt;p&gt;A verification layer doesn't replace a good model. It catches the cases where even a good model is confidently wrong — which is exactly when you need it most.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
