<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: concordance AI</title>
    <description>The latest articles on Forem by concordance AI (@concordance_ai).</description>
    <link>https://forem.com/concordance_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3950698%2F77c2b59b-a29f-46d1-9917-0acec1f34ceb.png</url>
      <title>Forem: concordance AI</title>
      <link>https://forem.com/concordance_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/concordance_ai"/>
    <language>en</language>
    <item>
      <title>I asked three AI models the same API question. Only one had it right.</title>
      <dc:creator>concordance AI</dc:creator>
      <pubDate>Mon, 25 May 2026 15:09:34 +0000</pubDate>
      <link>https://forem.com/concordance_ai/i-asked-three-ai-models-the-same-api-question-only-one-had-it-right-3lfd</link>
      <guid>https://forem.com/concordance_ai/i-asked-three-ai-models-the-same-api-question-only-one-had-it-right-3lfd</guid>
      <description>&lt;p&gt;One Tuesday I wasted two hours chasing a Bitrix24 (an ERP/CRM platform) API method that doesn't exist. The model I asked described it like it was right there in the docs - full description, code example, confident tone. The method was &lt;code&gt;crm.item.userfield.add&lt;/code&gt;. Made up.&lt;/p&gt;

&lt;p&gt;The real one is &lt;code&gt;userfieldconfig.add&lt;/code&gt;. It's in the official documentation.&lt;/p&gt;

&lt;p&gt;That evening I kept thinking about one thing: what if I could see when models disagree? Not which one is right - I won't always know. Just a signal. &lt;em&gt;Something's off here, check before you use it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I built a tool. Three models, same question, in parallel. Watch where they split. I added an interface, then more features, then other people started using it. Now it's a product, which still feels weird to say about something I built for my own Tuesday afternoons.&lt;/p&gt;
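&lt;p&gt;This is a minimal Python sketch of what "same question, in parallel" can look like. It is not the product's implementation. &lt;code&gt;ask_model&lt;/code&gt; and &lt;code&gt;CANNED_ANSWERS&lt;/code&gt; are hypothetical stand-ins: they return the three answers from the Bitrix24 example below instead of calling real model APIs. The fan-out is the part that carries over. &lt;code&gt;asyncio.gather&lt;/code&gt; sends all requests at once, so total latency is roughly that of the slowest model rather than the sum of all three.&lt;/p&gt;

```python
import asyncio

# Hypothetical canned answers standing in for three real model APIs,
# taken from the Bitrix24 example in this post.
CANNED_ANSWERS = {
    "model_1": "crm.item.userfield.add",
    "model_2": "crm.userfield.add",
    "model_3": "userfieldconfig.add",
}


async def ask_model(model: str, question: str) -> str:
    """Hypothetical stub: a real client would send `question` to `model`'s API."""
    await asyncio.sleep(0.01)  # simulate network latency
    return CANNED_ANSWERS[model]


async def ask_all(question: str, models: list[str]) -> dict[str, str]:
    # Send the same question to every model concurrently. gather() returns
    # results in the order the awaitables were passed, so zip() pairs each
    # answer with the model that produced it.
    answers = await asyncio.gather(*(ask_model(m, question) for m in models))
    return dict(zip(models, answers))


if __name__ == "__main__":
    q = "How do you create a custom user field for a smart process in Bitrix24?"
    for model, answer in asyncio.run(ask_all(q, list(CANNED_ANSWERS))).items():
        print(f"{model}: {answer}")
```

&lt;p&gt;Keeping the answers keyed by model makes it easy to see who said what once they start to split.&lt;/p&gt;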




&lt;p&gt;A few weeks ago I ran a benchmark - 60 questions, half general knowledge, half narrow technical (specific API methods, library behavior, niche platforms).&lt;/p&gt;

&lt;p&gt;General questions: median consensus &lt;strong&gt;92.5%&lt;/strong&gt;. Models hedge on subjective questions and tend to say the same things in slightly different words.&lt;/p&gt;

&lt;p&gt;Technical questions: median consensus &lt;strong&gt;33%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Bitrix case is the clearest example. Question: how do you create a custom user field for a smart process in Bitrix24?&lt;/p&gt;

&lt;p&gt;Three answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model 1: &lt;code&gt;crm.item.userfield.add&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Model 2: &lt;code&gt;crm.userfield.add&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Model 3: &lt;code&gt;userfieldconfig.add&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I checked all three against the official docs. Only one - &lt;code&gt;userfieldconfig.add&lt;/code&gt; - was the right method for smart processes. The other two were either invented or borrowed from a different part of the API where they don't apply.&lt;/p&gt;

&lt;p&gt;All three answered with the same confident tone. No hedging, no uncertainty. If you'd asked just one and gotten a wrong answer, you'd have had no reason to doubt it.&lt;/p&gt;




&lt;p&gt;Worth being precise about what the consensus score means.&lt;/p&gt;

&lt;p&gt;It doesn't tell you which answer is correct - the synthesizer model underneath doesn't have access to ground truth either. It tells you something simpler: when three independently queried models converge, you're usually asking about something well-covered in training data. When they diverge, the data is thin or inconsistent, and at least one model is guessing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;33% consensus&lt;/strong&gt; means three models, three different answers. Someone's wrong, and when the answers are mutually exclusive (as competing API method names are), at least two of them are.&lt;/p&gt;
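&lt;p&gt;To make the low end of the scale concrete, here is a deliberately naive consensus score, sketched in Python. It is an assumption, not how the synthesizer actually scores answers. &lt;code&gt;naive_consensus&lt;/code&gt; is a hypothetical helper: it trims and lowercases each answer, counts exact matches, and reports the share of models that gave the most common answer. With three models it can only return 33, 67 or 100, and 33 is exactly the "three models, three different answers" case.&lt;/p&gt;

```python
from collections import Counter


def naive_consensus(answers: list[str]) -> float:
    """Percentage of models that gave the most common answer.

    Hypothetical, deliberately crude metric: answers are trimmed and
    lowercased, then compared by exact string equality. No semantics.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    # most_common(1) returns [(answer, count)] for the top answer.
    top_count = Counter(normalized).most_common(1)[0][1]
    return 100 * top_count / len(normalized)


if __name__ == "__main__":
    split = ["crm.item.userfield.add", "crm.userfield.add", "userfieldconfig.add"]
    agree = ["userfieldconfig.add"] * 3
    print(round(naive_consensus(split), 1))  # 33.3
    print(naive_consensus(agree))  # 100.0
```

&lt;p&gt;Exact matching is far too strict for prose answers. Over three models this version could never produce a median like 92.5, so a real scorer presumably judges semantic agreement instead. Treat the sketch only as intuition for what a score near 33 means.&lt;/p&gt;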

&lt;p&gt;General questions cluster at &lt;strong&gt;90–95%&lt;/strong&gt;. That's just well-covered territory, not a useful signal either way. The outliers are what matter - specific API methods, recent spec changes, niche platform behavior. These appear rarely enough in training data that different models develop different "memories" of the same fact.&lt;/p&gt;

&lt;p&gt;You can't fix this by switching to a better model. It's a triangulation problem.&lt;/p&gt;




&lt;p&gt;I'm a single developer. I built this because I kept running into the same specific thing - not just "AI got it wrong," but "AI got it wrong and sounded exactly as confident as when it gets it right." That's hard to work around without a cross-check.&lt;/p&gt;

&lt;p&gt;Free tier: 3 queries - try it on something you've been trusting one model for. Founding tier: $9/month for the first 100 people, price locked for 3 years.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://useconcordance.com/" rel="noopener noreferrer"&gt;https://useconcordance.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1gdg0bnlom5ng3giksi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1gdg0bnlom5ng3giksi.png" alt=" " width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
