<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Bag of words</title>
    <description>The latest articles on Forem by Bag of words (@bagofwords).</description>
    <link>https://forem.com/bagofwords</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2811167%2Fda9bcf3e-990b-4fcc-8a4c-31fc500baa0b.png</url>
      <title>Forem: Bag of words</title>
      <link>https://forem.com/bagofwords</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bagofwords"/>
    <language>en</language>
    <item>
      <title>Set up an open-source AI analyst for PostgreSQL in 2 minutes</title>
      <dc:creator>Bag of words</dc:creator>
      <pubDate>Fri, 24 Oct 2025 07:45:13 +0000</pubDate>
      <link>https://forem.com/bagofwords/set-up-an-open-source-ai-analyst-for-postgresql-in-2-minutes-3b5l</link>
      <guid>https://forem.com/bagofwords/set-up-an-open-source-ai-analyst-for-postgresql-in-2-minutes-3b5l</guid>
      <description>&lt;p&gt;AI is going to be the interface for data - that's clear. But most teams aren't running AI analysts in production yet. They're stuck experimenting because the AI doesn't understand their business context, answers are inconsistent, and there's no way to see what's breaking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bag of words&lt;/strong&gt; is an open-source framework that solves this. Deploy an AI analyst on PostgreSQL with full observability and control. Customize the context by teaching it your business definitions, connect your dbt models, BI, and documentation, and watch it improve over time as it learns from usage patterns and feedback. Your team asks questions in plain language and gets dashboards that actually make sense. Setup takes a few minutes.&lt;/p&gt;

&lt;p&gt;Here's how to do it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hzlxcuc74lt18ld9ug4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hzlxcuc74lt18ld9ug4.png" alt="Bag of words" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin, you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; installed on your machine (&lt;a href="https://docs.docker.com/get-docker/" rel="noopener noreferrer"&gt;installation guide&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;PostgreSQL database&lt;/strong&gt; (local or remote) with some data&lt;/li&gt;
&lt;li&gt;Your Postgres &lt;strong&gt;connection string&lt;/strong&gt; ready (e.g., &lt;code&gt;postgresql://user:password@host:5432/database&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;API key&lt;/strong&gt; from your preferred LLM provider (OpenAI, Anthropic, Azure OpenAI, or Google)&lt;/li&gt;
&lt;/ul&gt;
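&lt;p&gt;Before you paste the connection string anywhere, it can save time to sanity-check its shape locally. Here's a small illustrative helper (not part of Bag of words) using only the Python standard library:&lt;/p&gt;

```python
from urllib.parse import urlparse

def parse_conn_string(dsn):
    """Validate the basic shape of a postgresql:// connection string."""
    parts = urlparse(dsn)
    assert parts.scheme in ("postgresql", "postgres"), "scheme should be postgresql://"
    assert parts.hostname, "missing host"
    assert parts.path.lstrip("/"), "missing database name"
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # Postgres default port
        "database": parts.path.lstrip("/"),
        "user": parts.username,
    }

print(parse_conn_string("postgresql://user:password@host:5432/database"))
```

&lt;p&gt;If any assertion fires, fix the string before onboarding—it's easier to debug here than through a UI error message.&lt;/p&gt;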

&lt;h2&gt;
  
  
  Step 1 — Deploy Bag of words
&lt;/h2&gt;

&lt;p&gt;Let's start by deploying &lt;strong&gt;Bag of words&lt;/strong&gt; locally using Docker.&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--pull&lt;/span&gt; always &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 bagofwords/bagofwords
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few seconds, the service will be running. Open your browser and navigate to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://0.0.0.0:3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see the &lt;strong&gt;Bag of words&lt;/strong&gt; onboarding flow. Let's walk through it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz80yxy68j7pzd0g9q4bb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz80yxy68j7pzd0g9q4bb.png" alt="Welcome screen" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2 — Configure Your LLM
&lt;/h2&gt;

&lt;p&gt;The first step in onboarding is connecting to an LLM provider.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose your LLM provider: &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt;, &lt;strong&gt;Azure OpenAI&lt;/strong&gt;, or &lt;strong&gt;Google&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enter your API key&lt;/li&gt;
&lt;li&gt;(Optional) Select a specific model from your provider (e.g., a GPT or Claude variant)&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Test Connection&lt;/strong&gt; to verify&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Next&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bag of words is LLM-agnostic. You bring your own key and choose your provider. This gives you control over cost, performance, and data residency. &lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3 — Connect Your Data Source and Select Tables
&lt;/h2&gt;

&lt;p&gt;Now let's connect to your PostgreSQL database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connect the Data Source
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Select &lt;strong&gt;PostgreSQL&lt;/strong&gt; from the list of available data sources&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enter your connection details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name:&lt;/strong&gt; Something descriptive like "Production Analytics"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Host:&lt;/strong&gt; Your database host (e.g., &lt;code&gt;localhost&lt;/code&gt; or &lt;code&gt;db.example.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port:&lt;/strong&gt; &lt;code&gt;5432&lt;/code&gt; (default)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; Your database name&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Username&lt;/strong&gt; and &lt;strong&gt;Password&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Or paste your full connection string&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Test Connection&lt;/strong&gt; to verify&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Next&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
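&lt;p&gt;If &lt;strong&gt;Test Connection&lt;/strong&gt; fails, rule out basic network reachability first. A minimal illustrative check (standard library only, not part of the product):&lt;/p&gt;

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: confirm the database host is reachable before retrying in the UI
# port_open("db.example.com", 5432)
```

&lt;p&gt;If this returns &lt;code&gt;False&lt;/code&gt;, look at firewalls, security groups, or whether Postgres only listens on localhost—the problem is the network, not Bag of words.&lt;/p&gt;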

&lt;h3&gt;
  
  
  Select Tables
&lt;/h3&gt;

&lt;p&gt;After connecting, Bag of words introspects your database schema. You'll see a list of all available tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv3b2i26yb3f8qi6pqq8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv3b2i26yb3f8qi6pqq8.png" alt="Select tables for AI context" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose which tables the AI can access:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select &lt;strong&gt;all tables&lt;/strong&gt; if you want the AI to have full visibility&lt;/li&gt;
&lt;li&gt;Or pick &lt;strong&gt;specific tables&lt;/strong&gt; based on what your team needs&lt;/li&gt;
&lt;li&gt;You can always adjust this later in Settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This table selection impacts &lt;strong&gt;AI performance&lt;/strong&gt;. Start with a few relevant tables and gradually add more as needed. Fewer tables means more focused context and better query accuracy.&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Next&lt;/strong&gt; when ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 — Ask Your First Question
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jg7hozc9u5dy5n6q47b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jg7hozc9u5dy5n6q47b.png" alt="Main" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The onboarding is complete! Now you can start asking questions.&lt;/p&gt;

&lt;p&gt;The interface will show you some &lt;strong&gt;conversation starters&lt;/strong&gt; to get you going, or you can type your own question in plain English.&lt;/p&gt;

&lt;p&gt;Let's try an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Show me a line chart of daily active users for the last 30 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hit &lt;strong&gt;Enter&lt;/strong&gt; and watch what happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behind the scenes&lt;/strong&gt;, the AI agent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sacqekcrx4bqizt8xbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4sacqekcrx4bqizt8xbn.png" alt="AI Analyst agent flow" width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads your question&lt;/li&gt;
&lt;li&gt;Examines the database schema and context&lt;/li&gt;
&lt;li&gt;If confidence is low or the available context is insufficient, it stops and asks for clarification; otherwise it continues&lt;/li&gt;
&lt;li&gt;Plans a data model and generates code&lt;/li&gt;
&lt;li&gt;Executes it against your Postgres database&lt;/li&gt;
&lt;li&gt;Returns the result as a table and (if appropriate) a chart&lt;/li&gt;
&lt;/ol&gt;
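&lt;p&gt;The clarification gate in step 3 can be sketched in a few lines. All names below (&lt;code&gt;plan&lt;/code&gt;, &lt;code&gt;execute&lt;/code&gt;, the threshold) are illustrative assumptions, not Bag of words' actual internals:&lt;/p&gt;

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed tunable cutoff

def answer(question, context, plan, execute):
    """Plan a query; ask for clarification instead of guessing when confidence is low."""
    # plan() is a hypothetical planner returning {"sql": ..., "confidence": ...}
    proposal = plan(question, context)
    if proposal["confidence"] >= CONFIDENCE_THRESHOLD:
        return {"status": "answered", "result": execute(proposal["sql"])}
    return {"status": "clarify",
            "question": proposal.get("clarification", "Can you be more specific?")}
```

&lt;p&gt;The important design choice is that low confidence produces a question back to the user, never a plausible-looking guess.&lt;/p&gt;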

&lt;p&gt;You'll see the data model being constructed in real time, followed by code generation and execution to get the data. Then you'll get a &lt;strong&gt;line chart&lt;/strong&gt; showing the trend.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Questions Are Ambiguous
&lt;/h3&gt;

&lt;p&gt;If your question is ambiguous, the AI will &lt;strong&gt;ask for clarification&lt;/strong&gt; before proceeding. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Show me revenue by region
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI might respond: &lt;em&gt;"I found both &lt;code&gt;gross_revenue&lt;/code&gt; and &lt;code&gt;net_revenue&lt;/code&gt; columns. Which one should I use?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You clarify: &lt;em&gt;"Use gross_revenue"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI then generates the correct query and returns your chart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's the best part:&lt;/strong&gt; After answering your question, the AI will &lt;strong&gt;suggest an instruction&lt;/strong&gt; based on the clarification:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"When user asks for data related to 'revenue', by default use &lt;code&gt;gross_revenue&lt;/code&gt;"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Click &lt;strong&gt;Accept&lt;/strong&gt;, and this rule is saved as an instruction. Next time anyone asks about revenue, the AI will know what you mean—no clarification needed.&lt;/p&gt;

&lt;p&gt;This feedback loop means the system gets smarter over time, learning your business logic with every interaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5 — Add Context to Improve Accuracy
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfxju5zppzqtp5hgt4ni.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfxju5zppzqtp5hgt4ni.png" alt="Context" width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the key to reliable AI analyst deployments.&lt;/strong&gt; The solution is only as good as the context you build.&lt;/p&gt;

&lt;p&gt;Context comes from two sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine-generated&lt;/strong&gt; — Usage patterns, clarifications, and learnings from production queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-provided&lt;/strong&gt; — Instructions, dbt docs, and semantic layer enrichments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Add instructions:&lt;/strong&gt; Click &lt;strong&gt;Instructions&lt;/strong&gt; above the prompt box. Write business rules in plain language like &lt;em&gt;"Revenue means gross_revenue"&lt;/em&gt; or &lt;em&gt;"Active users have last_seen_at within 30 days"&lt;/em&gt;. These apply to every query.&lt;/p&gt;
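&lt;p&gt;Conceptually, instructions act as term rewrites applied to the prompt before query planning. A toy illustration—the data structure is an assumption for the sketch, not how Bag of words stores instructions:&lt;/p&gt;

```python
# Illustrative store of saved instructions (assumed shape, not product internals)
INSTRUCTIONS = {
    "revenue": "gross_revenue",                                # "Revenue means gross_revenue"
    "active users": "users with last_seen_at within 30 days",  # recency rule
}

def apply_instructions(question):
    """Attach saved business definitions to any ambiguous terms in the prompt."""
    expanded = question
    for term, meaning in INSTRUCTIONS.items():
        if term in question.lower():
            expanded += f" [note: '{term}' means {meaning}]"
    return expanded

print(apply_instructions("Show me revenue by region"))
```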

&lt;p&gt;&lt;strong&gt;Connect your semantic layer:&lt;/strong&gt; Go to &lt;strong&gt;Integrations&lt;/strong&gt; → &lt;strong&gt;Context&lt;/strong&gt; and connect your dbt project, LookML files, or markdown documentation. The AI will automatically index your models, descriptions, and relationships—then reference them by name when generating queries.&lt;/p&gt;

&lt;p&gt;Together, these create a knowledge base that makes the AI increasingly accurate and aligned with your business logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6 — Monitor and Track AI Analyst Quality
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpwsddpmejxmmd6f5m9rh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpwsddpmejxmmd6f5m9rh.png" alt="Monitoring AI Analyst" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything is tracked. Go to &lt;strong&gt;Monitoring&lt;/strong&gt; to see a complete audit trail of all AI interactions—every query, every clarification, every piece of feedback.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Metrics to Track
&lt;/h3&gt;

&lt;p&gt;While the product exposes many detailed metrics, here are the high-level indicators you should monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context coverage&lt;/strong&gt; — How often the available context (instructions, enrichments) gives the agent enough confidence to answer a prompt without guessing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy (judge)&lt;/strong&gt; — Automated quality scores from AI judges evaluating correctness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Negative feedback&lt;/strong&gt; — User thumbs down signals that need investigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clarification rate&lt;/strong&gt; — How often the AI needs to ask for clarification&lt;/li&gt;
&lt;/ul&gt;
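&lt;p&gt;If you export the interaction log, these indicators reduce to simple aggregations. A sketch over an assumed per-run record shape (not Bag of words' real export format):&lt;/p&gt;

```python
# Each record is one agent run; the field names are assumptions for the sketch.
runs = [
    {"clarified": False, "judge_score": 0.9, "feedback": "up"},
    {"clarified": True,  "judge_score": 0.7, "feedback": None},
    {"clarified": False, "judge_score": 0.4, "feedback": "down"},
]

def rate(runs, predicate):
    """Fraction of runs matching the predicate."""
    return sum(1 for r in runs if predicate(r)) / len(runs)

clarification_rate = rate(runs, lambda r: r["clarified"])
negative_feedback = rate(runs, lambda r: r["feedback"] == "down")
judge_accuracy = sum(r["judge_score"] for r in runs) / len(runs)

print(f"clarification={clarification_rate:.0%} "
      f"negative={negative_feedback:.0%} judge={judge_accuracy:.2f}")
```

&lt;p&gt;Watching these numbers week over week tells you whether added context is actually moving the needle.&lt;/p&gt;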

&lt;p&gt;These metrics tell you whether your AI analyst is production-ready and where to focus your context-building efforts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In just a few minutes, you've deployed an open-source AI analyst on PostgreSQL, connected your database, asked natural language questions, added business context, and learned how to monitor and evaluate query quality.&lt;/p&gt;

&lt;p&gt;Unlike black-box AI SQL tools, &lt;strong&gt;Bag of words&lt;/strong&gt; gives you the tools to &lt;strong&gt;actually get to production&lt;/strong&gt;. Every query is traceable. Every decision is visible. You can see when your metrics show it's ready, and you control the context, the LLM, and the governance rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You've Built
&lt;/h3&gt;

&lt;p&gt;You now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A natural language interface to your Postgres database&lt;/li&gt;
&lt;li&gt;Context-aware query generation using your business definitions&lt;/li&gt;
&lt;li&gt;Full observability into how the AI reasons and what SQL it generates&lt;/li&gt;
&lt;li&gt;A foundation for building dashboards, Slack bots, or embedded analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;From here, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Invite members and manage permissions&lt;/strong&gt; to set governance for data sources and reports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate with Slack&lt;/strong&gt; to let your team ask questions directly in channels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build dashboards&lt;/strong&gt; by saving queries and pinning them to a shared view&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customize the LLM&lt;/strong&gt; (swap OpenAI for Anthropic, Gemini, or a self-hosted model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy to production&lt;/strong&gt; using Docker Compose or Kubernetes in your own VPC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To learn more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.bagofwords.io" rel="noopener noreferrer"&gt;Full documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bagofwords/bagofwords" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI is becoming the interface for data—but only if it's &lt;strong&gt;trustworthy&lt;/strong&gt;. Now you have the tools to make it so.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dataengineering</category>
      <category>data</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building Reliable AI Analysts: An Observability Framework for Text-to-SQL Systems</title>
      <dc:creator>Bag of words</dc:creator>
      <pubDate>Wed, 15 Oct 2025 15:23:15 +0000</pubDate>
      <link>https://forem.com/bagofwords/building-reliable-ai-analysts-an-observability-framework-for-text-to-sql-systems-25ln</link>
      <guid>https://forem.com/bagofwords/building-reliable-ai-analysts-an-observability-framework-for-text-to-sql-systems-25ln</guid>
      <description>&lt;p&gt;You're building a text-to-SQL system. The value prop is obvious: natural language over your data warehouse, instant answers for the business, no waiting on BI teams. The demo works. Your stakeholders love it.&lt;/p&gt;

&lt;p&gt;Then you put it in production and the accuracy problems start. "Active users" means three different things across teams. Joins look right but query the wrong grain. Fiscal quarters don't match calendar quarters. The SQL runs, the numbers look plausible, but they're quietly wrong. Trust erodes fast.&lt;/p&gt;

&lt;p&gt;This post covers the tactical pieces: where accuracy actually fails, the few metrics that matter for monitoring, and how to build a feedback loop that turns failures into improvements. These patterns come from building and shipping Bow (short for Bag of words), an open-source AI analyst, but they apply to any text-to-SQL system in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Accuracy Breaks
&lt;/h2&gt;

&lt;p&gt;Assume you have the agentic infrastructure right: &lt;a href="https://docs.bagofwords.com/core/agent" rel="noopener noreferrer"&gt;ReAct loops&lt;/a&gt; that reason and validate, retrieval systems, the ability to say "I don't know" when uncertain, and comprehensive context coverage.&lt;/p&gt;

&lt;p&gt;Even with all that, AI still fails in:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;
        &lt;strong&gt;Ambiguous metrics&lt;/strong&gt; — "Active users" means different things across teams. Product uses recency windows and user type filters. Marketing excludes test accounts. Finance counts paying customers only. The model queries &lt;code&gt;dim_active_users&lt;/code&gt; but misses the filters the business actually uses this quarter.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Schema traps&lt;/strong&gt; — Table and column names that seem obvious lead the model down wrong join paths: queries get built on the wrong grain or miss crucial filters. The SQL runs, the numbers look plausible, there's no error message—just quietly wrong results.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Code errors&lt;/strong&gt; — Syntax failures, permission boundaries, query timeouts. The model reaches for tables or patterns it doesn't understand. Small runtime errors compound into retried plans and inconsistent behavior.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;User CSAT&lt;/strong&gt; — Low satisfaction scores, wrong answers flagged by users, eroded trust. When users continue iterating or reject answers, you've lost reliability.
    &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these failures boil down to one root cause: &lt;strong&gt;prompt-context misalignment&lt;/strong&gt;. When the prompt falls within your encoded context, the model produces reliable results. When it falls outside, the model guesses—and guesses look plausible but are often wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Track
&lt;/h2&gt;

&lt;p&gt;You don't need a complex dashboard filled with vanity metrics. You need signals that tell you where context is missing and what to fix.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6pi2ifjd4fh92th3j3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6pi2ifjd4fh92th3j3j.png" alt="AI Analyst Observability Dashboard" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Track these four metrics:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;
        &lt;strong&gt;Answer Quality&lt;/strong&gt;&lt;br&gt;
        &lt;em&gt;What:&lt;/em&gt; Is the answer correct and useful? Would you share it with a stakeholder?&lt;br&gt;
        &lt;em&gt;Detected by:&lt;/em&gt; LLM judges scoring context-prompt match and answer correctness. In practice, combine automated checks (SQL validity, result plausibility) with human review of a rotating sample. Start by labeling 10-20 queries per day across different question types.&lt;br&gt;
        &lt;em&gt;Catches:&lt;/em&gt; Ambiguous metrics, schema traps, business logic mismatches.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Context Effectiveness&lt;/strong&gt;&lt;br&gt;
        &lt;em&gt;What:&lt;/em&gt; Did the system retrieve and use the right instructions, schema, and metadata?&lt;br&gt;
        &lt;em&gt;Detected by:&lt;/em&gt; Semantic similarity between questions and your definitions, clarification request patterns, agent action traces. For data engineers: spikes in clarifications around specific tables signal documentation gaps.&lt;br&gt;
        &lt;em&gt;Catches:&lt;/em&gt; Missing metric definitions, incomplete documentation, context gaps by domain or table.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Code Errors&lt;/strong&gt;&lt;br&gt;
        &lt;em&gt;What:&lt;/em&gt; SQL execution failures that indicate the model reached for things it doesn't understand.&lt;br&gt;
        &lt;em&gt;Detected by:&lt;/em&gt; Syntax failures, permission issues, query timeouts. Track which tables and columns consistently trigger errors.&lt;br&gt;
        &lt;em&gt;Catches:&lt;/em&gt; Schema traps, wrong join paths, execution fragility.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;User Feedback&lt;/strong&gt;&lt;br&gt;
        &lt;em&gt;What:&lt;/em&gt; Ground truth of what's actually broken in production.&lt;br&gt;
        &lt;em&gt;Detected by:&lt;/em&gt; Users flagging wrong answers, continued iteration, answer rejections.&lt;br&gt;
        &lt;em&gt;Catches:&lt;/em&gt; All failure modes in production, especially edge cases testing didn't cover.
    &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;In practice:&lt;/strong&gt; Instrument these signals at the agent run level—every query, every user interaction. Store the full trace: what context was retrieved, what actions the agent took, what SQL was generated, what results came back, and how the user reacted.&lt;/p&gt;

&lt;p&gt;As patterns emerge, go deeper. Track negative feedback by the specific table or column that caused the issue. Measure which type of context is most effective (instructions vs. schema vs. dbt models). Analyze clarification clusters by domain. Score feedback from power users differently—they understand the data model and their signals are high-quality.&lt;/p&gt;

&lt;p&gt;Think of this as unit testing for AI outputs. You wouldn't ship code without tests—why ship answers without validation? The difference is that your tests evolve: what fails today becomes tomorrow's regression test, encoded as instructions that prevent the same failure from happening again.&lt;/p&gt;
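&lt;p&gt;The regression-test analogy can be made literal: freeze a past failure as a standing assertion over the generated SQL. Here &lt;code&gt;generate_sql&lt;/code&gt; is a hypothetical stand-in for your text-to-SQL pipeline:&lt;/p&gt;

```python
def check_active_users_regression(generate_sql):
    """Yesterday's failure as a test: 'active users' must apply the 30-day recency filter."""
    sql = generate_sql("How many active users did we have last week?").lower()
    assert "last_seen_at" in sql, "missing recency column"
    assert "30" in sql, "wrong or missing recency window"
    return True
```

&lt;p&gt;Run checks like this on a schedule or after every context change, exactly as you would a CI suite.&lt;/p&gt;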

&lt;h2&gt;
  
  
  How to Turn Observations Into Fixes
&lt;/h2&gt;

&lt;p&gt;Metrics without action are just numbers. The real value comes from closing the loop: using what you observe to systematically improve the system.&lt;/p&gt;

&lt;p&gt;Every failure points to missing context—a metric definition the model doesn't know, a join path it shouldn't take, or business logic that's not codified. The fix isn't rebuilding your model or restructuring your warehouse. It's encoding that missing context as an instruction the system can apply automatically next time a similar question appears.&lt;/p&gt;

&lt;p&gt;This approach complements your data modeling work—instructions handle business logic and edge cases without requiring schema changes or dbt rebuilds. Think of them as runtime metadata that sits alongside your warehouse, capturing the operational context that doesn't belong in table definitions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm26586k4215pxetezrh0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm26586k4215pxetezrh0.png" alt="Error Classification and Remediation" width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
    &lt;li&gt;
        &lt;strong&gt;Diagnose the root cause&lt;/strong&gt;&lt;br&gt;
        Start by looking at the different failure types: code errors, negative feedback, low-quality answers, and clarification requests. Then dig into the agent traces—the step-by-step reasoning and decision path the system followed. What action did it take: generate SQL, search data, or ask for clarification? If it generated SQL, where did it go wrong? Did it retrieve the wrong schema? Misinterpret a metric? Choose a bad join path? If it asked for clarification, what context or tool was missing? Understanding the "why" is critical before you write a fix.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Draft and test the instruction&lt;/strong&gt;&lt;br&gt;
        Write a scoped rule that addresses the root cause. Then test it: run through a cycle with recent prompts that failed and verify the instruction actually fixes them. This is your chance to catch edge cases before rolling anything out.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Review or approve&lt;/strong&gt;&lt;br&gt;
        Decide if the instruction needs human review or can be approved immediately. Treat it like a pull request—some changes are obviously safe (fixing a typo in a metric name), others need domain expertise (redefining "active users"). You wouldn't merge code without review—don't merge business logic without it either. Route accordingly.
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Roll out and track&lt;/strong&gt;&lt;br&gt;
        Once approved, the instruction gets attached to the relevant domains, tables, or metrics. It automatically applies when similar prompts appear. Then track your metrics over time: did answer quality improve? Did code errors drop? Did negative feedback decrease?
    &lt;/li&gt;
    &lt;li&gt;
        &lt;strong&gt;Self-learning mode (highly recommended)&lt;/strong&gt;&lt;br&gt;
        For teams that want to move faster, enable AI auto-generation of instructions. When the system detects low-quality results or recurring errors, it can draft a proposed instruction automatically, test it against recent failed queries, and route it for approval. This works by prompting the model to analyze the failure pattern, propose a fix as a natural language instruction, and validate it against a test set. The human remains in the loop for approval, but the heavy lifting of diagnosis and drafting happens automatically. This dramatically shortens the feedback loop from days to minutes, though you'll want to start with human-in-the-loop mode until you trust the quality of auto-generated instructions.
    &lt;/li&gt;
&lt;/ol&gt;
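&lt;p&gt;The steps above can be sketched end to end. Every name here is an illustrative stub, not the product's API:&lt;/p&gt;

```python
def instruction_fixes_failures(instruction, failed_prompts, rerun):
    """Re-run previously failed prompts with the candidate instruction attached."""
    return all(rerun(prompt, instruction)["ok"] for prompt in failed_prompts)

def roll_out(instruction, failed_prompts, rerun, needs_review, approve):
    """Draft -> test -> review -> active, mirroring the steps above."""
    if not instruction_fixes_failures(instruction, failed_prompts, rerun):
        return "rejected"        # does not fix the observed failures
    if needs_review(instruction) and not approve(instruction):
        return "pending review"  # held for a human, like an unmerged PR
    return "active"              # applied to future matching prompts
```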

&lt;p&gt;This workflow is faster than traditional data modeling cycles, more transparent than black-box model tuning, and safer than letting the model improvise business definitions on the fly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Text-to-SQL will become the interface for data because AI can reason, explore, and surface insights that static dashboards never will. But moving from demo to dependable production requires structure: understanding where accuracy breaks, measuring what actually matters, and closing the loop by encoding failures as instructions with visible impact.&lt;/p&gt;

&lt;p&gt;The promise is real. The path to get there just requires more rigor than most demos let on.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it yourself&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This observability framework is built into Bag of words, an open-source AI analyst designed for production use. Deploy it to your warehouse and start tracking these metrics today.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Documentation:&lt;/strong&gt; &lt;a href="https://docs.bagofwords.com" rel="noopener noreferrer"&gt;https://docs.bagofwords.com&lt;/a&gt;&lt;br&gt;&lt;br&gt;
→ &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/bagofwords1/bagofwords" rel="noopener noreferrer"&gt;https://github.com/bagofwords1/bagofwords&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building in the open—contributions and feedback welcome.&lt;/p&gt;

</description>
      <category>text2sql</category>
      <category>ai</category>
      <category>data</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Build a Product Usage Dashboard with Bag of words (Open Source)</title>
      <dc:creator>Bag of words</dc:creator>
      <pubDate>Mon, 03 Feb 2025 19:18:32 +0000</pubDate>
      <link>https://forem.com/bagofwords/build-a-product-usage-dashboard-with-bag-of-words-open-source-48ml</link>
      <guid>https://forem.com/bagofwords/build-a-product-usage-dashboard-with-bag-of-words-open-source-48ml</guid>
      <description>&lt;p&gt;In this guide, we’ll build a product usage dashboard using &lt;a href="https://github.com/bagofwords1/bagofwords" rel="noopener noreferrer"&gt;Bag of words&lt;/a&gt;, an open-source AI-powered data tool. It connects to your databases, APIs, and even unstructured data like PDFs, allowing you to create dashboards through simple prompts — no manual SQL needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl8hcrgiz90lgwotbakc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl8hcrgiz90lgwotbakc.png" alt="Image description" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://bagofwords.com" rel="noopener noreferrer"&gt;Bag of words&lt;/a&gt; is designed to let you generate reports, charts, and dashboards using natural language prompts. Best of all, it’s open-source and quick to set up using Docker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy with Docker&lt;/strong&gt;&lt;br&gt;
By default, Bag of words uses SQLite, but you can configure PostgreSQL if needed.&lt;/p&gt;

&lt;p&gt;Run the following command to get started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--pull&lt;/span&gt; always &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 bagofwords/bagofwords
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connect Your Data &amp;amp; Configure LLM
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Add an LLM provider at &lt;a href="http://localhost:3000/settings/models" rel="noopener noreferrer"&gt;http://localhost:3000/settings/models&lt;/a&gt; to enable report generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connect your data sources at &lt;a href="http://localhost:3000/integrations" rel="noopener noreferrer"&gt;http://localhost:3000/integrations&lt;/a&gt; and follow the setup instructions for each source.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Build your dashboard
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Understand Data Schema&lt;/strong&gt;&lt;br&gt;
Before diving into visualization, it’s crucial to understand your data schema: how your data is structured, how tables are connected, and what fields are available. Besides browsing tables in the UI, you can explore the schema via prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Show me the list of all tables and their relationships.”&lt;/li&gt;
&lt;li&gt;“What columns are available in the users table?”&lt;/li&gt;
&lt;li&gt;“How is the transactions table linked to the users table?”&lt;/li&gt;
&lt;li&gt;“List all fields related to user activity and session data.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This will give you a clear view of your dataset, helping you craft more precise queries and visualizations.&lt;/p&gt;
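&lt;p&gt;Under the hood, schema questions like these resolve to ordinary catalog queries. Here’s a minimal sketch using Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module against a toy schema (the &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;transactions&lt;/code&gt; tables are illustrative, not part of Bag of words; a warehouse like PostgreSQL would use &lt;code&gt;information_schema&lt;/code&gt; instead):&lt;/p&gt;

```python
import sqlite3

# Toy schema standing in for your product database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        amount REAL
    );
""")

# "Show me the list of all tables" boils down to querying the catalog:
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['transactions', 'users']

# "What columns are available in the users table?"
columns = [r[1] for r in conn.execute("PRAGMA table_info(users)")]
print(columns)  # ['id', 'email']

# "How is the transactions table linked to the users table?"
# Foreign keys answer that; each row is (id, seq, table, from, to, ...).
fks = conn.execute("PRAGMA foreign_key_list(transactions)").fetchall()
print(fks[0][2], fks[0][3], fks[0][4])  # users user_id id
```

&lt;p&gt;The AI analyst runs this kind of introspection for you; knowing what it sees makes it easier to phrase precise prompts.&lt;/p&gt;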

&lt;p&gt;&lt;strong&gt;Generate Key Metrics with Prompts&lt;/strong&gt;&lt;br&gt;
Once you’re familiar with your schema, you can start generating key metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Show me daily active users over the past month in a line chart.”&lt;/li&gt;
&lt;li&gt;“Create a dashboard with total signups, feature usage, and churn rate.”&lt;/li&gt;
&lt;li&gt;“List the top 10 most used features by active users in a bar chart.”&lt;/li&gt;
&lt;li&gt;“Display the retention rate for users over the last six months.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also combine all of these into a single prompt.&lt;/p&gt;
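&lt;p&gt;To make the first prompt concrete, here is a sketch of the kind of SQL a “daily active users” request typically resolves to, run against a toy &lt;code&gt;events&lt;/code&gt; table with Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module (the table and column names are hypothetical, not Bag of words specifics):&lt;/p&gt;

```python
import sqlite3

# Toy events table: one row per user action, with the date it happened.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2025-01-01"), (2, "2025-01-01"), (1, "2025-01-01"),
     (1, "2025-01-02"), (3, "2025-01-02")],
)

# Daily active users: count distinct users per day.
dau = conn.execute("""
    SELECT event_date, COUNT(DISTINCT user_id) AS dau
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""").fetchall()
print(dau)  # [('2025-01-01', 2), ('2025-01-02', 2)]
```

&lt;p&gt;The advantage of prompting is that you don’t write this by hand, but being able to read the generated query helps you verify the chart is counting what you think it is.&lt;/p&gt;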

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ll4tpkshxn342nedec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ll4tpkshxn342nedec.png" alt="Image description" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Publish and Share
&lt;/h2&gt;

&lt;p&gt;Now that your data is ready, arrange the dashboard layout by dragging the elements into place. You can also ask the AI to do it for you.&lt;/p&gt;

&lt;p&gt;Once everything’s ready, click the Share button in the top right, configure the settings, and copy the shareable URL.&lt;/p&gt;

&lt;p&gt;More links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://bagofwords.com" rel="noopener noreferrer"&gt;https://bagofwords.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.bagofwords.com" rel="noopener noreferrer"&gt;https://docs.bagofwords.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bagofwords1/bagofwords" rel="noopener noreferrer"&gt;https://github.com/bagofwords1/bagofwords&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>database</category>
      <category>data</category>
      <category>datascience</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
