<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Eitamos Ring</title>
    <description>The latest articles on Forem by Eitamos Ring (@eitamos_ring_0508146ca448).</description>
    <link>https://forem.com/eitamos_ring_0508146ca448</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3395918%2F5ae6659e-ea1d-4142-b982-ec79776600b3.png</url>
      <title>Forem: Eitamos Ring</title>
      <link>https://forem.com/eitamos_ring_0508146ca448</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/eitamos_ring_0508146ca448"/>
    <language>en</language>
    <item>
      <title>Three Rules for Designing a Go SDK Other People Will Actually Use</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Thu, 07 May 2026 18:00:00 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/three-rules-for-designing-a-go-sdk-other-people-will-actually-use-53b4</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/three-rules-for-designing-a-go-sdk-other-people-will-actually-use-53b4</guid>
      <description>&lt;p&gt;I publish open-source Go libraries. &lt;br&gt;
Not many people use most of them, and I've spent a fair amount of time trying to figure out why. Some of it is distribution. Some of it is the unsexy truth that nobody needed the thing I built. But a real chunk of it — bigger than I want to admit — is that the API was designed for &lt;em&gt;me&lt;/em&gt;, the author, and not for the developer arriving cold from a Google search at 2am with a deadline.&lt;/p&gt;

&lt;p&gt;This post is three rules I now apply when designing a Go SDK. They come from publishing &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;postgresparser&lt;/a&gt; — a pure-Go PostgreSQL parser — and watching where new users got stuck. The examples are from that library, but the rules aren't about parsers. They're about what the surface of a Go package should look like if you want strangers to use it.&lt;/p&gt;

&lt;p&gt;I'll also flag one place I broke my own rule, because the post would be dishonest without it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Rule 1: Expose answers, not nodes
&lt;/h2&gt;

&lt;p&gt;The single biggest mistake I see in Go SDKs (and that I've made myself) is shipping the &lt;em&gt;internal data model&lt;/em&gt; as the public API. The author has built an AST, or a state machine, or a config tree, and they think: "great, I'll let the caller walk it." The caller does not want to walk it. The caller wants an answer to a specific question.&lt;/p&gt;

&lt;p&gt;Here's what "expose the nodes" looks like in a SQL parsing context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// What other Go SQL parsers tend to give you&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Statements&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SelectStmt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RangeVar&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Relname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="c"&gt;// ...also handle JoinExpr, Subquery, RangeFunction,&lt;/span&gt;
            &lt;span class="c"&gt;// RangeTableSample, RangeTableFunc, CTERef...&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user came to your library to find out which tables a query touches. You handed them a tree-walking exercise and a list of node types they have to learn. Every caller of your library now has to write — and maintain — the same boilerplate, with the same bugs, in slightly different ways.&lt;/p&gt;

&lt;p&gt;Compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// What postgresparser gives you&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tables&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Two lines. CTEs, subqueries, set operations, joins — all flattened into the same field, with aliases preserved. The IR (the actual AST-equivalent) still exists internally, but it's not what the caller binds to.&lt;/p&gt;

&lt;p&gt;The principle: &lt;strong&gt;for every question your SDK answers, there should be a single field or function whose name &lt;em&gt;is&lt;/em&gt; the question.&lt;/strong&gt; "Which tables?" → &lt;code&gt;Tables&lt;/code&gt;. "Which columns are filtered?" → &lt;code&gt;ExtractWhereConditions&lt;/code&gt;. "How is each column used?" → &lt;code&gt;ColumnUsage&lt;/code&gt;. If a user has to traverse three levels of struct to get an answer, the answer wasn't really exposed.&lt;/p&gt;

&lt;p&gt;The objection I hear: &lt;em&gt;but what if the caller wants something custom that we didn't anticipate?&lt;/em&gt; Fine — keep the IR public for the 5% case. But default to answering the 95% case in one line, and only fall back to the IR when the typed accessor doesn't cover the question.&lt;/p&gt;
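
&lt;p&gt;The split is easy to sketch with toy types. Nothing below is postgresparser's actual API; &lt;code&gt;Node&lt;/code&gt;, &lt;code&gt;Result&lt;/code&gt;, and &lt;code&gt;collectKind&lt;/code&gt; are hypothetical stand-ins for the pattern: a precomputed answer field for the 95% case, and a public IR the caller can walk for the rest.&lt;/p&gt;

```go
package main

import "fmt"

// Hypothetical miniature of the pattern: a typed answer for the common
// question, plus the underlying IR kept public for everything else.
type Node struct {
	Kind     string // e.g. "table", "join"
	Name     string
	Children []Node
}

type Result struct {
	Tables []string // the 95% answer, precomputed by the library
	IR     Node     // the 5% escape hatch
}

// collectKind walks the IR for a question the typed surface doesn't answer.
func collectKind(n Node, kind string, out []string) []string {
	if n.Kind == kind {
		out = append(out, n.Name)
	}
	for _, c := range n.Children {
		out = collectKind(c, kind, out)
	}
	return out
}

func main() {
	r := Result{
		Tables: []string{"orders", "customers"},
		IR: Node{Kind: "select", Children: []Node{
			{Kind: "join", Name: "inner"},
			{Kind: "table", Name: "orders"},
			{Kind: "table", Name: "customers"},
		}},
	}
	fmt.Println(r.Tables)                       // common case: one field
	fmt.Println(collectKind(r.IR, "join", nil)) // custom case: walk the IR
}
```

&lt;p&gt;The typed field costs the library one extra pass at parse time; the public IR costs the caller nothing until they actually need it.&lt;/p&gt;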

&lt;h2&gt;
  
  
  Rule 2: Name the common case after the common case, and mark the variants
&lt;/h2&gt;

&lt;p&gt;Most Go SDKs I see treat all of their entry points as peers. &lt;code&gt;Parse&lt;/code&gt;, &lt;code&gt;ParseStrict&lt;/code&gt;, &lt;code&gt;ParseAll&lt;/code&gt;, &lt;code&gt;ParseWithOptions&lt;/code&gt;, &lt;code&gt;ParseFromReader&lt;/code&gt; — all listed on pkg.go.dev with the same visual weight, and the user has to read every one to figure out which they want.&lt;/p&gt;

&lt;p&gt;This is the "tyranny of options" failure. The author thought of every variant; the user has to think about it too.&lt;/p&gt;

&lt;p&gt;The fix is sequencing. Pick the version 80% of users want. Give &lt;em&gt;that&lt;/em&gt; the short name. Make the other variants explicitly named after the thing that makes them different.&lt;/p&gt;

&lt;p&gt;postgresparser's parsing entry points:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// 80% case — parses one statement, gives you a result.&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// "I might pass multiple statements and want all of them."&lt;/span&gt;
&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQLAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// "I want an error if more than one statement was passed."&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQLStrict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ParseSQL&lt;/code&gt; is the default. &lt;code&gt;ParseSQLAll&lt;/code&gt; and &lt;code&gt;ParseSQLStrict&lt;/code&gt; are explicitly named after the property that makes them different (handling all statements, strict-on-multi). A user reading the package docs sees &lt;code&gt;ParseSQL&lt;/code&gt; first, tries it, and only goes looking for the variants if they hit a case it doesn't cover.&lt;/p&gt;

&lt;p&gt;The wrong version of the same API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Don't do this&lt;/span&gt;
&lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ParseOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Strict&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AllStatements&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ParseOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Strict&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AllStatements&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You've moved the decision from the function name (where it's documented and grep-able) to a config struct (where it's not). New users have to read the options struct just to call the function. Existing code has to be re-read every time someone wants to know what mode it's in.&lt;/p&gt;

&lt;p&gt;The principle: &lt;strong&gt;the most common call should be the shortest call. Variants get names that describe how they differ. Config structs are for things that don't fit in a name, not for things that do.&lt;/strong&gt;&lt;/p&gt;
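
&lt;p&gt;One way to keep the variants honest is to make each one a thin, named wrapper over a single unexported core. This is a sketch, not the library's implementation; the "parsing" below is just a semicolon split, so only the naming pattern is real:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Toy stand-in for the real parser: "parsing" here is splitting on
// semicolons, so the wrapper pattern is visible without the parser itself.
func parseAll(sql string) []string {
	var stmts []string
	for _, s := range strings.Split(sql, ";") {
		if t := strings.TrimSpace(s); t != "" {
			stmts = append(stmts, t)
		}
	}
	return stmts
}

// ParseSQL gets the short name: the 80% case, first statement only.
func ParseSQL(sql string) (string, error) {
	stmts := parseAll(sql)
	if len(stmts) == 0 {
		return "", errors.New("no statement")
	}
	return stmts[0], nil
}

// ParseSQLAll is named for what makes it different: every statement.
func ParseSQLAll(sql string) ([]string, error) {
	return parseAll(sql), nil
}

// ParseSQLStrict errors when more than one statement was passed.
func ParseSQLStrict(sql string) (string, error) {
	stmts := parseAll(sql)
	if len(stmts) != 1 {
		return "", fmt.Errorf("expected 1 statement, got %d", len(stmts))
	}
	return stmts[0], nil
}

func main() {
	first, _ := ParseSQL("SELECT 1; SELECT 2")
	fmt.Println(first) // SELECT 1
	_, err := ParseSQLStrict("SELECT 1; SELECT 2")
	fmt.Println(err != nil) // true
}
```

&lt;p&gt;Because the core is shared, adding a variant costs a few lines, and the behavioral difference lives in exactly one documented place: the function name.&lt;/p&gt;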

&lt;h2&gt;
  
  
  Rule 3: Return structured data, not strings the caller has to re-parse
&lt;/h2&gt;

&lt;p&gt;This one gets less attention in writing about SDK design, but it's the one that bites users hardest in practice.&lt;/p&gt;

&lt;p&gt;If your SDK has done work to extract structured information from unstructured input, &lt;em&gt;don't throw the structure away on the way out&lt;/em&gt;. Returning &lt;code&gt;[]string&lt;/code&gt; when you could have returned &lt;code&gt;[]struct{...}&lt;/code&gt; is a tax you charge every caller forever.&lt;/p&gt;

&lt;p&gt;postgresparser extracts WHERE conditions. The naive return type would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Bad: caller has to re-parse what you already parsed&lt;/span&gt;
&lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExtractWhereConditions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// returns: ["status = 'active'", "total &amp;gt; 100"]&lt;/span&gt;

&lt;span class="c"&gt;// Now every caller writes a regex. They get it wrong.&lt;/span&gt;
&lt;span class="c"&gt;// They handle = and != but forget IS NULL. They miss BETWEEN.&lt;/span&gt;
&lt;span class="c"&gt;// They re-introduce the bug your library was built to solve.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it actually returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Condition&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Column&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Operator&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Value&lt;/span&gt;    &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ExtractWhereConditions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"SELECT * FROM orders WHERE status = 'active' AND total &amp;gt; 100"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;conditions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%s %s %v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Operator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c"&gt;// status = active&lt;/span&gt;
&lt;span class="c"&gt;// total &amp;gt; 100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the caller can ask &lt;code&gt;c.Column == "tenant_id"&lt;/code&gt; directly. They can switch on &lt;code&gt;c.Operator&lt;/code&gt;. They can type-assert &lt;code&gt;c.Value&lt;/code&gt;. None of them have to write a regex, and none of them re-introduce parsing bugs at the boundary of your library.&lt;/p&gt;
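
&lt;p&gt;With the structured return, caller code stays at the level of field access. The &lt;code&gt;conditions&lt;/code&gt; slice below is hand-built rather than produced by &lt;code&gt;ExtractWhereConditions&lt;/code&gt;, and &lt;code&gt;hasTenantFilter&lt;/code&gt; is a hypothetical helper; the consumption pattern is the point:&lt;/p&gt;

```go
package main

import "fmt"

// Condition mirrors the struct from the post; the slice in main is
// hand-built to stand in for what ExtractWhereConditions would return.
type Condition struct {
	Column   string
	Operator string
	Value    interface{}
}

// hasTenantFilter answers a real caller question with field access alone —
// no regex, no re-parsing of condition strings.
func hasTenantFilter(conds []Condition) bool {
	for _, c := range conds {
		if c.Column == "tenant_id" {
			if c.Operator == "=" {
				return true
			}
		}
	}
	return false
}

func main() {
	conds := []Condition{
		{Column: "status", Operator: "=", Value: "active"},
		{Column: "total", Operator: ">", Value: 100},
	}
	for _, c := range conds {
		switch c.Operator { // switch on the operator directly
		case "=":
			fmt.Printf("equality filter on %s\n", c.Column)
		default:
			fmt.Printf("range filter on %s\n", c.Column)
		}
		if n, ok := c.Value.(int); ok { // type-assert the value
			fmt.Println("numeric bound:", n)
		}
	}
	fmt.Println(hasTenantFilter(conds)) // false
}
```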

&lt;p&gt;The principle: &lt;strong&gt;if the structure exists internally, expose the structure. Strings are for things that have no structure, or for things the user is going to print.&lt;/strong&gt; Stringly-typed return values are how libraries become impossible to use correctly at scale.&lt;/p&gt;

&lt;p&gt;The reverse also holds: if you find yourself writing a long regex inside a library you depend on, that library failed Rule 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I broke my own rule
&lt;/h2&gt;

&lt;p&gt;In the spirit of not pretending I have all this figured out: postgresparser violates Rule 2 with &lt;code&gt;ParseSQLWithOptions(sql, opts)&lt;/code&gt;. It exists alongside &lt;code&gt;ParseSQL(sql)&lt;/code&gt;, takes a config struct with extraction flags like &lt;code&gt;IncludeCreateTableFieldComments&lt;/code&gt;, and is exactly the "tyranny of options" pattern I just told you to avoid.&lt;/p&gt;

&lt;p&gt;The honest reason it exists: comment extraction is expensive and most callers don't need it, but I didn't want to design a separate &lt;code&gt;ParseSQLWithComments&lt;/code&gt; function because the option might evolve. So I shipped a &lt;code&gt;WithOptions&lt;/code&gt; escape hatch and told myself it was fine. It's not fine — it's a slow leak that will get bigger as more options accrete. The right move would have been a separate named function for the one option that exists today, and a real opt-in API design when the second option arrives. &lt;/p&gt;
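
&lt;p&gt;For concreteness, here is roughly what the named-function alternative could have looked like. &lt;code&gt;ParseSQLWithComments&lt;/code&gt; and the stubbed &lt;code&gt;parse&lt;/code&gt; core are hypothetical, not code from the library; only the shape matters:&lt;/p&gt;

```go
package main

import "fmt"

// Hypothetical shape of the fix: the one expensive opt-in lives behind a
// name instead of a flag. Parsing itself is stubbed out here.
type Result struct {
	Tables        []string
	FieldComments map[string]string // populated only by the comment variant
}

func parse(sql string, withComments bool) Result {
	r := Result{Tables: []string{"orders"}} // stub: pretend we parsed sql
	if withComments {
		r.FieldComments = map[string]string{"orders.id": "primary key"}
	}
	return r
}

// ParseSQL stays the short, cheap default.
func ParseSQL(sql string) Result { return parse(sql, false) }

// ParseSQLWithComments is named for the one expensive thing it adds.
func ParseSQLWithComments(sql string) Result { return parse(sql, true) }

func main() {
	ddl := "CREATE TABLE orders (id int)"
	fmt.Println(ParseSQL(ddl).FieldComments == nil)          // true
	fmt.Println(len(ParseSQLWithComments(ddl).FieldComments)) // 1
}
```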

&lt;p&gt;I'm flagging it so you can see what the wrong choice looks like even when the author knew the rule.&lt;/p&gt;

&lt;p&gt;The point of including this isn't self-deprecation. It's that &lt;strong&gt;you will violate your own rules&lt;/strong&gt;. The goal isn't a perfect API on day one; it's noticing the violation, naming it, and fixing it before the wrong shape hardens into a public contract you can't change.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you can't remember three rules, remember the question they all answer: &lt;strong&gt;what does the user have to learn before they can use this library?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rule 1 says: don't make them learn your AST.&lt;/li&gt;
&lt;li&gt;Rule 2 says: don't make them learn your option matrix.&lt;/li&gt;
&lt;li&gt;Rule 3 says: don't make them re-parse what you already parsed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every line of documentation a user has to read before their first successful call is friction. Some of it is unavoidable. A lot of it isn't, and that's where the design work is.&lt;/p&gt;




&lt;p&gt;postgresparser is on GitHub at &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;github.com/ValkDB/postgresparser&lt;/a&gt; if you want to see what these rules look like applied (and, per the section above, where they aren't yet). Issues and PRs welcome — particularly the kind that point out a rule I missed.&lt;/p&gt;

</description>
      <category>api</category>
      <category>design</category>
      <category>go</category>
      <category>opensource</category>
    </item>
    <item>
      <title>What I Learned Building a Pure Go PostgreSQL Parser</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Tue, 05 May 2026 06:19:05 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/a-pure-go-postgresql-parser-passed-200-stars-heres-what-i-learned-5eg6</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/a-pure-go-postgresql-parser-passed-200-stars-heres-what-i-learned-5eg6</guid>
      <description>&lt;h2&gt;
  
  
  Why I built it
&lt;/h2&gt;

&lt;p&gt;I needed a PostgreSQL parser that could run inside Go tooling without CGO, external binaries, or runtime dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  What made PostgreSQL parsing harder than expected
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;SQL is not one grammar&lt;/li&gt;
&lt;li&gt;PostgreSQL has a lot of dialect-specific edge cases&lt;/li&gt;
&lt;li&gt;AST shape matters more than “can it parse”&lt;/li&gt;
&lt;li&gt;Error handling becomes a product feature&lt;/li&gt;
&lt;li&gt;Real-world SQL is uglier than examples&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why pure Go mattered
&lt;/h2&gt;

&lt;p&gt;No CGO, easy installation, works in CI, easy to embed in linters and developer tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 200+ GitHub stars taught me
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Developers care about boring installation&lt;/li&gt;
&lt;li&gt;Parser APIs need to be simple&lt;/li&gt;
&lt;li&gt;Good examples matter more than perfect docs&lt;/li&gt;
&lt;li&gt;People want tooling, not academic grammar dumps&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where it’s going
&lt;/h2&gt;

&lt;p&gt;This parser is becoming the foundation for Valk Guard, a local-first static analyzer for SQL and ORM usage. No LLM required. It works from ASTs and deterministic rules.&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;github.com/ValkDB/postgresparser&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>postgres</category>
      <category>database</category>
      <category>opensource</category>
    </item>
    <item>
      <title>What every `?` in your SQL is hiding</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Mon, 04 May 2026 18:48:09 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/what-every-in-your-sql-is-hiding-o4o</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/what-every-in-your-sql-is-hiding-o4o</guid>
      <description>&lt;p&gt;Take a query that comes out of &lt;code&gt;pg_stat_statements&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six question marks. Each one means something completely different.&lt;/p&gt;

&lt;p&gt;The first, inside &lt;code&gt;date_trunc&lt;/code&gt;, expects a string like &lt;code&gt;'week'&lt;/code&gt; — it's telling the function which time bucket to use. The second is a timestamp comparing against &lt;code&gt;created_at&lt;/code&gt;. The third is a number comparing against &lt;code&gt;amount&lt;/code&gt;. The fourth is a string joined through to the customers table — it has to match a &lt;code&gt;plan&lt;/code&gt; value over there. The fifth, sitting bare inside &lt;code&gt;GROUP BY&lt;/code&gt;, is a positional integer like &lt;code&gt;1&lt;/code&gt;, pointing back at the first column in the SELECT list. It's not a value, it's an &lt;em&gt;index&lt;/em&gt;. The sixth, after &lt;code&gt;LIMIT&lt;/code&gt;, is a page-size integer.&lt;/p&gt;

&lt;p&gt;Six placeholders, four different value types, two completely different &lt;em&gt;kinds&lt;/em&gt; of integer. There isn't a regex that gets all six right — not without re-implementing a SQL parser inside it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;postgresparser&lt;/code&gt; is the open-source Go/ANTLR PostgreSQL parser we maintain at ValkDB. Until this release, when you got back an AST, every &lt;code&gt;?&lt;/code&gt; was just a leaf node with positional information and nothing else. The parser knew exactly what each &lt;code&gt;?&lt;/code&gt; meant — it had to, in order to parse — but it never told you. So everyone downstream fell back to regular expressions, string scanning, and increasingly elaborate guesswork.&lt;/p&gt;

&lt;p&gt;This week's release tells you what every &lt;code&gt;?&lt;/code&gt; actually is.&lt;/p&gt;




&lt;h2&gt;
  
  
  The new API
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnalyzeSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;querySQL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Placeholders&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"placeholder %d: role=%s column=%s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ColumnRef&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;placeholder 1: role=function_arg     column=         (date_trunc, arg 0)
placeholder 2: role=where_value      column=created_at
placeholder 3: role=where_value      column=amount
placeholder 4: role=where_value      column=plan
placeholder 5: role=group_by_ordinal column=
placeholder 6: role=limit            column=
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six placeholders, six correct classifications, no string scanning. Switch on the role, fill in the right value.&lt;/p&gt;
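
&lt;p&gt;A consumer of this API never touches the SQL text again; it just switches on the role. The &lt;code&gt;Placeholder&lt;/code&gt; struct and &lt;code&gt;sampleValue&lt;/code&gt; helper below are illustrative stand-ins built from the output above, not the library's own types:&lt;/p&gt;

```go
package main

import "fmt"

// Placeholder mirrors the fields shown in the output above; the role
// strings come from that output, and the sample slice is hand-built.
type Placeholder struct {
	Index     int
	Role      string
	ColumnRef string
}

// sampleValue picks a plausible substitute per role, the way a tool
// reconstructing runnable SQL from pg_stat_statements might.
func sampleValue(p Placeholder) string {
	switch p.Role {
	case "where_value":
		return "sample for " + p.ColumnRef
	case "group_by_ordinal":
		return "1" // an index into the SELECT list, not a data value
	case "limit":
		return "100"
	default:
		return "NULL"
	}
}

func main() {
	ps := []Placeholder{
		{Index: 2, Role: "where_value", ColumnRef: "created_at"},
		{Index: 5, Role: "group_by_ordinal"},
		{Index: 6, Role: "limit"},
	}
	for _, p := range ps {
		fmt.Printf("placeholder %d -> %s\n", p.Index, sampleValue(p))
	}
}
```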




&lt;h2&gt;
  
  
  How the old way failed
&lt;/h2&gt;

&lt;p&gt;Without role information, this is the pipeline most tools end up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────┐
│  Normalized SQL     │
│  with ? placeholders│
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────────────┐
│   Regex sweep for "?"           │
│                                 │
│   finds ? in string literals    │
│   finds ? in comments           │
│   can't see GROUP BY context    │
│   mis-IDs JSONB ? operator      │
│   picks same value twice for    │
│   same column on &amp;gt;= and &amp;lt;       │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────┐
│  Hand-written       │
│  per-position guess │
│  (fragile)          │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Substituted SQL    │
│  often broken       │
└─────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The role-aware version skips all of that by walking the parse tree the parser already built. String literals are leaves of their own kind, so question marks inside them are never seen as placeholders. Comments are stripped before tree construction. The JSONB operator is parsed as an operator node, not a placeholder leaf, so it never enters the placeholder list. &lt;code&gt;GROUP BY&lt;/code&gt; and &lt;code&gt;ORDER BY&lt;/code&gt; ordinals carry their own dedicated role. And every placeholder's syntactic role — its actual position in the grammar — comes back attached.&lt;/p&gt;




&lt;h2&gt;
  
  
  The five footguns this release closes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The JSONB &lt;code&gt;?&lt;/code&gt; operator is not a placeholder
&lt;/h3&gt;

&lt;p&gt;PostgreSQL has three jsonb operators that look like placeholder tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'key'&lt;/span&gt;                  &lt;span class="c1"&gt;-- "does jsonb contain top-level key?"&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;?|&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'b'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;        &lt;span class="c1"&gt;-- "any of these keys?"&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;?&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'b'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;        &lt;span class="c1"&gt;-- "all of these keys?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A regex sweep can't tell these apart from real placeholders. The new placeholder list excludes JSONB operator tokens by construction.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;INTERVAL ?&lt;/code&gt; actually parses
&lt;/h3&gt;

&lt;p&gt;Before this release, &lt;code&gt;INTERVAL ?&lt;/code&gt; was rejected with a syntax error — a real problem if you consume &lt;code&gt;pg_stat_statements&lt;/code&gt;, because every query that uses an interval literal gets normalized to that form. The grammar now accepts a parameter token in interval-operand position.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;?&lt;/code&gt; inside string literals stays inside string literals
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'has a ?'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'don&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;t mark me ?'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The collector walks the parse tree, never the raw SQL — so string-literal &lt;code&gt;?&lt;/code&gt; and comment &lt;code&gt;?&lt;/code&gt; simply don't appear in the placeholder list.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;code&gt;GROUP BY ?&lt;/code&gt; is an ordinal, not a value
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;pg_stat_statements&lt;/code&gt; rewrites &lt;code&gt;GROUP BY 1, 2&lt;/code&gt; to &lt;code&gt;GROUP BY ?, ?&lt;/code&gt;. These placeholders need to be substituted with positional integers referring to SELECT-list slots — not with arbitrary values. A dedicated role makes this explicit.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Function-argument placeholders need to know their function
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first &lt;code&gt;?&lt;/code&gt; must be a string like &lt;code&gt;'week'&lt;/code&gt;. The second must be a string like &lt;code&gt;'year'&lt;/code&gt;. Both are function-argument placeholders, but the parent function differs — so the right substitution differs. Each placeholder of this kind now carries its parent function name and argument index.&lt;/p&gt;
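&lt;p&gt;With the function name and argument index attached, substitution becomes a lookup. A sketch with hypothetical type and function names (not the library's real API):&lt;/p&gt;

```go
package main

import "fmt"

// FuncArg is an illustrative stand-in for a function-argument placeholder:
// the parent function name and zero-based argument index come back attached.
type FuncArg struct {
	Function string
	ArgIndex int
}

// pickValue chooses a plausible literal for a function-argument placeholder.
func pickValue(p FuncArg) string {
	switch p.Function {
	case "date_trunc":
		if p.ArgIndex == 0 {
			return "'week'" // date_trunc's first argument is a precision string
		}
	case "extract":
		if p.ArgIndex == 0 {
			return "'year'" // extract(? FROM ...) takes a field name
		}
	}
	return "NULL"
}

func main() {
	fmt.Println(pickValue(FuncArg{Function: "date_trunc", ArgIndex: 0})) // 'week'
	fmt.Println(pickValue(FuncArg{Function: "extract", ArgIndex: 0}))    // 'year'
}
```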




&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;If you build an ORM or query builder and you've ever wanted to type-check a placeholder before binding to it, this is for you. If you build a SQL linter, a migration tool that rewrites queries between dialects, a monitoring agent that ingests &lt;code&gt;pg_stat_statements&lt;/code&gt;, an AI-assisted SQL generator that emits parameterized queries — same. The common thread is that you have a normalized SQL string with &lt;code&gt;?&lt;/code&gt; placeholders in it, and you need to know what each one means before you can do anything useful.&lt;/p&gt;

&lt;p&gt;If that sounds like work you've done, you've probably written a private placeholder classifier already. With this release, you don't have to.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The parser tells you what the SQL says; type inference belongs a layer up. The API stays narrow on purpose — roles, positions, and the structural context needed to make sense of them. Function-wrapper exposure on column usage is next on the roadmap; lateral-join and recursive-CTE refinements after that.&lt;/p&gt;

&lt;p&gt;The parser knew. Now it tells you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;postgresparser&lt;/a&gt; — open-source PostgreSQL parser. Go, ANTLR-based. Contributions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>postgres</category>
      <category>opensource</category>
      <category>development</category>
    </item>
    <item>
      <title>A Protobuf for Database Schemas</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Wed, 18 Mar 2026 07:52:54 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/a-protobuf-for-database-schemas-pc8</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/a-protobuf-for-database-schemas-pc8</guid>
      <description>&lt;p&gt;Every serious system has an interface definition for its wire format. gRPC has protobuf. REST has OpenAPI. GraphQL has its SDL. But databases -- the thing everything else is built on top of -- have nothing.&lt;/p&gt;

&lt;p&gt;Your database schema is one of the most important artifacts in your system. It defines every table, column, type, constraint, relationship, and index. It encodes years of domain decisions. And yet there is no standard, portable, machine-readable format for it.&lt;/p&gt;

&lt;p&gt;We built one. We call it &lt;code&gt;ctxexport.json&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem is older than LLMs
&lt;/h2&gt;

&lt;p&gt;Before you assume this is an AI-context story, consider how many times you have needed your schema outside the database itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Onboarding a new engineer who needs to understand the data model.&lt;/li&gt;
&lt;li&gt;Diffing staging against production to catch drift before a deploy.&lt;/li&gt;
&lt;li&gt;Running a linter in CI to enforce naming conventions or catch missing indexes.&lt;/li&gt;
&lt;li&gt;Generating documentation that is not immediately stale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every time, you end up writing a bespoke script that queries &lt;code&gt;information_schema&lt;/code&gt; or &lt;code&gt;pg_catalog&lt;/code&gt;, parses the output, and feeds it into whatever tool you need. The script is Postgres-specific. It breaks when you add a second schema. Nobody maintains it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pg_dump --schema-only&lt;/code&gt; exists, but it is a restore format, not a consumption format. It is Postgres-specific SQL with &lt;code&gt;SET&lt;/code&gt; statements, ownership clauses, and an ordering designed for replay, not reading. Try parsing it reliably. Try feeding it to a linter. Try diffing two of them without drowning in noise.&lt;/p&gt;

&lt;p&gt;MongoDB is worse. There is no &lt;code&gt;mongodump --schema-only&lt;/code&gt;. Your schema lives in the shape of whatever documents happen to exist. Good luck extracting that into something a tool can reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extract once, use many ways
&lt;/h2&gt;

&lt;p&gt;The core insight behind &lt;code&gt;ctxexport.json&lt;/code&gt; is the same one behind protobuf: &lt;strong&gt;separate the definition from the consumption&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A protobuf &lt;code&gt;.proto&lt;/code&gt; file is written once and compiled to Go structs, Python classes, TypeScript types, gRPC stubs, or REST gateways. The definition is the single source of truth. The consumers are many and varied.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ctxexport.json&lt;/code&gt; works the same way. You extract your schema once -- from Postgres, MongoDB, or whatever backend -- and produce a single canonical JSON file. That file contains entities (tables, views, collections), fields (columns with types, nullability, defaults), edges (foreign keys and inferred references), and access paths (indexes). Everything a tool needs to understand your data model, nothing it does not.&lt;/p&gt;

&lt;p&gt;From that single artifact, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compile to a lighthouse map&lt;/strong&gt; -- a compact table-and-relationship summary that fits in an LLM prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compile to full SQL DDL&lt;/strong&gt; -- standard &lt;code&gt;CREATE TABLE&lt;/code&gt; statements for any subset of tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serve over MCP&lt;/strong&gt; -- give an AI agent schema awareness without database credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff across environments&lt;/strong&gt; -- compare staging and production schemas as structured data, not text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lint offline&lt;/strong&gt; -- check naming conventions, missing indexes, or orphaned foreign keys in CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate in CI&lt;/strong&gt; -- catch schema regressions before they reach production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit to git&lt;/strong&gt; -- your schema becomes a versioned artifact with a real history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these consumers need to know whether the source was Postgres or MongoDB. None of them need a live database connection. The extraction happened once, upstream, and everything downstream reads the same contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sidecar pattern
&lt;/h2&gt;

&lt;p&gt;Databases have never been good at carrying human knowledge alongside the schema. Your &lt;code&gt;users.deleted_at&lt;/code&gt; column is a soft-delete flag, but the database only knows it is a &lt;code&gt;timestamp with time zone&lt;/code&gt;. Your &lt;code&gt;orders.payload&lt;/code&gt; column is JSONB with a specific structure, but the database sees an opaque blob.&lt;/p&gt;

&lt;p&gt;A sidecar file (&lt;code&gt;dbdense.yaml&lt;/code&gt;) layers descriptions and value annotations onto the extracted schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;entities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;payments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorized"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paid"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refunded"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;deleted_at&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Soft&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;timestamp.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;NULL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;active."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This merges at export time. The compiled DDL gets inline comments like &lt;code&gt;-- Values: pending, authorized, paid, failed, refunded&lt;/code&gt;. Every downstream consumer -- linter, LLM, documentation generator -- picks it up automatically. Write it once in a YAML file committed to the repo.&lt;/p&gt;
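&lt;p&gt;The merged output is easy to picture. A tiny sketch of what rendering the annotation as an inline comment looks like (the helper name is hypothetical, not dbdense's renderer):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// valuesComment renders the inline comment that sidecar value annotations
// become in the compiled DDL, matching the "-- Values: ..." form above.
func valuesComment(values []string) string {
	return "-- Values: " + strings.Join(values, ", ")
}

func main() {
	fmt.Println("status TEXT NOT NULL  " +
		valuesComment([]string{"pending", "authorized", "paid", "failed", "refunded"}))
}
```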

&lt;h2&gt;
  
  
  Why JSON, not SQL
&lt;/h2&gt;

&lt;p&gt;SQL DDL is human-readable but machine-hostile. Parsing &lt;code&gt;CREATE TABLE&lt;/code&gt; statements reliably across dialects is a nightmare. Defaults are quoted differently. Constraints can be inline or out-of-band. Comments use different syntax. There is no standard way to represent a foreign key relationship as structured data.&lt;/p&gt;

&lt;p&gt;JSON is boring and that is the point. It is a declarative state representation -- you look up a table by name, not by parsing DDL statement order. Every language has a JSON parser. The schema is simple: a version string, an array of entities, and an array of edges. You can validate it with a JSON Schema. You can diff it with &lt;code&gt;jq&lt;/code&gt;. You can read it in any language without a SQL parser.&lt;/p&gt;

&lt;p&gt;A minimal entity looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"is_pk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"not_null"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"pending"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"paid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Flat, predictable, zero ambiguity.&lt;/p&gt;
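&lt;p&gt;"Every language has a JSON parser" is concrete in Go: a pair of structs and one &lt;code&gt;Unmarshal&lt;/code&gt; call. The struct shapes below follow the minimal entity shown above; &lt;code&gt;parseEntity&lt;/code&gt; is a hypothetical helper, not part of the dbdense API:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field and Entity mirror the minimal entity example above.
type Field struct {
	Name    string   `json:"name"`
	Type    string   `json:"type"`
	IsPK    bool     `json:"is_pk"`
	NotNull bool     `json:"not_null"`
	Values  []string `json:"values"`
}

type Entity struct {
	Name   string  `json:"name"`
	Type   string  `json:"type"`
	Fields []Field `json:"fields"`
}

// parseEntity decodes one entity from the canonical JSON.
func parseEntity(raw []byte) (Entity, error) {
	var e Entity
	err := json.Unmarshal(raw, &e)
	return e, err
}

func main() {
	raw := []byte(`{"name":"payments","type":"table","fields":[
	  {"name":"id","type":"uuid","is_pk":true},
	  {"name":"status","type":"text","not_null":true,"values":["pending","paid","failed"]}]}`)

	e, err := parseEntity(raw)
	if err != nil {
		panic(err)
	}
	// Look up a column by name -- no SQL parsing involved.
	for _, f := range e.Fields {
		if f.Name == "status" {
			fmt.Println(f.Type, f.Values) // text [pending paid failed]
		}
	}
}
```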

&lt;h2&gt;
  
  
  Stop treating your schema like a black box
&lt;/h2&gt;

&lt;p&gt;The immediate use case is LLM context -- giving AI agents schema awareness without live database access. But the format is deliberately general. If your tool can read JSON, it can read a database schema. That was not true before.&lt;/p&gt;

&lt;p&gt;The project is at &lt;a href="https://github.com/valkdb/dbdense" rel="noopener noreferrer"&gt;github.com/valkdb/dbdense&lt;/a&gt;. The contract is documented in &lt;code&gt;docs/ctxexport-contract.md&lt;/code&gt;. It supports Postgres and MongoDB today. The extractor interface is small enough that adding a new backend is a single file.&lt;/p&gt;

&lt;p&gt;Your database schema is too important to be locked inside the database. Export it. Version it. Build on it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stop Sending 93K Tokens of Schema to Your LLM Agent!</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Wed, 18 Mar 2026 07:52:06 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/stop-sending-93k-tokens-of-schema-to-your-llm-agent-5c67</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/stop-sending-93k-tokens-of-schema-to-your-llm-agent-5c67</guid>
      <description>&lt;p&gt;I've watched agents query &lt;code&gt;information_schema&lt;/code&gt; over and over, spending 4-6 turns just to figure out which tables exist, what columns they have, and how they join. On a 500-table database, the full DDL is around 93,000 tokens. Most questions touch 3-5 tables. On a complex multi-table join, I measured a 64% token reduction by just giving the agent the schema upfront.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/valkdb/dbdense" rel="noopener noreferrer"&gt;dbdense&lt;/a&gt; to fix exactly this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;dbdense is a three-step offline pipeline: &lt;strong&gt;extract&lt;/strong&gt;, &lt;strong&gt;compile&lt;/strong&gt;, &lt;strong&gt;serve&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; connects to your database once and snapshots the schema into a portable JSON file (&lt;code&gt;ctxexport.json&lt;/code&gt;). Tables, columns, types, primary keys, foreign keys, indexes -- everything an LLM needs to write correct queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compile&lt;/strong&gt; turns that snapshot into two artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;lighthouse&lt;/strong&gt; -- a compact table map (~4K tokens for 500 tables). It looks like this:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; T:users|J:orders,sessions
 T:orders|E:payload,shipping|J:payments,shipments,users
 T:payments|J:orders
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Every table, its FK neighbors, and embedded docs. 23x smaller than full DDL. This stays in the agent's context so it always knows what's available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full DDL&lt;/strong&gt; -- standard &lt;code&gt;CREATE TABLE&lt;/code&gt; statements with constraints, rendered on demand only for the specific tables the agent asks about.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serve&lt;/strong&gt; (optional) exposes the lighthouse as an MCP resource and the DDL via an MCP &lt;code&gt;slice&lt;/code&gt; tool. The agent reads the map, picks the tables it needs, and gets back just those definitions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After the extract, everything runs locally. The compiled artifacts are plain text you can commit to your repo. No database connection needed at runtime.&lt;/p&gt;
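&lt;p&gt;The lighthouse format is deliberately trivial to parse. A sketch of decoding one line, reading the segment meanings off the example above (&lt;code&gt;T:&lt;/code&gt; table, &lt;code&gt;J:&lt;/code&gt; FK neighbors, &lt;code&gt;E:&lt;/code&gt; embedded docs); treat this as an informal reading, not a documented grammar:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// parseLighthouseLine decodes one line of the compact map:
// T:<table> with optional |E:<embedded> and |J:<join neighbors> segments.
func parseLighthouseLine(line string) (table string, joins []string) {
	for _, seg := range strings.Split(strings.TrimSpace(line), "|") {
		switch {
		case strings.HasPrefix(seg, "T:"):
			table = seg[2:]
		case strings.HasPrefix(seg, "J:"):
			joins = strings.Split(seg[2:], ",")
		}
	}
	return table, joins
}

func main() {
	table, joins := parseLighthouseLine("T:orders|E:payload,shipping|J:payments,shipments,users")
	fmt.Println(table, joins) // orders [payments shipments users]
}
```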

&lt;h2&gt;
  
  
  No credentials in the agent runtime
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;export&lt;/code&gt; step is the only step that touches the database. After that, &lt;code&gt;compile&lt;/code&gt; and &lt;code&gt;serve&lt;/code&gt; work from the local snapshot. Your production database credentials never need to be in the agent's environment. The tool works offline and air-gapped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;I ran an agentic benchmark: n=3, same 5 questions, same seeded Postgres database (20K+ rows, 8 tables), same model (Claude Sonnet 4). One arm had only a Postgres MCP tool. The other had the same tool plus dbdense schema context injected into the prompt.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Without schema context&lt;/th&gt;
&lt;th&gt;With dbdense&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Correct answers&lt;/td&gt;
&lt;td&gt;13/15&lt;/td&gt;
&lt;td&gt;13/15&lt;/td&gt;
&lt;td&gt;equal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg turns&lt;/td&gt;
&lt;td&gt;4.1&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;-46%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per run&lt;/td&gt;
&lt;td&gt;285,922&lt;/td&gt;
&lt;td&gt;187,603&lt;/td&gt;
&lt;td&gt;-34%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same accuracy. 34% fewer tokens. 46% fewer turns.&lt;/p&gt;

&lt;p&gt;The savings scale with query complexity. On simple single-table filters, both arms performed about the same. On a complex multi-table join, the baseline agent spent 6+ turns querying &lt;code&gt;information_schema&lt;/code&gt; to discover the schema. dbdense answered in 2 turns, using 64% fewer tokens for that query.&lt;/p&gt;

&lt;p&gt;The two wrong answers (both on the same question, in both arms) returned identical incorrect results, pointing to question ambiguity rather than a schema context issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sidecar enrichment
&lt;/h2&gt;

&lt;p&gt;Databases lie by omission. A column named &lt;code&gt;status&lt;/code&gt; with type &lt;code&gt;text&lt;/code&gt; tells the LLM nothing about what values are valid. The agent either guesses or wastes a &lt;code&gt;SELECT DISTINCT&lt;/code&gt; turn to find out.&lt;/p&gt;

&lt;p&gt;dbdense supports a &lt;code&gt;dbdense.yaml&lt;/code&gt; sidecar file where you annotate columns with descriptions and enum values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;entities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;payments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorized"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paid"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refunded"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Order&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lifecycle&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;status."&lt;/span&gt;
        &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delivered"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancelled"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These annotations merge into the compiled DDL as inline SQL comments. The LLM sees &lt;code&gt;-- Values: pending, authorized, paid, failed, refunded&lt;/code&gt; right next to the column definition. No extra queries needed.&lt;/p&gt;

&lt;p&gt;This also works for documenting JSONB structures, MongoDB embedded documents, or anything else the raw schema doesn't capture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it doesn't do
&lt;/h2&gt;

&lt;p&gt;The snapshot is static. If your schema changes, re-run &lt;code&gt;export&lt;/code&gt;. This is intentional -- schemas are stable; questions change.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;slice&lt;/code&gt; tool still depends on the LLM picking the right tables from the lighthouse. dbdense reduces the context problem; it doesn't solve table selection for the model.&lt;/p&gt;

&lt;p&gt;It's not a &lt;code&gt;pg_dump --schema-only&lt;/code&gt; replacement. The renderer covers columns, PKs, FKs, NOT NULL, defaults, unique constraints, and indexes, but skips triggers, RLS policies, and custom types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/valkdb/dbdense/cmd/dbdense@latest
dbdense &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nt"&gt;--driver&lt;/span&gt; postgres &lt;span class="nt"&gt;--db&lt;/span&gt; &lt;span class="s2"&gt;"postgres://user:pass@localhost:5432/mydb"&lt;/span&gt; &lt;span class="nt"&gt;--schemas&lt;/span&gt; public
dbdense compile &lt;span class="nt"&gt;--mode&lt;/span&gt; lighthouse &lt;span class="nt"&gt;--in&lt;/span&gt; ctxexport.json &lt;span class="nt"&gt;--out&lt;/span&gt; lighthouse.txt
dbdense compile &lt;span class="nt"&gt;--in&lt;/span&gt; ctxexport.json &lt;span class="nt"&gt;--out&lt;/span&gt; schema.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have two files: a lighthouse map and full DDL. Point your agent at them. If you use Claude Code, &lt;code&gt;dbdense init-claude&lt;/code&gt; writes the MCP config for you.&lt;/p&gt;

&lt;p&gt;The project is open source at &lt;a href="https://github.com/valkdb/dbdense" rel="noopener noreferrer"&gt;github.com/valkdb/dbdense&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How does a linter know your column doesn't exist</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Mon, 09 Mar 2026 08:40:04 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/how-does-a-linter-know-your-column-doesnt-exist-7ff</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/how-does-a-linter-know-your-column-doesnt-exist-7ff</guid>
      <description>&lt;p&gt;You write a query that SELECTs ghost_status from the orders table. Your code compiles. Your tests pass. But ghost_status was never created in any migration. In production, that query crashes.&lt;br&gt;
Valk Guard catches this at PR time - with no database connection.&lt;br&gt;
This post walks through exactly how. Not hand-waving. The actual code path, from source file to finding.&lt;br&gt;
The setup&lt;br&gt;
Here's a Go file using Goqu to build a query:&lt;br&gt;
func ListBrokenUserOrderStatus(ctx context.Context) error {&lt;br&gt;
    _, _, err := goqu.From("users").&lt;br&gt;
        LeftJoin(&lt;br&gt;
            goqu.T("orders"),&lt;br&gt;
            goqu.On(goqu.I("orders.user_id").Eq(goqu.I("users.id"))),&lt;br&gt;
        ).&lt;br&gt;
        Select("users.id", "users.email", "orders.ghost_status").&lt;br&gt;
        Where(goqu.I("orders.missing_flag").Eq("pending")).&lt;br&gt;
        ToSQL()&lt;br&gt;
    return err&lt;br&gt;
}&lt;br&gt;
And here's the migration that created the &lt;code&gt;orders&lt;/code&gt; table:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE orders (
    id         SERIAL PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    total      NUMERIC(10,2) NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT now()
);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that the query references &lt;code&gt;orders.ghost_status&lt;/code&gt;. The migration never created that column. There is no &lt;code&gt;ghost_status&lt;/code&gt;. Valk Guard reports:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;VG105: projection column "ghost_status" not found in table "orders" schema; check SELECT list and schema/model mappings
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;How does it know? Let's walk through each phase.&lt;/p&gt;

&lt;h2&gt;Phase 1: Query extraction&lt;/h2&gt;

&lt;p&gt;The Goqu scanner doesn't look for SQL strings. It walks the Go AST looking for method chains rooted at &lt;code&gt;goqu.From()&lt;/code&gt;. When it finds one, it flattens the chain into a list of method calls: &lt;code&gt;From("users")&lt;/code&gt; → &lt;code&gt;LeftJoin(...)&lt;/code&gt; → &lt;code&gt;Select(...)&lt;/code&gt; → &lt;code&gt;Where(...)&lt;/code&gt;. Each call is then interpreted: &lt;code&gt;From&lt;/code&gt; gives the base table, &lt;code&gt;LeftJoin&lt;/code&gt; the join target, &lt;code&gt;Select&lt;/code&gt; the projection columns, and &lt;code&gt;Where&lt;/code&gt; the predicates.&lt;/p&gt;

&lt;p&gt;From these parts, the scanner synthesizes a SQL statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT users.id, users.email, orders.ghost_status
FROM users LEFT JOIN orders ON orders.user_id = users.id
WHERE orders.missing_flag = 'pending'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This SQL never existed in your source code. Valk Guard constructed it from the AST of your Go code. That's the key difference from regex-based tools: a regex can't walk a method chain and reconstruct what the query builder will produce.&lt;/p&gt;

&lt;h2&gt;Phase 2: Schema snapshot&lt;/h2&gt;

&lt;p&gt;Separately, Valk Guard finds all &lt;code&gt;.sql&lt;/code&gt; files under your migration paths. Each file gets parsed through postgresparser, and every DDL statement gets applied to a Snapshot, an in-memory representation of your schema's current state. The snapshot builder processes DDL actions in order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CREATE TABLE orders (id, user_id, total, status, created_at)&lt;/code&gt; → registers the table with five columns&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMP&lt;/code&gt; → adds a sixth column&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER TABLE orders DROP COLUMN shipped_at&lt;/code&gt; → removes it&lt;/li&gt;
&lt;/ul&gt;
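The replay logic is simple enough to sketch in a few lines. This is an illustration only, not Valk Guard's actual implementation: the `Snapshot` type and its methods here are hypothetical, but they show how replaying DDL actions over a map reproduces the schema's final state.

```go
package main

import "fmt"

// Snapshot is a hypothetical, minimal stand-in: table name -> set of columns.
type Snapshot map[string]map[string]bool

// CreateTable registers a table with its initial columns.
func (s Snapshot) CreateTable(table string, cols ...string) {
	set := make(map[string]bool, len(cols))
	for _, c := range cols {
		set[c] = true
	}
	s[table] = set
}

// AddColumn and DropColumn replay ALTER TABLE actions in migration order.
func (s Snapshot) AddColumn(table, col string)  { s[table][col] = true }
func (s Snapshot) DropColumn(table, col string) { delete(s[table], col) }

func main() {
	snap := Snapshot{}
	snap.CreateTable("orders", "id", "user_id", "total", "status", "created_at")
	snap.AddColumn("orders", "shipped_at")  // ALTER TABLE orders ADD COLUMN shipped_at
	snap.DropColumn("orders", "shipped_at") // ALTER TABLE orders DROP COLUMN shipped_at
	fmt.Println(len(snap["orders"]))        // prints 5: no ghost_status, no shipped_at
}
```

The real builder also has to track types, nullability, and defaults (rules like VG102 and VG103 depend on them), but the lookup shape is the same.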

&lt;p&gt;The end result is a map of table names to column definitions. For &lt;code&gt;orders&lt;/code&gt;, that's: &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;. Five columns. No &lt;code&gt;ghost_status&lt;/code&gt;. This is the same principle as running all your migrations on an empty database, except it happens in memory, with no database, in microseconds.&lt;/p&gt;

&lt;h2&gt;Phase 3: Rule evaluation&lt;/h2&gt;

&lt;p&gt;Now VG105 runs. It takes the synthesized SQL (already parsed into a structured IR by postgresparser) and the schema snapshot, and does a straightforward lookup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For each column in the SELECT list with usage type "projection", resolve which table it belongs to (using the alias or the single-table shortcut)&lt;/li&gt;
&lt;li&gt;Look up that table in the snapshot&lt;/li&gt;
&lt;li&gt;Check if the column exists in the table's column map&lt;/li&gt;
&lt;li&gt;If not → finding&lt;/li&gt;
&lt;/ol&gt;
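That lookup is worth seeing as code. A minimal sketch, with hypothetical types standing in for Valk Guard's actual IR and snapshot structures:

```go
package main

import "fmt"

// ColumnUse is a hypothetical stand-in for one column reference in the IR,
// already resolved to its table and tagged with its usage role.
type ColumnUse struct {
	Table, Column, Usage string
}

// checkProjections sketches the VG105-style lookup: every projected column
// must exist in the migration-derived snapshot (table -> column set).
func checkProjections(uses []ColumnUse, snapshot map[string]map[string]bool) []string {
	var findings []string
	for _, u := range uses {
		if u.Usage != "projection" {
			continue
		}
		cols, ok := snapshot[u.Table]
		if !ok {
			continue // an unknown table is a separate rule's concern
		}
		if !cols[u.Column] {
			findings = append(findings,
				fmt.Sprintf("projection column %q not found in table %q", u.Column, u.Table))
		}
	}
	return findings
}

func main() {
	snapshot := map[string]map[string]bool{
		"orders": {"id": true, "user_id": true, "total": true, "status": true, "created_at": true},
	}
	uses := []ColumnUse{
		{"orders", "id", "projection"},
		{"orders", "ghost_status", "projection"},
	}
	for _, f := range checkProjections(uses, snapshot) {
		fmt.Println(f) // flags ghost_status only
	}
}
```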

&lt;p&gt;For &lt;code&gt;ghost_status&lt;/code&gt;, the column usage says it belongs to &lt;code&gt;orders&lt;/code&gt; (from the &lt;code&gt;orders.ghost_status&lt;/code&gt; qualifier). The snapshot has an &lt;code&gt;orders&lt;/code&gt; table, but &lt;code&gt;ghost_status&lt;/code&gt; is not in its column map. Finding. The same logic powers VG106 (unknown filter column, which catches &lt;code&gt;WHERE orders.missing_flag = 'pending'&lt;/code&gt; in the same query) and VG107 (unknown table reference).&lt;/p&gt;

&lt;h2&gt;It also works with ORM models&lt;/h2&gt;

&lt;p&gt;The same snapshot system powers the schema-drift rules (VG101–VG104). Instead of checking queries against migrations, these rules check ORM models against migrations. Say you have a Go struct:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;type Order struct {
    ID          int    `db:"id"`
    UserID      int    `db:"user_id"`
    Total       string `db:"total"`
    Status      string `db:"status"`
    GhostStatus string `db:"ghost_status"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Valk Guard's Go model extractor walks the AST, reads the &lt;code&gt;db&lt;/code&gt; struct tags, and produces a ModelDef with columns: &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;ghost_status&lt;/code&gt;. VG101 then compares each model column against the migration snapshot. &lt;code&gt;ghost_status&lt;/code&gt; isn't in the &lt;code&gt;orders&lt;/code&gt; table → finding:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;VG101: model "orders" references column "ghost_status" not found in table "orders" schema; check migration DDL or update model mapping
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Two different rules, two different input paths (query vs. model), same schema snapshot, same answer.&lt;/p&gt;

&lt;h2&gt;What this means in practice&lt;/h2&gt;

&lt;p&gt;You don't need a running database. You don't need to run migrations. You don't need to connect to staging. Valk Guard reads your source code and your migration files, builds everything in memory, and cross-references them statically. This runs in CI in seconds. It catches the kind of bug that usually shows up as a &lt;code&gt;column "ghost_status" does not exist&lt;/code&gt; error in your logs at 2am and moves it to a PR comment at 2pm instead.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;go install github.com/valkdb/valk-guard/cmd/valk-guard@latest
valk-guard scan .
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;github.com/ValkDB/valk-guard&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>postgres</category>
      <category>devops</category>
      <category>go</category>
    </item>
    <item>
      <title>We didn't want an AI SQL reviewer. We wanted deterministic</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Sat, 07 Mar 2026 16:33:20 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/we-didnt-want-an-ai-sql-reviewer-we-wanted-deterministic-oh6</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/we-didnt-want-an-ai-sql-reviewer-we-wanted-deterministic-oh6</guid>
      <description>&lt;p&gt;So we built &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;Valk Guard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most SQL linters scan &lt;code&gt;.sql&lt;/code&gt; files. The problem is, most SQL doesn't live in &lt;code&gt;.sql&lt;/code&gt; files.&lt;/p&gt;

&lt;p&gt;It lives in &lt;code&gt;db.Query()&lt;/code&gt; calls. In Goqu builder chains. In SQLAlchemy ORM methods. In migration files mixed with application logic. By the time SQL reaches production, it's been assembled, concatenated, or synthesized by code that no &lt;code&gt;.sql&lt;/code&gt;-only tool will ever see.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;Valk Guard&lt;/a&gt; to solve that. It's a static analysis tool that walks your source code's AST, reconstructs the SQL your ORMs and query builders will generate, parses it through a real PostgreSQL grammar, and reports findings in CI-friendly formats. No database connection. No runtime. Just structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/valkdb/valk-guard/cmd/valk-guard@latest
valk-guard scan &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;19 rules enabled by default. Zero config. Takes seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it actually catches
&lt;/h2&gt;

&lt;p&gt;Here's a Goqu chain in Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;goqu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"orders"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Delete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's no raw SQL anywhere in that line. But Valk Guard walks the Go AST, recognizes the Goqu method chain, synthesizes &lt;code&gt;DELETE FROM orders&lt;/code&gt;, feeds it through &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;postgresparser&lt;/a&gt;, and fires &lt;strong&gt;VG003&lt;/strong&gt;: &lt;code&gt;DELETE without WHERE may affect all rows&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Same thing with SQLAlchemy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SQL string. Valk Guard's embedded Python AST extractor reconstructs it, and the same rule fires.&lt;/p&gt;

&lt;p&gt;The full rule set covers three categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query safety&lt;/strong&gt; — &lt;code&gt;UPDATE&lt;/code&gt; without &lt;code&gt;WHERE&lt;/code&gt; (VG002), &lt;code&gt;DELETE&lt;/code&gt; without &lt;code&gt;WHERE&lt;/code&gt; (VG003), &lt;code&gt;SELECT *&lt;/code&gt; (VG001), unbounded &lt;code&gt;SELECT&lt;/code&gt; without &lt;code&gt;LIMIT&lt;/code&gt; (VG004), leading wildcard &lt;code&gt;LIKE '%...'&lt;/code&gt; (VG005), &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; without &lt;code&gt;WHERE&lt;/code&gt; (VG006).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dangerous DDL&lt;/strong&gt; — &lt;code&gt;DROP TABLE&lt;/code&gt; / &lt;code&gt;TRUNCATE&lt;/code&gt; in application code (VG007), &lt;code&gt;CREATE INDEX&lt;/code&gt; without &lt;code&gt;CONCURRENTLY&lt;/code&gt; (VG008).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema drift&lt;/strong&gt; — ORM model references a column that migrations dropped (VG101). &lt;code&gt;NOT NULL&lt;/code&gt; column missing from model (VG102). Type mismatch between model and DDL (VG103). Model table has no &lt;code&gt;CREATE TABLE&lt;/code&gt; in migrations (VG104). Query &lt;code&gt;SELECT&lt;/code&gt;s a column that doesn't exist in the schema (VG105). And several more cross-reference checks between your code and your migrations.&lt;/p&gt;

&lt;p&gt;That last category is the one I haven't seen elsewhere. Valk Guard reads Go struct tags (&lt;code&gt;db&lt;/code&gt;, &lt;code&gt;gorm&lt;/code&gt;) and Python &lt;code&gt;__tablename__&lt;/code&gt; / &lt;code&gt;Column(...)&lt;/code&gt; definitions, builds a schema snapshot from your migration DDL, and cross-references them. If your ORM model says &lt;code&gt;email&lt;/code&gt; exists but your migration dropped it, that's VG101 at PR time — not a runtime panic in production.&lt;/p&gt;
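The struct-tag side of that cross-reference is easy to picture. Valk Guard reads tags from source via the AST without running your code; the sketch below uses runtime reflection instead, purely to show the tag-to-column mapping (the `Order` type and `modelColumns` helper here are hypothetical, not Valk Guard's API):

```go
package main

import (
	"fmt"
	"reflect"
)

// Order mirrors the kind of ORM model Valk Guard inspects.
type Order struct {
	ID    int    `db:"id"`
	Email string `db:"email"`
}

// modelColumns collects the db-tag column names from a struct's fields,
// the same table-column mapping a model extractor would produce.
func modelColumns(v any) []string {
	t := reflect.TypeOf(v)
	var cols []string
	for i := 0; i < t.NumField(); i++ {
		if tag := t.Field(i).Tag.Get("db"); tag != "" {
			cols = append(cols, tag)
		}
	}
	return cols
}

func main() {
	fmt.Println(modelColumns(Order{})) // prints [id email]
}
```

Each of those column names is then looked up in the migration-derived snapshot; any name with no matching column in the DDL becomes a drift finding.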

&lt;h2&gt;
  
  
  Why AST, not AI
&lt;/h2&gt;

&lt;p&gt;This was a deliberate choice, and it's worth explaining.&lt;/p&gt;

&lt;p&gt;CI is not a brainstorming session. If a PR check comments on your code and changes its mind between runs, people stop trusting it. If it floods you with false positives, people add &lt;code&gt;--skip-lint&lt;/code&gt; and move on. The tool is dead even if the idea was good.&lt;/p&gt;

&lt;p&gt;I needed the opposite: same input, same output, every time. Testable. Explainable. Boring in the best way.&lt;/p&gt;

&lt;p&gt;AI is useful for exploration and suggestions. But a blocking CI step needs determinism. Structured-output approaches improve an LLM's schema conformance, but they don't make a generative model behave like a static analyzer. The questions Valk Guard answers are structural: "does this statement have a WHERE clause?" "does this builder chain produce a bounded query?" "does this model match this schema?" Those are AST questions, not generation questions.&lt;/p&gt;
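To make "AST question" concrete: once a statement is parsed into an IR, a check like VG003 is a field lookup, not an inference. A sketch using a simplified stand-in for the parsed result (the real result type carries far more fields):

```go
package main

import "fmt"

// Parsed is a hypothetical, simplified stand-in for a parser's IR:
// just the command type and the extracted WHERE predicates.
type Parsed struct {
	Command string
	Where   []string
}

// unsafeDelete answers the structural question directly:
// is this a DELETE with no WHERE clause at all?
func unsafeDelete(q Parsed) bool {
	return q.Command == "DELETE" && len(q.Where) == 0
}

func main() {
	fmt.Println(unsafeDelete(Parsed{Command: "DELETE"}))                          // true: would affect all rows
	fmt.Println(unsafeDelete(Parsed{Command: "DELETE", Where: []string{"id=1"}})) // false
}
```

Given the same IR, this function returns the same answer every run, which is exactly the property a blocking CI step needs.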

&lt;p&gt;The same logic applies to regex. Regex is fine when the thing you're checking is a flat string. It falls apart when SQL is buried inside Go method chains or Python ORM calls. You can't regex your way through &lt;code&gt;goqu.From("users").Where(goqu.C("id").Eq(42)).Select("name")&lt;/code&gt; and reliably reconstruct the query. You need to parse the source language's AST, understand the builder pattern, and synthesize the SQL. That's what Valk Guard does.&lt;/p&gt;

&lt;p&gt;A small number of checks do use targeted regex after parsing — when a parser-extracted clause doesn't expose the exact field a rule needs. But that's regex as a surgical helper on already-parsed output, not regex as the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;Source files go in. Findings come out. Here's what happens in between:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Extraction&lt;/strong&gt; — Four scanners run concurrently. The raw SQL scanner handles &lt;code&gt;.sql&lt;/code&gt; files with proper dollar-quoting and nested block comments. The Go scanner uses &lt;code&gt;go/ast&lt;/code&gt; to extract SQL from &lt;code&gt;db.Query&lt;/code&gt;, &lt;code&gt;db.Exec&lt;/code&gt;, and &lt;code&gt;db.QueryRow&lt;/code&gt;. The Goqu scanner walks builder chains and synthesizes SQL. The SQLAlchemy scanner invokes an embedded Python script (stdlib only — no pip dependencies) that parses ORM chains via Python's &lt;code&gt;ast&lt;/code&gt; module.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Parsing&lt;/strong&gt; — Every extracted statement goes through &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;postgresparser&lt;/a&gt;, a pure-Go PostgreSQL parser I built on ANTLR. It produces a structured IR: tables, columns, joins, WHERE clauses, command type. No CGO, no database connection. Most queries parse in 70–350 µs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Rule evaluation&lt;/strong&gt; — Rules are dispatched by SQL command type for efficiency. Query rules (VG001–VG008) run against every parsed statement. Schema-drift rules (VG101+) cross-reference ORM model definitions against a migration-derived schema snapshot. Query-schema rules (VG105–VG108) validate that columns and tables referenced in queries actually exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Output&lt;/strong&gt; — Findings are deduplicated, sorted by file and line, and formatted as terminal output, JSON, SARIF (for GitHub Code Scanning), or rdjsonl (for reviewdog PR comments).&lt;/p&gt;

&lt;p&gt;The whole thing is ~8,100 lines of Go (plus ~700 lines of embedded Python), with nearly 1:1 test coverage. Three runtime dependencies: cobra, postgresparser, and yaml. That's it.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI integration
&lt;/h2&gt;

&lt;p&gt;Valk Guard was designed for pull request workflows. Exit code 0 means clean, 1 means findings, 2 means config/parser error. Hook it into reviewdog and you get inline PR comments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run valk-guard&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;set +e&lt;/span&gt;
    &lt;span class="s"&gt;valk-guard scan . --format rdjsonl &amp;gt; valk-guard.rdjsonl&lt;/span&gt;
    &lt;span class="s"&gt;code=$?&lt;/span&gt;
    &lt;span class="s"&gt;set -e&lt;/span&gt;
    &lt;span class="s"&gt;if [ "$code" -gt 1 ]; then exit "$code"; fi&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Post review comments&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;REVIEWDOG_GITHUB_API_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;reviewdog -f=rdjsonl -name="valk-guard" \&lt;/span&gt;
      &lt;span class="s"&gt;-reporter=github-pr-review -filter-mode=added \&lt;/span&gt;
      &lt;span class="s"&gt;&amp;lt; valk-guard.rdjsonl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Findings are non-blocking by default. Config errors fail the job. You can see real example PRs with live review comments in the &lt;a href="https://github.com/ValkDB/valk-guard-example" rel="noopener noreferrer"&gt;valk-guard-example&lt;/a&gt; repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it fits
&lt;/h2&gt;

&lt;p&gt;Valk Guard is not a runtime firewall, not a database advisor, and not a replacement for &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;. It's a guardrail for the most common and most expensive SQL mistakes — the ones that happen when someone pushes a &lt;code&gt;DELETE FROM orders&lt;/code&gt; without a &lt;code&gt;WHERE&lt;/code&gt; at 4pm on a Friday.&lt;/p&gt;

&lt;p&gt;It's PostgreSQL-only. It doesn't auto-fix. It doesn't need a running database. It reads your source code, understands your ORMs, and tells you what's going to break — before it merges.&lt;/p&gt;

&lt;p&gt;Less magic. More signal. Same answer every run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo: &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;github.com/ValkDB/valk-guard&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Building a PostgreSQL Parser in Go: What Broke After We Open-Sourced It</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Wed, 18 Feb 2026 06:18:53 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/building-a-postgresql-parser-in-go-what-broke-after-we-open-sourced-it-3i8h</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/building-a-postgresql-parser-in-go-what-broke-after-we-open-sourced-it-3i8h</guid>
      <description>&lt;h1&gt;
  
  
  Building a PostgreSQL Parser in Go: What Broke After We Open-Sourced It
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;&lt;code&gt;postgresparser&lt;/code&gt;&lt;/a&gt; is a pure-Go PostgreSQL SQL parser. It turns SQL text into structured metadata (tables, columns, joins, filters, DDL actions, parameters) without executing queries.&lt;/p&gt;

&lt;p&gt;We thought it was solid. Open source proved we were wrong.&lt;/p&gt;

&lt;p&gt;Here is what open source forced us to learn.&lt;/p&gt;

&lt;p&gt;The biggest shift was not “more bug reports.” It was use-case expansion.&lt;br&gt;&lt;br&gt;
We built for our workflow. Users showed up with very different workloads.&lt;br&gt;
In the first week after release, most feedback centered on deterministic batch parsing.&lt;/p&gt;
&lt;h2&gt;
  
  
  Our internal assumptions broke immediately
&lt;/h2&gt;

&lt;p&gt;Inside a single team, ambiguous behavior survives because everyone “knows” the rules. Public users do not have that context.&lt;/p&gt;

&lt;p&gt;The first pressure point was multi-statement SQL. We had &lt;code&gt;ParseSQL&lt;/code&gt; (single statement) and figured batch parsing was “close enough.” It was not.&lt;/p&gt;

&lt;p&gt;People were using the parser for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI linting pipelines&lt;/li&gt;
&lt;li&gt;production tools&lt;/li&gt;
&lt;li&gt;LLM wrappers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;People asked practical questions we could not answer cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which exact statement failed?&lt;/li&gt;
&lt;li&gt;Is this a warning or a hard failure?&lt;/li&gt;
&lt;li&gt;Can I map diagnostics to the original SQL text reliably?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those questions forced us to define strict contracts instead of relying on implied behavior.&lt;/p&gt;

&lt;p&gt;If your tool consumes SQL in bulk, batch correlation is everything.&lt;/p&gt;
&lt;h2&gt;
  
  
  Broken behavior example
&lt;/h2&gt;

&lt;p&gt;This input exposed the issue quickly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Early batch behavior made correlation awkward because results were compacted and diagnostics were not statement-first. If you’re building CI checks or migration tooling, “something in the batch failed” is not actionable.&lt;/p&gt;

&lt;p&gt;Now each statement has deterministic correlation (&lt;code&gt;Index&lt;/code&gt;, &lt;code&gt;RawSQL&lt;/code&gt;, &lt;code&gt;Query&lt;/code&gt;, &lt;code&gt;Warnings&lt;/code&gt;), so downstream code can point to the exact source statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before/after API diff
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- type ParseBatchResult struct {
-   Queries          []*ParsedQuery
-   Warnings         []ParseWarning
-   TotalStatements  int
-   ParsedStatements int
- }
&lt;/span&gt;&lt;span class="gi"&gt;+ type StatementParseResult struct {
+   Index    int
+   RawSQL   string
+   Query    *ParsedQuery   // nil =&amp;gt; IR conversion failure
+   Warnings []ParseWarning // statement-scoped warnings
+ }
+
+ type ParseBatchResult struct {
+   Statements       []StatementParseResult
+   TotalStatements  int
+   ParsedStatements int
+   HasFailures      bool
+ }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shape is less convenient for quick demos, but much better for real integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real SQL in the wild is much uglier than test fixtures
&lt;/h2&gt;

&lt;p&gt;Open source usage also brought SQL shapes we did not have in internal tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trailing semicolons and odd whitespace&lt;/li&gt;
&lt;li&gt;invalid syntax in the middle of an otherwise valid batch&lt;/li&gt;
&lt;li&gt;mixed DDL + DML scripts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ONLY&lt;/code&gt; variants in DDL paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The parser had to become resilient without becoming vague. That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better statement-level warning attribution&lt;/li&gt;
&lt;li&gt;explicit failure semantics (&lt;code&gt;Query == nil&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;tighter handling across DDL relation extraction paths&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One concrete snippet (current behavior)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQLAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"total=%d parsed=%d has_failures=%t&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TotalStatements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParsedStatements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HasFailures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Statements&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"idx=%d failed=%t warnings=%d raw=%q&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Warnings&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RawSQL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the integration model people asked for: deterministic, inspectable, and boring in the best way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Open source removed our ability to hand-wave edge cases.&lt;/p&gt;

&lt;p&gt;The loop became:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;implement&lt;/li&gt;
&lt;li&gt;get challenged&lt;/li&gt;
&lt;li&gt;simplify&lt;/li&gt;
&lt;li&gt;lock behavior with tests&lt;/li&gt;
&lt;li&gt;document the contract&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That loop made &lt;code&gt;postgresparser&lt;/code&gt; better than it would have been as an internal-only tool.&lt;br&gt;
Internal tools can survive ambiguity. Public libraries cannot.&lt;/p&gt;

&lt;p&gt;If you're building something on top of &lt;code&gt;postgresparser&lt;/code&gt;, open an issue. Real-world SQL keeps improving the contract.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>go</category>
      <category>database</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building a Pure Go PostgreSQL SQL Parser (No CGO, No Server, No Runtime Dependencies)</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Mon, 09 Feb 2026 17:29:25 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/building-a-pure-go-postgresql-sql-parser-no-cgo-no-server-no-runtime-dependencies-goi</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/building-a-pure-go-postgresql-sql-parser-no-cgo-no-server-no-runtime-dependencies-goi</guid>
      <description>&lt;h2&gt;
  
  
  Why we built this
&lt;/h2&gt;

&lt;p&gt;We needed PostgreSQL SQL parsing in environments where CGO was not an option:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alpine containers
&lt;/li&gt;
&lt;li&gt;AWS Lambda
&lt;/li&gt;
&lt;li&gt;Distroless images
&lt;/li&gt;
&lt;li&gt;Scratch builds
&lt;/li&gt;
&lt;li&gt;ARM deployments
&lt;/li&gt;
&lt;li&gt;Anywhere &lt;code&gt;CGO_ENABLED=0&lt;/code&gt; is required
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most existing approaches either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Depend on native Postgres parser bindings
&lt;/li&gt;
&lt;li&gt;Require CGO
&lt;/li&gt;
&lt;li&gt;Require running a Postgres server
&lt;/li&gt;
&lt;li&gt;Are too heavy for infrastructure tooling
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we built a pure Go PostgreSQL parser.&lt;/p&gt;




&lt;h2&gt;
  
  
  The goal
&lt;/h2&gt;

&lt;p&gt;Not to replace Postgres parsing.&lt;br&gt;&lt;br&gt;
Not to be 100% server-compatible.&lt;/p&gt;

&lt;p&gt;The goal was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give infrastructure and tooling systems structured query data safely and deterministically.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What it extracts
&lt;/h2&gt;

&lt;p&gt;The parser outputs an intermediate representation (IR) with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tables (with aliases)
&lt;/li&gt;
&lt;li&gt;Columns
&lt;/li&gt;
&lt;li&gt;Joins
&lt;/li&gt;
&lt;li&gt;WHERE filters
&lt;/li&gt;
&lt;li&gt;GROUP BY
&lt;/li&gt;
&lt;li&gt;ORDER BY
&lt;/li&gt;
&lt;li&gt;CTEs
&lt;/li&gt;
&lt;li&gt;Subqueries
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.active = true
    GROUP BY u.name
    ORDER BY order_count DESC
`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// "SELECT"&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tables&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c"&gt;// users, orders with aliases&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// u.name, COUNT(o.id) AS order_count&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c"&gt;// ["u.active=true"]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JoinConditions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// ["o.user_id=u.id"]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupBy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// ["u.name"]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ColumnUsage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c"&gt;// each column with its role: filter, join, projection, group, order&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now tooling can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What tables does this query touch?
&lt;/li&gt;
&lt;li&gt;What joins exist?
&lt;/li&gt;
&lt;li&gt;What filters are applied?
&lt;/li&gt;
&lt;/ul&gt;
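The first of those questions turns into a small fold over parse results. A sketch using a hypothetical `Result` stand-in, where real code would use the values returned by `postgresparser.ParseSQL`:

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a stand-in for the parser's output; only the extracted
// table names matter for this question.
type Result struct {
	Tables []string
}

// tablesTouched deduplicates and sorts the tables referenced across
// every query a service was found to run.
func tablesTouched(results []Result) []string {
	seen := map[string]bool{}
	for _, r := range results {
		for _, t := range r.Tables {
			seen[t] = true
		}
	}
	out := make([]string, 0, len(seen))
	for t := range seen {
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

func main() {
	rs := []Result{
		{Tables: []string{"users", "orders"}},
		{Tables: []string{"orders"}},
	}
	fmt.Println(tablesTouched(rs)) // prints [orders users]
}
```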




&lt;h2&gt;
  
  
  Why ANTLR + Pure Go
&lt;/h2&gt;

&lt;p&gt;We evaluated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;libpg_query bindings
&lt;/li&gt;
&lt;li&gt;WASM approaches
&lt;/li&gt;
&lt;li&gt;regex / string parsing
&lt;/li&gt;
&lt;li&gt;custom parsers
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tradeoffs we cared about
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pure Go&lt;/td&gt;
&lt;td&gt;Simpler deploy, fewer runtime risks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No CGO&lt;/td&gt;
&lt;td&gt;Works in restricted environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic behavior&lt;/td&gt;
&lt;td&gt;Important for tooling / analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Needed for production workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ANTLR gave us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mature grammar ecosystem
&lt;/li&gt;
&lt;li&gt;Strong parsing guarantees
&lt;/li&gt;
&lt;li&gt;Good performance with SLL mode
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Most real-world queries parse in roughly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;~70–350 microseconds&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
(using SLL prediction mode)&lt;/p&gt;




&lt;h2&gt;
  
  
  Where this is useful
&lt;/h2&gt;

&lt;p&gt;Typical use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI SQL validation
&lt;/li&gt;
&lt;li&gt;Query lineage hints
&lt;/li&gt;
&lt;li&gt;Migration safety checks
&lt;/li&gt;
&lt;li&gt;Static query analysis before deploy
&lt;/li&gt;
&lt;li&gt;“What tables does this service touch?” automation
&lt;/li&gt;
&lt;/ul&gt;
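&lt;p&gt;The migration-safety case, for example, reduces to a set intersection once the parser has given you table names. This is a toy sketch with made-up inputs, not the library’s API:&lt;/p&gt;

```go
package main

import "fmt"

// migrationBreaks flags tables that live queries still reference but a
// migration is about to drop. In real use, "referenced" would come from
// parsing every query the service ships and "dropped" from parsing the
// migration's DDL; both lists here are invented for the sketch.
func migrationBreaks(referenced []string, dropped []string) []string {
	droppedSet := make(map[string]bool)
	for _, t := range dropped {
		droppedSet[t] = true
	}
	var broken []string
	for _, t := range referenced {
		if droppedSet[t] {
			broken = append(broken, t)
		}
	}
	return broken
}

func main() {
	referenced := []string{"orders", "users", "legacy_sessions"}
	dropped := []string{"legacy_sessions"}
	fmt.Println(migrationBreaks(referenced, dropped)) // [legacy_sessions]
}
```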




&lt;h2&gt;
  
  
  Open Source
&lt;/h2&gt;

&lt;p&gt;We’ve been using this internally for months and decided to open source it.&lt;/p&gt;

&lt;p&gt;If you break it with weird SQL, please open issues — that’s how coverage improves.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;https://github.com/ValkDB/postgresparser&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>go</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why Database Indexes Keep Coming Up in My Performance Work</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Tue, 29 Jul 2025 06:44:05 +0000</pubDate>
      <link>https://forem.com/eitamos_ring_0508146ca448/why-database-indexes-keep-coming-up-in-my-performance-work-2imo</link>
      <guid>https://forem.com/eitamos_ring_0508146ca448/why-database-indexes-keep-coming-up-in-my-performance-work-2imo</guid>
      <description>&lt;p&gt;I bounce between data pipelines, API fires, and new features all week, and there’s this one thing that keeps biting us. Slow pages. And 8 times out of 10 it’s the same root cause: we forgot the right index.&lt;/p&gt;

&lt;p&gt;We had this analytics dashboard—we tuned the React, cached the API, CDN was spotless. Still slow. The query behind it? Joining big tables and scanning like there’s no tomorrow. No index on the join keys. Oops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A quick demo to prove I’m not just ranting&lt;/strong&gt;&lt;br&gt;
I spun up a tiny test on my dev box: made an orders table, loaded 100k rows, timed a few queries. First with no indexes, then with some obvious ones.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE orders (
  id           SERIAL PRIMARY KEY,
  customer_id  INTEGER NOT NULL,
  order_date   TIMESTAMP NOT NULL,
  total_amount DECIMAL(10,2) NOT NULL,
  status       VARCHAR(50) NOT NULL,
  country      VARCHAR(100)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Average query times, before and after adding the obvious indexes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Customer lookup&lt;/td&gt;
&lt;td&gt;6.11 ms&lt;/td&gt;
&lt;td&gt;0.88 ms&lt;/td&gt;
&lt;td&gt;~7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Status filter&lt;/td&gt;
&lt;td&gt;8.47 ms&lt;/td&gt;
&lt;td&gt;2.41 ms&lt;/td&gt;
&lt;td&gt;~3.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date range&lt;/td&gt;
&lt;td&gt;6.73 ms&lt;/td&gt;
&lt;td&gt;1.48 ms&lt;/td&gt;
&lt;td&gt;~4.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: these are from my laptop, which is also running Docker, two IDEs, Slack, and that Electron app we don’t talk about… so, not a lab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The classic slow page (you’ve seen this movie)&lt;/strong&gt;&lt;br&gt;
Admin page loads in 10 seconds, everyone’s pointing fingers. Frontend swears it’s fine, backend says “works on my machine”. The DB? Doing full table scans through millions of rows, because the WHERE clause is on &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;order_date&lt;/code&gt; and, yeah, neither is indexed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usual suspects&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Foreign keys without matching indexes on the child table
&lt;/li&gt;
&lt;li&gt;Date columns everybody filters by (no index)
&lt;/li&gt;
&lt;li&gt;Status fields in every WHERE (also no index)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The tiny bit of code that tells the truth&lt;/strong&gt;&lt;br&gt;
Here’s how I timed the “customer orders” lookup in Go:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;func testCustomerIDQuery(db *sql.DB, description string) {
    var total time.Duration
    for i := 0; i &amp;lt; numQueries; i++ {
        id := rand.Intn(100000) + 1
        start := time.Now()
        rows, err := db.Query(`
            SELECT id, customer_id, order_date, total_amount, status, country
            FROM orders WHERE customer_id = $1
            LIMIT 10`, id)
        if err == nil {
            rows.Close()
        }
        total += time.Since(start)
    }
    fmt.Printf("%s avg: %v\n", description, total/time.Duration(numQueries))
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the “magic” is not magic, it’s just this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE INDEX idx_customer_id   ON orders(customer_id);
CREATE INDEX idx_status        ON orders(status);
CREATE INDEX idx_total_amount  ON orders(total_amount);
CREATE INDEX idx_order_date    ON orders(order_date);
CREATE INDEX idx_country       ON orders(country);
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Add those, re‑run the exact same queries, and your 10‑second page quietly becomes sub‑second. It’s almost embarrassing how often that’s the fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index size reality check&lt;/strong&gt;&lt;br&gt;
People ask “won’t the indexes be huge?”. From the same test (100k rows):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;idx_customer_date&lt;/td&gt;
&lt;td&gt;3.1 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;orders_pkey&lt;/td&gt;
&lt;td&gt;2.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_order_date&lt;/td&gt;
&lt;td&gt;2.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_total_amount&lt;/td&gt;
&lt;td&gt;2.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_customer_id&lt;/td&gt;
&lt;td&gt;1.9 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_country&lt;/td&gt;
&lt;td&gt;712 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_status&lt;/td&gt;
&lt;td&gt;688 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Call it ~13 MB total. For the speedup you get, that’s cheap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A couple gotchas (learned the hard way)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Composite order matters. &lt;code&gt;(customer_id, order_date)&lt;/code&gt; helps &lt;code&gt;WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10&lt;/code&gt;. Flip it and you’ll be sad.
&lt;/li&gt;
&lt;li&gt;Check the plan. &lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt;: you want Index Scan / Index Only Scan, not Seq Scan.
&lt;/li&gt;
&lt;li&gt;Not every column deserves an index. Super low selectivity (like a boolean) usually won’t help with a plain b‑tree.
&lt;/li&gt;
&lt;li&gt;Writes pay the bill. Indexes speed reads, but inserts/updates get a bit slower, so pick the ones you actually use.
&lt;/li&gt;
&lt;/ul&gt;
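&lt;p&gt;If you want to automate the plan check, one low-tech option (my own sketch, not from any library) is to grep EXPLAIN output for sequential scans on tables you expect to be indexed:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// hasSeqScan is a crude guard you can run over EXPLAIN output in CI:
// flag any plan that still sequential-scans a table you expected to be
// indexed. Real plans are more robustly read as JSON via
// EXPLAIN (FORMAT JSON), but the substring check works for a smoke test.
func hasSeqScan(plan string, table string) bool {
	return strings.Contains(plan, "Seq Scan on "+table)
}

func main() {
	plan := `Limit  (cost=0.42..8.45 rows=10)
  ->  Index Scan using idx_customer_id on orders`
	fmt.Println(hasSeqScan(plan, "orders")) // false
}
```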

&lt;p&gt;&lt;strong&gt;The boring checklist that works&lt;/strong&gt;&lt;br&gt;
Before you reach for sharding, a rewrite, or a shiny new DB:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Profile the slow endpoint.
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt; the worst query.
&lt;/li&gt;
&lt;li&gt;If it’s scanning a big table, add the smallest useful index.
&lt;/li&gt;
&lt;li&gt;Re‑test. Ship. Sleep.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s not flashy. It won’t wow anyone at a meetup. But it’ll make your app feel fast, which is what users care about anyway.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
