<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Carmelo Riolo</title>
    <description>The latest articles on Forem by Carmelo Riolo (@rioloc).</description>
    <link>https://forem.com/rioloc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3656232%2F00c481e3-a5f0-4cbc-918b-a26370a96b18.jpg</url>
      <title>Forem: Carmelo Riolo</title>
      <link>https://forem.com/rioloc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rioloc"/>
    <language>en</language>
    <item>
      <title>The Wrong Database Connection: A Go Deadlock Story</title>
      <dc:creator>Carmelo Riolo</dc:creator>
      <pubDate>Wed, 10 Dec 2025 22:06:47 +0000</pubDate>
      <link>https://forem.com/rioloc/the-wrong-database-connection-a-go-deadlock-story-286e</link>
      <guid>https://forem.com/rioloc/the-wrong-database-connection-a-go-deadlock-story-286e</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR: Using the wrong connection within a limited connection pool leads to deadlock when concurrent executions exhaust all available connections.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine you're a software engineer (and you probably are, considering you're reading this). One day you log in at work, check your workload status, your Grafana dashboard, or if you were diligent enough to create proper alerting, you get paged via PagerDuty, only to discover your application has been stuck for minutes, hours, or even days.&lt;/p&gt;

&lt;p&gt;Not dead, just stuck in an idle state, unresponsive to any events. Or even worse, some liveness/readiness probes start failing apparently at random, causing restarts of your service and leaving you with few insights to debug. And in the case no proper alerting or monitoring is set, you won't easily detect this.&lt;/p&gt;

&lt;p&gt;Unfortunately, I saw this happen in the past and the issue was related to a very subtle problem: &lt;strong&gt;wrong use of database connections and transactions within the same connection pool&lt;/strong&gt;, causing a terrible &lt;strong&gt;deadlock&lt;/strong&gt; at the application level.&lt;/p&gt;

&lt;p&gt;Let me explain this clearly, with the hope it will save your day in the future. I'll provide examples in Go since it's the language I'm most familiar with, but the concept applies to other languages as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Suppose you have a backend application that relies on a PostgreSQL database. Here's the simplest way to connect to a database and set up a connection pool in Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Create DB Config&lt;/span&gt;
&lt;span class="n"&gt;connString&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"host=%s port=%d user=%s password=%s dbname=%s sslmode=disable"&lt;/span&gt;
&lt;span class="n"&gt;databaseUrl&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dbname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;databaseUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ping&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Setup connection pool size&lt;/span&gt;
&lt;span class="c"&gt;// 4 is an arbitrary number for this example&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetMaxOpenConns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's focus on the &lt;code&gt;db.SetMaxOpenConns(N)&lt;/code&gt; method. According to the documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// SetMaxOpenConns sets the maximum number of open connections to the database.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you can have at most &lt;strong&gt;N&lt;/strong&gt; open connections from your process to the database (4 in this example). When the maximum number of connections is reached, goroutines will wait until an existing connection is released. Once that happens, the connection is acquired to perform whatever operation is needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The code that looks fine
&lt;/h2&gt;

&lt;p&gt;Let's expand our example by adding concurrent workers that use the connection pool to perform transactional operations against the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;numberOfWorkers&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;        &lt;span class="c"&gt;// case numberOfWorkers &amp;lt; size of connection pool&lt;/span&gt;
&lt;span class="c"&gt;// numberOfWorkers := 4     // case numberOfWorkers == size of connection pool&lt;/span&gt;
&lt;span class="c"&gt;// numberOfWorkers := 10    // case numberOfWorkers &amp;gt; size of connection pool&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;numberOfWorkers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Running worker %d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeginTx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TxOptions&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed to create tx&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"INSERT INTO recipes (id, name, description, created_at) VALUES ($1, $2, $3, $4)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Pizza %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"Just a pizza"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed query&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed committing tx&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some of you may have already spotted something wrong with this code, but in production codebases with layers of wrappers and nested methods, such issues aren't always so clear and evident. Let's continue and see what happens when we run this code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Executing the code
&lt;/h2&gt;

&lt;p&gt;When we run our code with a number of workers &lt;strong&gt;less than&lt;/strong&gt; the connection pool size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025/08/08 11:53:32 Successfully connected to the db
2025/08/08 11:53:32 Running worker 1
2025/08/08 11:53:32 Running worker 0
2025/08/08 11:53:37 worker 1 ended
2025/08/08 11:53:37 worker 0 ended
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything works fine. Now let's increase the number of workers to 4 (&lt;strong&gt;equal to&lt;/strong&gt; the connection pool size):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025/08/08 11:59:10 Successfully connected to the db
2025/08/08 11:59:10 Running worker 3
2025/08/08 11:59:10 Running worker 0
2025/08/08 11:59:10 Running worker 1
2025/08/08 11:59:10 Running worker 2
2025/08/08 11:59:15 worker 2 ended
2025/08/08 11:59:15 worker 0 ended
2025/08/08 11:59:15 worker 1 ended
2025/08/08 11:59:15 worker 3 ended
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still working fine. Now let's increase the number of workers to &lt;strong&gt;exceed&lt;/strong&gt; the connection pool size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025/08/08 12:00:44 Successfully connected to the db
2025/08/08 12:00:44 Running worker 9
2025/08/08 12:00:44 Running worker 3
2025/08/08 12:00:44 Running worker 7
2025/08/08 12:00:44 Running worker 4
2025/08/08 12:00:44 Running worker 0
2025/08/08 12:00:44 Running worker 2
2025/08/08 12:00:44 Running worker 8
2025/08/08 12:00:44 Running worker 5
2025/08/08 12:00:44 Running worker 6
2025/08/08 12:00:44 Running worker 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No workers ended this time—the application entered a &lt;strong&gt;deadlock state&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Investigating the issue
&lt;/h2&gt;

&lt;p&gt;At this point, what should be the next step to get more insights? In my case, it was using a &lt;strong&gt;profiler&lt;/strong&gt;. You can achieve this in Go by instrumenting your application with &lt;a href="https://pkg.go.dev/net/http/pprof" rel="noopener noreferrer"&gt;pprof&lt;/a&gt;. One of the simplest ways to use it is by exposing a web server that serves runtime profiling data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:6060"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One interesting thing you can get from pprof, besides CPU and memory profiles, is the &lt;strong&gt;full goroutine stack dump&lt;/strong&gt; by accessing &lt;code&gt;http://localhost:6060/debug/pprof/goroutine?debug=2&lt;/code&gt;. This gives you something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;goroutine 28 [select]:
database/sql.(*DB).conn(0xc000111450, {0x866310, 0xb10e20}, 0x1)
    /usr/local/go/src/database/sql/sql.go:1369 +0x425
database/sql.(*DB).exec(0xc000111450, {0x866310, 0xb10e20}, {0x7f09d8, 0x4f}, {0xc000075f10, 0x4, 0x4}, 0xbe?)
    /usr/local/go/src/database/sql/sql.go:1689 +0x54
// ... more stack trace

goroutine 33 [chan receive]:
database/sql.(*Tx).awaitDone(0xc00025e000)
    /usr/local/go/src/database/sql/sql.go:2212 +0x29
created by database/sql.(*DB).beginDC in goroutine 28

goroutine 51 [chan receive]:
database/sql.(*Tx).awaitDone(0xc0000b0100)
    /usr/local/go/src/database/sql/sql.go:2212 +0x29
created by database/sql.(*DB).beginDC in goroutine 25
// ... omitting the full dump for readability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By inspecting the dump more carefully, we can see the evidence of the problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;goroutine 19 [select]: database/sql.(*DB).conn() // Waiting for connection
goroutine 20 [select]: database/sql.(*DB).conn() // Waiting for connection  
goroutine 21 [select]: database/sql.(*DB).conn() // Waiting for connection
// ... and so on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;select&lt;/code&gt; statement is a control structure that lets a goroutine wait on multiple communication operations. Meanwhile, other goroutines are holding active transactions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;goroutine 33 [chan receive]: database/sql.(*Tx).awaitDone() // Active transaction
goroutine 51 [chan receive]: database/sql.(*Tx).awaitDone() // Active transaction
// etc.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;awaitDone()&lt;/code&gt; goroutines are transaction monitors that wait for the transaction to be committed, rolled back, or canceled—they're doing their job correctly.&lt;/p&gt;

&lt;p&gt;What we have is a &lt;strong&gt;resource deadlock&lt;/strong&gt; where all available database connections are held by transactions that aren't progressing, while other goroutines indefinitely wait for those same resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause
&lt;/h2&gt;

&lt;p&gt;Let's examine our worker code again, focusing on this critical part:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeginTx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TxOptions&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed to create tx&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;// THE BUG IS HERE ↓&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"INSERT INTO recipes (id, name, description, created_at) VALUES ($1, $2, $3, $4)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Pizza %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"Just a pizza"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed query&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Beginning a transaction, which acquires a connection from the pool&lt;/li&gt;
&lt;li&gt;Using the &lt;code&gt;db&lt;/code&gt; client to execute a query, which tries to acquire &lt;strong&gt;another&lt;/strong&gt; connection&lt;/li&gt;
&lt;li&gt;Committing or rolling back based on the operation status&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem is that using the &lt;code&gt;db&lt;/code&gt; client after creating a transaction results in &lt;strong&gt;double connection usage&lt;/strong&gt;. Here's exactly how the deadlock occurs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Worker 0 begins a transaction → acquires connection 1 (Pool: 1/4 used)&lt;/li&gt;
&lt;li&gt;Worker 1 begins a transaction → acquires connection 2 (Pool: 2/4 used)
&lt;/li&gt;
&lt;li&gt;Worker 2 begins a transaction → acquires connection 3 (Pool: 3/4 used)&lt;/li&gt;
&lt;li&gt;Worker 3 begins a transaction → acquires connection 4 (Pool: 4/4 used)&lt;/li&gt;
&lt;li&gt;Worker 0 calls &lt;code&gt;db.Exec()&lt;/code&gt; → tries to acquire connection 5, but pool is exhausted&lt;/li&gt;
&lt;li&gt;Worker 1 calls &lt;code&gt;db.Exec()&lt;/code&gt; → tries to acquire connection 6, but pool is exhausted
&lt;/li&gt;
&lt;li&gt;Worker 2 calls &lt;code&gt;db.Exec()&lt;/code&gt; → tries to acquire connection 7, but pool is exhausted&lt;/li&gt;
&lt;li&gt;Worker 3 calls &lt;code&gt;db.Exec()&lt;/code&gt; → tries to acquire connection 8, but pool is exhausted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deadlock!&lt;/strong&gt; Everyone is waiting for connections that will never be released.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;The issue causing all this trouble is the wrong use of &lt;code&gt;db.Exec()&lt;/code&gt; instead of &lt;code&gt;tx.Exec()&lt;/code&gt;. The correct way is to use the transaction handle, which uses the same connection that the transaction already holds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeginTx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TxOptions&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed to create tx&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;// FIXED: Use tx.Exec() instead of db.Exec()&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"INSERT INTO recipes (id, name, description, created_at) VALUES ($1, $2, $3, $4)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Pizza %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"Just a pizza"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"worker %d failed query&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's remarkable how two characters (&lt;code&gt;db&lt;/code&gt; vs &lt;code&gt;tx&lt;/code&gt;) can halt your entire production system. While this may seem simple to spot in this example, in large production codebases it can be much harder to detect. In my case, it was randomly affecting one pod in our Kubernetes deployment at unpredictable times, especially during high load and concurrency spikes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention Strategies
&lt;/h2&gt;

&lt;p&gt;How can you avoid this? I see several alternatives:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Proper Concurrency Testing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Test your application under realistic concurrent load to spot such issues before production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Code Structure Design&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Structure your code so you can't accidentally open new connections within a transactional block.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Use pgxpool for Better Control&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For Go applications, consider using &lt;a href="https://pkg.go.dev/github.com/jackc/pgx/v4/pgxpool" rel="noopener noreferrer"&gt;pgxpool&lt;/a&gt; which provides more granular control over the connection pool. The explicit &lt;code&gt;Acquire()&lt;/code&gt;/&lt;code&gt;Release()&lt;/code&gt; pattern makes it much clearer when you're using connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Acquire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BeginTx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pgx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TxOptions&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rollback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// It's much harder to accidentally use pool.Exec() here&lt;/span&gt;
&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"INSERT INTO recipes ..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Monitoring and Alerting&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I intentionally avoid mentioning palliative measures like connection timeouts, which may mask the underlying issue while causing performance degradation. Instead, implement proper liveness/readiness probes paired with alerts on restart frequency. This provides a good tradeoff between keeping the system running and being notified when something isn't behaving correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key lessons
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Test under realistic concurrent load&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be consistent in resource usage&lt;/strong&gt; (always be aware on how connections usage happens)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use the right tool, like pprof, to support your debugging against mysterious hangs&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Source: &lt;a href="https://carmeloriolo.com/posts/the-wrong-database-connection/" rel="noopener noreferrer"&gt;https://carmeloriolo.com/posts/the-wrong-database-connection/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>go</category>
      <category>database</category>
    </item>
  </channel>
</rss>
