<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sasi Kumar T</title>
    <description>The latest articles on Forem by Sasi Kumar T (@sasikumart).</description>
    <link>https://forem.com/sasikumart</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F978974%2F6a851114-1acf-45c2-8190-780987312174.jpg</url>
      <title>Forem: Sasi Kumar T</title>
      <link>https://forem.com/sasikumart</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sasikumart"/>
    <language>en</language>
    <item>
      <title>Tracing the Express Middleware Nobody Talks About: Compression</title>
      <dc:creator>Sasi Kumar T</dc:creator>
      <pubDate>Thu, 05 Mar 2026 15:19:12 +0000</pubDate>
      <link>https://forem.com/sasikumart/tracing-the-express-middleware-nobody-talks-about-compression-5g7b</link>
      <guid>https://forem.com/sasikumart/tracing-the-express-middleware-nobody-talks-about-compression-5g7b</guid>
      <description>&lt;p&gt;When we talk about &lt;strong&gt;observability in Node.js&lt;/strong&gt;, we usually trace things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP requests&lt;/li&gt;
&lt;li&gt;Database queries&lt;/li&gt;
&lt;li&gt;External API calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there’s a &lt;strong&gt;hidden performance layer most tracing setups completely ignore&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Response compression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many Express applications use the popular &lt;code&gt;compression&lt;/code&gt; middleware to reduce response size and improve latency.&lt;/p&gt;

&lt;p&gt;But here’s the problem.&lt;br&gt;
&lt;strong&gt;Nobody traces it.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Compression Deserves Tracing
&lt;/h2&gt;

&lt;p&gt;Compression happens &lt;strong&gt;after your application logic finishes&lt;/strong&gt; but &lt;strong&gt;before the response is sent to the client&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP Request
     │
     ▼
Express Route
     │
     ▼
Application Logic
     │
     ▼
Compression (zlib)
     │
     ▼
Response Sent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If compression is slow, you may see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;slower response times&lt;/li&gt;
&lt;li&gt;increased CPU usage&lt;/li&gt;
&lt;li&gt;delayed response delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in most observability setups, &lt;strong&gt;this time is invisible&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Your traces might show something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP Request
  └── Route Handler
        └── DB Query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But you &lt;strong&gt;never see compression time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s exactly the gap I wanted to close.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing Compression Instrumentation
&lt;/h2&gt;

&lt;p&gt;I built a small OpenTelemetry instrumentation package to trace compression operations.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.npmjs.com/package/@sasikumart/compression-instrumentation" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/@sasikumart/compression-instrumentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This package instruments &lt;strong&gt;Node.js zlib compression calls&lt;/strong&gt; and exposes them as OpenTelemetry spans.&lt;/p&gt;

&lt;p&gt;It allows you to observe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compression duration&lt;/li&gt;
&lt;li&gt;compression algorithm used&lt;/li&gt;
&lt;li&gt;impact of compression on request latency&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What the Trace Looks Like
&lt;/h2&gt;

&lt;p&gt;Without compression instrumentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP GET /users
  └── controller.getUsers
        └── mongodb query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With instrumentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP GET /users
  └── controller.getUsers
        └── mongodb query
  └── gzip compression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now compression becomes a &lt;strong&gt;first-class span in your trace&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installing the Instrumentation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @sasikumart/compression-instrumentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Basic Setup
&lt;/h2&gt;

&lt;p&gt;Register the instrumentation with OpenTelemetry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NodeSDK&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/sdk-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ZlibInstrumentation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@sasikumart/compression-instrumentation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NodeSDK&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;instrumentations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZlibInstrumentation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once enabled, any &lt;strong&gt;gzip compression performed via Node.js zlib&lt;/strong&gt; will automatically generate spans.&lt;/p&gt;
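&lt;p&gt;To see the operation being wrapped, here is a direct zlib call of the kind the instrumentation hooks. This is a minimal sketch: the &lt;code&gt;compression&lt;/code&gt; middleware itself uses zlib's streaming API, and the exact span names the package emits may differ.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import zlib from "node:zlib";

// One-shot gzip of a response-sized buffer. With the instrumentation
// registered, zlib calls like this show up as spans in the active trace.
const body = Buffer.from(JSON.stringify({ message: "hello".repeat(500) }));
const compressed = zlib.gzipSync(body);
const restored = zlib.gunzipSync(compressed);

console.log(body.length, compressed.length); // compressed is far smaller
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;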




&lt;h2&gt;
  
  
  Example Express Application
&lt;/h2&gt;

&lt;p&gt;Now let’s use Express with the compression middleware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;compression&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;compression&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;compression&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OpenTelemetry is awesome&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Server running on port 3000&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a request is served, compression will trigger a &lt;strong&gt;zlib gzip operation&lt;/strong&gt;, which will now appear in your traces.&lt;/p&gt;
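&lt;p&gt;You can estimate how much that span is worth for this particular route: the repeated-string payload compresses extremely well. A quick sketch (the exact ratio depends on zlib defaults):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import zlib from "node:zlib";

// Same payload the /data route above serializes.
const payload = JSON.stringify(Array(1000).fill("OpenTelemetry is awesome"));
const gzipped = zlib.gzipSync(Buffer.from(payload));

console.log(payload.length); // roughly 27 KB of JSON
console.log(gzipped.length); // dramatically smaller after gzip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;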




&lt;h2&gt;
  
  
  Jaeger Trace Example
&lt;/h2&gt;

&lt;p&gt;Here’s an example trace captured in &lt;strong&gt;Jaeger&lt;/strong&gt; where the gzip compression span appears alongside the HTTP request span.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futvchcqjofunzoig53vn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futvchcqjofunzoig53vn.png" alt="Compression Span in Jaeger" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The trace typically shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP request span&lt;/li&gt;
&lt;li&gt;route handler span&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gzip compression span generated by the instrumentation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Gets Captured
&lt;/h2&gt;

&lt;p&gt;Each compression operation becomes its own span, allowing you to observe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how long compression takes&lt;/li&gt;
&lt;li&gt;how much latency compression adds to the request&lt;/li&gt;
&lt;li&gt;whether compression becomes a CPU bottleneck&lt;/li&gt;
&lt;/ul&gt;
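&lt;p&gt;The span duration corresponds to wall-clock time spent inside zlib. As a rough sanity check, you can measure the same thing by hand; this sketch is not part of the package, just an illustration of the cost being captured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import zlib from "node:zlib";

// 5 MB of highly compressible data.
const input = Buffer.alloc(5 * 1024 * 1024, "a");

const start = process.hrtime.bigint();
zlib.gzipSync(input);
const durationMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(durationMs); // CPU cost grows with payload size
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;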




&lt;h2&gt;
  
  
  Current Limitation
&lt;/h2&gt;

&lt;p&gt;At the moment, the instrumentation &lt;strong&gt;only captures gzip compression&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is because most Express applications using the &lt;code&gt;compression&lt;/code&gt; middleware default to &lt;strong&gt;gzip via Node's zlib module&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Future versions may add support for additional algorithms such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deflate&lt;/li&gt;
&lt;li&gt;brotli&lt;/li&gt;
&lt;/ul&gt;
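&lt;p&gt;Both algorithms are also exposed by Node's &lt;code&gt;zlib&lt;/code&gt; module (brotli since Node 11.7), so the same wrapping approach should extend naturally. A sketch of the underlying calls; nothing here is part of the current package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import zlib from "node:zlib";

const input = Buffer.from("OpenTelemetry is awesome".repeat(100));

// deflate: same zlib codec as gzip, different wrapper format.
const deflated = zlib.deflateSync(input);
const inflated = zlib.inflateSync(deflated);

// brotli: a separate codec, also shipped in node:zlib.
const brotli = zlib.brotliCompressSync(input);
const debrotli = zlib.brotliDecompressSync(brotli);

console.log(deflated.length, brotli.length);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;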




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Compression is one of those &lt;strong&gt;performance optimizations everyone enables but rarely measures&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Observability should show &lt;strong&gt;everything that affects latency&lt;/strong&gt;, including middleware and runtime operations.&lt;/p&gt;

&lt;p&gt;Tracing compression helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identify CPU bottlenecks&lt;/li&gt;
&lt;li&gt;understand hidden latency in responses&lt;/li&gt;
&lt;li&gt;gain deeper insight into the full request lifecycle&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;High traffic APIs: Compression CPU overhead can become significant under heavy load.&lt;/li&gt;
&lt;li&gt;Large JSON responses: You can verify whether compression is actually helping reduce payload sizes.&lt;/li&gt;
&lt;li&gt;Performance tuning: You can determine whether compression should happen at the &lt;strong&gt;application layer or a reverse proxy&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Observability often focuses on &lt;strong&gt;business logic&lt;/strong&gt;, but real latency comes from &lt;strong&gt;every layer of the request lifecycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Compression is one of those layers that has remained invisible for too long.&lt;/p&gt;

&lt;p&gt;Now it doesn’t have to be.&lt;/p&gt;




&lt;p&gt;If you're interested in Node.js observability and OpenTelemetry, you might also like my previous deep dive:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://dev.to/sasikumart/tracing-the-part-of-mongoose-nobody-talks-about-3gj2"&gt;https://dev.to/sasikumart/tracing-the-part-of-mongoose-nobody-talks-about-3gj2&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I’m a backend developer with extensive experience in designing and optimizing scalable backend systems. My expertise includes tackling complex performance challenges. I’ve led numerous database performance initiatives and have also been deeply involved in system design and revamping existing systems. My focus is on enhancing efficiency, ensuring reliability, and delivering robust solutions that scale effectively.&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/sasi-kumar-thangavel/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; to learn more about my professional journey and projects.&lt;/p&gt;

</description>
      <category>node</category>
      <category>opentelemetry</category>
      <category>observability</category>
      <category>express</category>
    </item>
    <item>
      <title>Tracing the part of Mongoose nobody talks about</title>
      <dc:creator>Sasi Kumar T</dc:creator>
      <pubDate>Fri, 27 Feb 2026 10:11:38 +0000</pubDate>
      <link>https://forem.com/sasikumart/tracing-the-part-of-mongoose-nobody-talks-about-3gj2</link>
      <guid>https://forem.com/sasikumart/tracing-the-part-of-mongoose-nobody-talks-about-3gj2</guid>
      <description>&lt;p&gt;I have published an OpenTelemetry plugin that instruments the Mongoose hydration lifecycle.&lt;br&gt;
&lt;a href="https://www.npmjs.com/package/@sasikumart/mongoose-hydrate-instrumentation" rel="noopener noreferrer"&gt;&lt;code&gt;@sasikumart/mongoose-hydrate-instrumentation&lt;/code&gt;&lt;/a&gt; &lt;/p&gt;
&lt;h2&gt;
  
  
  What's the problem?
&lt;/h2&gt;

&lt;p&gt;If you're already using OTEL with Mongoose, you're probably tracing your queries — &lt;code&gt;find&lt;/code&gt;, &lt;code&gt;insertOne&lt;/code&gt;, &lt;code&gt;updateMany&lt;/code&gt;, and so on. That covers a lot, but it misses something that happens &lt;em&gt;after&lt;/em&gt; the query returns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document hydration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Mongoose gets raw BSON/JSON back from MongoDB, it doesn't just hand it to you as-is. It runs it through a whole initialization process, applying defaults, attaching virtuals, setting up getters/setters, populating nested subdocuments. For simple schemas with small result sets this is negligible. But for complex schemas or queries returning hundreds of documents, hydration can be a real cost that's completely invisible in your traces.&lt;/p&gt;
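&lt;p&gt;To make that concrete, here is a plain-JavaScript caricature of the per-document work. This is not Mongoose's actual internals, just the shape of it: defaults and virtuals applied to every row, so the cost scales with result-set size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical mini-schema: one default and one virtual.
const defaults = { active: true };

function hydrate(raw) {
  const doc = Object.assign({}, defaults, raw);
  Object.defineProperty(doc, "displayName", {
    get: function () { return String(doc.name).toUpperCase(); },
  });
  return doc;
}

// For a query returning hundreds of rows, work like this runs per document.
const rows = [{ name: "ada" }, { name: "grace" }];
const docs = rows.map(hydrate);

console.log(docs[0].active, docs[0].displayName);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;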

&lt;p&gt;That blind spot is what this plugin fixes.&lt;/p&gt;
&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;It patches Mongoose's internal hydration lifecycle and wraps it with OTEL spans, so you get visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mongoose.document.init&lt;/code&gt; — which is invoked by &lt;code&gt;mongoose.hydrate&lt;/code&gt; to initialize a &lt;code&gt;new Document&lt;/code&gt; from raw data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each span includes attributes like the model name and document count.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @sasikumart/mongoose-hydrate-instrumentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Register it with your Node SDK before your models load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NodeSDK&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/sdk-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;MongooseHydrateInstrumentation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@sasikumart/mongoose-hydrate-instrumentation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NodeSDK&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;instrumentations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MongooseHydrateInstrumentation&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your existing OTEL backend (Jaeger, Datadog, Honeycomb, OTLP — whatever you use) will start receiving the hydration spans automatically.&lt;/p&gt;




&lt;p&gt;The package is on npm and the source is on &lt;a href="https://github.com/Sasikumar3096/mongoose-hydrate-instrumentation" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. If you run into any issues or have ideas for improvement, PRs and issues are very welcome.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I’m a backend developer with extensive experience in designing and optimizing scalable backend systems. My expertise includes tackling complex performance challenges. I’ve led numerous database performance initiatives and have also been deeply involved in system design and revamping existing systems. My focus is on enhancing efficiency, ensuring reliability, and delivering robust solutions that scale effectively.&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/sasi-kumar-thangavel/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; to learn more about my professional journey and projects.&lt;/p&gt;

</description>
      <category>opentelemetry</category>
      <category>mongoose</category>
      <category>node</category>
      <category>mongodb</category>
    </item>
    <item>
      <title>Postgres “almost” Outage Postmortem: The Hidden Dangers of Replication Slots and Autovacuum</title>
      <dc:creator>Sasi Kumar T</dc:creator>
      <pubDate>Sat, 01 Mar 2025 17:20:16 +0000</pubDate>
      <link>https://forem.com/sasikumart/postgres-almost-outage-postmortem-the-hidden-dangers-of-replication-slots-and-autovacuum-2nem</link>
      <guid>https://forem.com/sasikumart/postgres-almost-outage-postmortem-the-hidden-dangers-of-replication-slots-and-autovacuum-2nem</guid>
      <description>&lt;p&gt;One day, I noticed something odd with our Postgres instance. It was experiencing intermittent performance spikes, load averages were hitting unusually high levels, and I/O utilization occasionally maxed out at 100%. Most workloads were running smoothly, but these spikes felt off. So, I put on my debugging hat and got to work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcn5g3wlof0h5zvz5duvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcn5g3wlof0h5zvz5duvx.png" alt="CPU Usage" width="467" height="291"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbcclx8tss3pbbfa95luo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbcclx8tss3pbbfa95luo.png" alt="System Load" width="440" height="280"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxw18jpt1z0y83y16rxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxw18jpt1z0y83y16rxp.png" alt="Memory Usage" width="453" height="290"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Investigating the Spikes
&lt;/h3&gt;

&lt;p&gt;While analyzing the running queries, I discovered that &lt;strong&gt;Autovacuum&lt;/strong&gt; was actively running on the &lt;em&gt;tasks&lt;/em&gt; database. This database stores pre-scheduled tasks for one of our services, which meant it had a high volume of write activity. Given that churn, seeing Autovacuum run wasn’t surprising, and I suspected it was the main contributor to the performance issue.&lt;/p&gt;

&lt;p&gt;To mitigate the problem, we decided to &lt;strong&gt;move the &lt;em&gt;tasks&lt;/em&gt; database to a new VM&lt;/strong&gt; so that the other systems wouldn’t be impacted.&lt;/p&gt;

&lt;p&gt;Once the migration was complete, the results were immediate and significant. The screenshots below show the improvement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facv0w6melvlgjvcpjg7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facv0w6melvlgjvcpjg7f.png" alt="System Load after moving *tasks* DB out" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cmzotw0lms4l81sz0ns.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cmzotw0lms4l81sz0ns.png" alt="Disk IOps" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5araqy85ndfnks99zv40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5araqy85ndfnks99zv40.png" alt="IO Utilisation" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graphs showed a significant improvement, &lt;strong&gt;I/O utilization no longer spiked beyond 40%&lt;/strong&gt;, and the overall average had dropped below 15%. A big win, right? Well, &lt;strong&gt;yes and no&lt;/strong&gt;. While the system was in much better shape, I still wasn’t fully satisfied.&lt;/p&gt;

&lt;p&gt;I also noticed occasional &lt;strong&gt;Autovacuum processes&lt;/strong&gt; running during business hours. While they weren’t as aggressive as they had been for the &lt;em&gt;tasks&lt;/em&gt; database, they still contributed to some system load. This got me thinking, &lt;strong&gt;could I optimize things even further?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With the immediate issue resolved, I swapped my &lt;strong&gt;debugging hat&lt;/strong&gt; for my &lt;strong&gt;analyzing glasses&lt;/strong&gt; and dug deeper.&lt;/p&gt;
&lt;h3&gt;
  
  
  Optimizing Storage with Manual Vacuum
&lt;/h3&gt;

&lt;p&gt;Since the &lt;em&gt;tasks&lt;/em&gt; database had previously triggered frequent Autovacuum processes, I decided to run a manual &lt;code&gt;VACUUM FULL&lt;/code&gt; on its tables. With the actual database now running on a different machine and the copy on this VM no longer in use, this seemed like a safe and logical step.&lt;/p&gt;

&lt;p&gt;Before running the vacuum, I took note of the storage usage across all databases on the VM:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;database_name&lt;/th&gt;
&lt;th&gt;size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;tasks&lt;/td&gt;
&lt;td&gt;42 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;wickets&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leash&lt;/td&gt;
&lt;td&gt;6833 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;toolbox&lt;/td&gt;
&lt;td&gt;5857 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;bases&lt;/td&gt;
&lt;td&gt;1126 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That adds up to roughly 67.8 GB in total. &lt;/p&gt;
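&lt;p&gt;For reference, per-database sizes like these can be pulled from the catalog with a query along these lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT datname AS database_name,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;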
&lt;h3&gt;
  
  
  Vacuuming &lt;em&gt;tasks&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s a before-and-after comparison of the vacuum operation on the &lt;em&gt;tasks&lt;/em&gt; database:&lt;/p&gt;

&lt;p&gt;Before&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;relname&lt;/th&gt;
&lt;th&gt;n_live_tup&lt;/th&gt;
&lt;th&gt;n_dead_tup&lt;/th&gt;
&lt;th&gt;total_size&lt;/th&gt;
&lt;th&gt;table_size&lt;/th&gt;
&lt;th&gt;indexes_size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;tasks_p1&lt;/td&gt;
&lt;td&gt;1,026,187&lt;/td&gt;
&lt;td&gt;176,574&lt;/td&gt;
&lt;td&gt;13 GB&lt;/td&gt;
&lt;td&gt;4956 MB&lt;/td&gt;
&lt;td&gt;8211 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tasks_p2&lt;/td&gt;
&lt;td&gt;1,023,375&lt;/td&gt;
&lt;td&gt;57,710&lt;/td&gt;
&lt;td&gt;13 GB&lt;/td&gt;
&lt;td&gt;4666 MB&lt;/td&gt;
&lt;td&gt;7748 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tasks_p3&lt;/td&gt;
&lt;td&gt;1,020,129&lt;/td&gt;
&lt;td&gt;12369&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;4562 MB&lt;/td&gt;
&lt;td&gt;7782 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;relname&lt;/th&gt;
&lt;th&gt;n_live_tup&lt;/th&gt;
&lt;th&gt;n_dead_tup&lt;/th&gt;
&lt;th&gt;total_size&lt;/th&gt;
&lt;th&gt;table_size&lt;/th&gt;
&lt;th&gt;indexes_size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;tasks_p1&lt;/td&gt;
&lt;td&gt;1,026,187&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1298 MB&lt;/td&gt;
&lt;td&gt;1138 MB&lt;/td&gt;
&lt;td&gt;140 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tasks_p2&lt;/td&gt;
&lt;td&gt;1,023,375&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1296 MB&lt;/td&gt;
&lt;td&gt;1137 MB&lt;/td&gt;
&lt;td&gt;140 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tasks_p3&lt;/td&gt;
&lt;td&gt;1,020,129&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1297 MB&lt;/td&gt;
&lt;td&gt;1136 MB&lt;/td&gt;
&lt;td&gt;140 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
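&lt;p&gt;Numbers like the ones in these tables come from &lt;code&gt;pg_stat_user_tables&lt;/code&gt;; a query along these lines reproduces them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT relname,
       n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_size_pretty(pg_relation_size(relid)) AS table_size,
       pg_size_pretty(pg_indexes_size(relid)) AS indexes_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;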
&lt;h3&gt;
  
  
  The Shocking Discovery
&lt;/h3&gt;

&lt;p&gt;Manual vacuuming &lt;strong&gt;cut our storage usage by 32 GB&lt;/strong&gt;, reducing &lt;em&gt;tasks&lt;/em&gt; from 36 GB to just &lt;strong&gt;4 GB,&lt;/strong&gt; an &lt;strong&gt;89% reduction&lt;/strong&gt;. Seeing such a drastic change made me wonder: &lt;strong&gt;What if I applied the same process to the other databases on this Postgres instance?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, I did exactly that. As a result, disk usage dropped from roughly 86% to 76%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14ien8sl7g1v22lhx3yy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14ien8sl7g1v22lhx3yy.png" alt="Disk usage after FULL Vacuum" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s when I stumbled upon something strange. All the databases combined used around &lt;strong&gt;70 GB&lt;/strong&gt;, but the metrics showed the cluster consuming nearly &lt;strong&gt;1.06 TB&lt;/strong&gt; of storage.&lt;/p&gt;

&lt;p&gt;Something wasn’t adding up. &lt;strong&gt;Where was all that extra space being used?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, I started digging into PostgreSQL’s &lt;strong&gt;data directory&lt;/strong&gt; and quickly found a directory full of &lt;strong&gt;WAL (Write-Ahead Log) files&lt;/strong&gt; consuming the majority of the disk space. This didn’t make sense at all; our cluster wasn’t configured to retain WAL files indefinitely.&lt;/p&gt;
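&lt;p&gt;On PostgreSQL 10 and later, you can measure WAL usage from SQL instead of poking at the filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;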

&lt;blockquote&gt;
&lt;p&gt;Write-Ahead Logging (WAL) is a critical feature in PostgreSQL that ensures data integrity and durability. It records changes to the database before they are applied, allowing PostgreSQL to recover data and restore the database to its most recent state in case of a crash or failure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Curious about why so many WAL files were piling up, I turned to &lt;strong&gt;Google&lt;/strong&gt; for possible explanations. One potential culprit stood out: &lt;strong&gt;inactive replication slots&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A &lt;strong&gt;replication slot&lt;/strong&gt; is a feature that ensures WAL (Write-Ahead Log) files are retained until a &lt;strong&gt;replica (standby server) or a logical consumer&lt;/strong&gt; has processed them. It prevents WAL files from being removed before they are received by the subscriber, ensuring smooth replication.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To confirm this, I ran the following query to check for active replication slots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_replication_slots&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
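
&lt;p&gt;Beyond just listing the slots, it’s worth estimating how much WAL each slot is pinning. A sketch using &lt;code&gt;pg_wal_lsn_diff&lt;/code&gt; (PostgreSQL 10+ naming):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Distance between the current WAL position and each slot's restart_lsn,
-- i.e. roughly how much WAL the slot is forcing the server to keep
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;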



&lt;p&gt;Ideally, there should have been &lt;strong&gt;no replication slots at all&lt;/strong&gt;, but instead I found &lt;strong&gt;two logical replication slots,&lt;/strong&gt; remnants of an old experiment that were never removed. Since these slots were inactive, the cluster was &lt;strong&gt;holding onto WAL files indefinitely&lt;/strong&gt;, waiting for the corresponding consumers to process them. But those consumers were &lt;strong&gt;never coming back online&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This was the root cause of the massive storage bloat.&lt;/p&gt;

&lt;p&gt;Next, I took the logical step of &lt;strong&gt;removing the inactive replication slots&lt;/strong&gt;. As soon as I did, the disk usage &lt;strong&gt;dropped to just 40 GB&lt;/strong&gt;, freeing up a massive amount of space instantly.&lt;/p&gt;
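
&lt;p&gt;Dropping a slot is a single function call. The slot name below is illustrative; take the real one from &lt;code&gt;pg_replication_slots&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Permanently removes the slot and releases the WAL it was retaining
SELECT pg_drop_replication_slot('old_logical_slot');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;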

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmm6iryvh3ys9fp3zifi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmm6iryvh3ys9fp3zifi.png" alt="Disk Usage after removing replication slots" width="800" height="336"&gt;&lt;/a&gt;&lt;br&gt;
I had achieved something great: &lt;strong&gt;a massive reduction in storage&lt;/strong&gt;. But that wasn’t my primary goal. My real objective was to &lt;strong&gt;reduce system load&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Strangely, after removing the replication slots, &lt;strong&gt;Autovacuum didn’t run as frequently as before&lt;/strong&gt;. While the outcome was exactly what I wanted, the &lt;strong&gt;"why"&lt;/strong&gt; behind it kept bothering me. I needed to understand &lt;strong&gt;how&lt;/strong&gt; this change had affected Autovacuum’s behavior.&lt;/p&gt;

&lt;p&gt;The following shows the CPU usage and load average for the week:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtgwcmqisflmj9sh0i72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtgwcmqisflmj9sh0i72.png" alt="CPU Usage for the week" width="800" height="144"&gt;&lt;/a&gt;&lt;br&gt;
After &lt;strong&gt;removing the replication slots&lt;/strong&gt;, CPU usage dropped &lt;strong&gt;from &lt;code&gt;~80%&lt;/code&gt; to &lt;code&gt;&amp;lt;10%&lt;/code&gt;&lt;/strong&gt; during business hours and &lt;strong&gt;from &lt;code&gt;~35%&lt;/code&gt; to &amp;lt; &lt;code&gt;1.5%&lt;/code&gt;&lt;/strong&gt; during off-hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  How and Why?
&lt;/h3&gt;

&lt;p&gt;It was only later that I realized &lt;strong&gt;why Autovacuum had been running so aggressively before&lt;/strong&gt;: it wasn't actually cleaning up &lt;strong&gt;dead tuples&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To break it down, let’s go through the &lt;strong&gt;5 Whys&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Why wasn’t Autovacuum removing dead tuples, even though it was running frequently?
&lt;/h3&gt;

&lt;p&gt;The problem was caused by &lt;strong&gt;inactive replication slots&lt;/strong&gt;. Replication had initially been set up on the &lt;code&gt;toolbox&lt;/code&gt; database, and its leftover slots &lt;strong&gt;prevented Autovacuum from cleaning up dead tuples across the entire cluster&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. How do replication slots impact Autovacuum?
&lt;/h3&gt;

&lt;p&gt;Replication slots track a value called &lt;strong&gt;&lt;code&gt;catalog_xmin&lt;/code&gt;&lt;/strong&gt;, which represents the oldest transaction ID affecting system catalogs that must be retained.&lt;/p&gt;

&lt;p&gt;When a replication slot is inactive, PostgreSQL &lt;strong&gt;retains the WAL files from the slot’s position onward&lt;/strong&gt; and, crucially, keeps the slot’s &lt;code&gt;catalog_xmin&lt;/code&gt; horizon pinned at that old transaction ID. Because the horizon cannot advance, Autovacuum &lt;strong&gt;cannot remove dead tuples&lt;/strong&gt; newer than it, and they keep accumulating until the replication slot catches up or is removed.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. How does a replication slot in one database block Autovacuum on another?
&lt;/h3&gt;

&lt;p&gt;Replication slots are &lt;strong&gt;cluster-wide objects&lt;/strong&gt;, and the &lt;code&gt;catalog_xmin&lt;/code&gt; horizon they pin &lt;strong&gt;applies to the entire cluster&lt;/strong&gt;, not just the database the slot was created in. So an inactive replication slot on &lt;strong&gt;one database&lt;/strong&gt; can prevent Autovacuum from removing dead tuples &lt;strong&gt;in all databases&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This meant that Autovacuum &lt;strong&gt;was running&lt;/strong&gt; but &lt;strong&gt;skipping cleanup&lt;/strong&gt;, as the replication slot still "needed" those old tuples.&lt;/p&gt;
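
&lt;p&gt;This “running but not cleaning” state shows up in &lt;code&gt;pg_stat_user_tables&lt;/code&gt;: dead-tuple counts stay high even though &lt;code&gt;last_autovacuum&lt;/code&gt; is recent. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Tables with the most dead tuples, next to their last (auto)vacuum times
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;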

&lt;h3&gt;
  
  
  4. If Autovacuum couldn't remove dead tuples, how did a manual vacuum reclaim space?
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;VACUUM FULL&lt;/strong&gt; (manual full vacuum) works differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It creates &lt;strong&gt;a new physical table file&lt;/strong&gt; and copies only &lt;strong&gt;live tuples&lt;/strong&gt; into it.&lt;/li&gt;
&lt;li&gt;The old table (containing dead tuples) is &lt;strong&gt;completely discarded&lt;/strong&gt;, freeing up space.&lt;/li&gt;
&lt;li&gt;Unlike Autovacuum, &lt;strong&gt;VACUUM FULL&lt;/strong&gt; rewrites the table wholesale rather than pruning tuple by tuple against &lt;code&gt;catalog_xmin&lt;/code&gt;, so in this case it reclaimed space that the replication slots were causing Autovacuum to leave behind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This explains why &lt;strong&gt;manual vacuuming instantly freed up disk space&lt;/strong&gt;, while Autovacuum had been ineffective.&lt;/p&gt;
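
&lt;p&gt;For reference, a manual full vacuum looks like the following (table name illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Rewrites the table into a new file containing only live tuples.
-- Takes an ACCESS EXCLUSIVE lock for the duration of the rewrite.
VACUUM (FULL, VERBOSE, ANALYZE) my_table;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;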

&lt;h3&gt;
  
  
  5. If VACUUM FULL is so effective, why isn't it recommended?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VACUUM FULL&lt;/strong&gt; is a &lt;strong&gt;blocking operation&lt;/strong&gt;: it takes an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock on the table for the duration of the rewrite, which can cause serious &lt;strong&gt;performance issues&lt;/strong&gt; on production systems. This is why it's not a standard solution for routine maintenance.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Key Takeaway
&lt;/h3&gt;

&lt;p&gt;Inactive replication slots had been &lt;strong&gt;blocking Autovacuum from doing its job&lt;/strong&gt;, causing a buildup of dead tuples and high system load. Once the replication slots were removed, Autovacuum could work properly again, leading to &lt;strong&gt;drastically lower CPU usage and a much healthier database&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I’m a backend developer with extensive experience in designing and optimizing scalable backend systems. My expertise includes tackling complex performance challenges. I’ve led numerous database performance initiatives and have also been deeply involved in system design and revamping existing systems. My focus is on enhancing efficiency, ensuring reliability, and delivering robust solutions that scale effectively.&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/sasi-kumar-thangavel/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; to learn more about my professional journey and projects.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>backend</category>
      <category>database</category>
    </item>
    <item>
      <title>Boosting PostgreSQL Performance: Optimising Queries with the != Operator</title>
      <dc:creator>Sasi Kumar T</dc:creator>
      <pubDate>Wed, 25 Sep 2024 15:27:20 +0000</pubDate>
      <link>https://forem.com/sasikumart/boosting-postgresql-performance-optimising-queries-with-the-operator-3db6</link>
      <guid>https://forem.com/sasikumart/boosting-postgresql-performance-optimising-queries-with-the-operator-3db6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction:
&lt;/h2&gt;

&lt;p&gt;PostgreSQL is known for its flexibility and efficiency in handling a variety of queries. However, certain query patterns, such as those involving the &lt;code&gt;!=&lt;/code&gt; operator, can lead to performance challenges. &lt;/p&gt;

&lt;p&gt;In this post, we'll discuss how to optimize queries involving &lt;code&gt;!=&lt;/code&gt; and demonstrate the performance improvements achieved by creating a partial index.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Performance Challenge with &lt;code&gt;!=&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Using the &lt;code&gt;!=&lt;/code&gt; operator can make it difficult for PostgreSQL to utilize indexes efficiently. This often results in slower query performance because PostgreSQL may need to scan a larger portion of the table or perform more complex operations.&lt;/p&gt;

&lt;p&gt;Consider the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"ticket"&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
      &lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
      &lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this query seems simple, it presents a challenge to PostgreSQL’s query planner. The &lt;code&gt;!=&lt;/code&gt; operator cannot efficiently use traditional B-tree indexes because it involves excluding a value, rather than narrowing down to a specific match. When indexes can’t be used effectively, PostgreSQL may have to resort to a full table scan, which can severely degrade performance, especially on large datasets.&lt;/p&gt;

&lt;p&gt;The execution plan of the above query is as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Aggregate&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;34899&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;83&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;34899&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;84&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;045&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;047&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="k"&gt;Only&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;ticket_businessId_assignedto_statusid_idx&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;34899&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;282&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;039&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;040&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Cond&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;Rows&lt;/span&gt; &lt;span class="n"&gt;Removed&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25617&lt;/span&gt;
&lt;span class="n"&gt;Heap&lt;/span&gt; &lt;span class="n"&gt;Fetches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;23630&lt;/span&gt;

&lt;span class="n"&gt;Planning&lt;/span&gt; &lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;394&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Execution&lt;/span&gt; &lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;230&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;119&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re not familiar with what’s happening here, don’t worry; the breakdown below walks through it step by step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution Plan Breakdown:
&lt;/h3&gt;

&lt;h3&gt;
  
  
  1. Aggregate
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Aggregate&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;34899&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;83&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;34899&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;84&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;045&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;047&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Aggregate&lt;/strong&gt;: This represents the final step where the aggregation (counting rows in this case) takes place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: The estimated cost to execute is &lt;code&gt;34899.83&lt;/code&gt; to &lt;code&gt;34899.84&lt;/code&gt;. The cost is an internal PostgreSQL unit representing the estimated computational effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows&lt;/strong&gt;: The planner estimates that 1 row will be output after aggregation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Width&lt;/strong&gt;: The width of the output row is 8 bytes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual Time&lt;/strong&gt;: The actual time taken is &lt;code&gt;21912.045 ms&lt;/code&gt; to &lt;code&gt;21912.047 ms&lt;/code&gt;, indicating the duration to perform the aggregation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loops&lt;/strong&gt;: This is executed once, as it is the final aggregation step.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Index Only Scan
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="k"&gt;Only&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;ticket_businessId_assignedto_statusid_idx&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;34899&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;282&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;039&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;21912&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;040&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index Only Scan&lt;/strong&gt;: This indicates that an index scan is used to access the data. Since the scan is “only” on the index, it means no heap (table) access is needed initially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index Name&lt;/strong&gt;: &lt;code&gt;ticket_businessId_assignedto_statusid_idx&lt;/code&gt; is the index being used.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: The cost to perform the index scan ranges from &lt;code&gt;0.56&lt;/code&gt; to &lt;code&gt;34899.13&lt;/code&gt;. This range estimates the computational effort required to access the index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows&lt;/strong&gt;: The planner estimates that 282 rows will match the index condition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual Time&lt;/strong&gt;: The actual time taken for the index scan is &lt;code&gt;21912.039 ms&lt;/code&gt; to &lt;code&gt;21912.040 ms&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual Rows&lt;/strong&gt;: Despite the estimate, no rows matched the index condition in reality (&lt;code&gt;rows=0&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Index Conditions and Filters
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Cond&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;Rows&lt;/span&gt; &lt;span class="n"&gt;Removed&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25617&lt;/span&gt;
&lt;span class="n"&gt;Heap&lt;/span&gt; &lt;span class="n"&gt;Fetches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;23630&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index Cond&lt;/strong&gt;: The index condition is that both &lt;code&gt;businessId&lt;/code&gt; and &lt;code&gt;assignedTo&lt;/code&gt; match the given values. This is used to narrow down the potential rows from the index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter&lt;/strong&gt;: After applying the index condition, the query filters out rows where &lt;code&gt;statusId&lt;/code&gt; is not equal to 2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows Removed by Filter&lt;/strong&gt;: 25,617 rows were removed because they did not meet the filter condition &lt;code&gt;statusId &amp;lt;&amp;gt; 2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heap Fetches&lt;/strong&gt;: Even though the scan is index-only, PostgreSQL sometimes needs to fetch additional data from the heap to validate rows or retrieve data not included in the index. Here, 23,630 heap fetches occurred.&lt;/li&gt;
&lt;/ul&gt;
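
&lt;p&gt;As a side note, heap fetches during an index-only scan happen when pages aren’t marked all-visible in the visibility map. A plain (non-&lt;code&gt;FULL&lt;/code&gt;) vacuum updates that map, so running one typically shrinks this number:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Refreshes statistics and the visibility map, letting
-- index-only scans skip most heap visibility checks afterwards
VACUUM (ANALYZE) "ticket";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;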

&lt;h3&gt;
  
  
  Key Points
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Index Usage&lt;/strong&gt;: The query uses an index-only scan to access rows based on &lt;code&gt;businessId&lt;/code&gt; and &lt;code&gt;assignedTo&lt;/code&gt;. This is efficient as it avoids accessing the table's heap initially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filtering&lt;/strong&gt;: The filter (&lt;code&gt;statusId &amp;lt;&amp;gt; 2&lt;/code&gt;) is applied after retrieving rows from the index. Since the filter was not selective enough, a large number of rows (25,617) were filtered out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heap Fetches&lt;/strong&gt;: The number of heap fetches (23,630) suggests that many rows needed additional data from the heap, despite the use of an index-only scan.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One thing I want you, the reader, to know here: the &lt;code&gt;statusId&lt;/code&gt; field has a cardinality of 5, with values ranging from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;4&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Based on this information, we can try the following approaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 1: Query Rewrite
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"ticket"&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
           &lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
           &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filter is rewritten to match &lt;code&gt;statusId &amp;gt; 2&lt;/code&gt; or &lt;code&gt;statusId &amp;lt; 2&lt;/code&gt;, which is logically equivalent to &lt;code&gt;statusId != 2&lt;/code&gt; but expresses the condition as two ranges that a B-tree index can scan directly.&lt;/p&gt;

&lt;p&gt;The execution plan of the above query is as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Aggregate&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1142&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;07&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1142&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;08&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;691&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;691&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;204&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Bitmap&lt;/span&gt; &lt;span class="n"&gt;Heap&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;77&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1141&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;37&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;281&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;691&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;195&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;691&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;197&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;Recheck&lt;/span&gt; &lt;span class="n"&gt;Cond&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;Heap&lt;/span&gt; &lt;span class="n"&gt;Blocks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;exact&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;574&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;BitmapOr&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;77&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;77&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;282&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;966&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;968&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Bitmap&lt;/span&gt; &lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ticket_businessId_assignedto_statusid_idx&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;37&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;145&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;864&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;864&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;628&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Planning&lt;/span&gt; &lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;249&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Execution&lt;/span&gt; &lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;881&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Summary of Execution
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bitmap Index Scan&lt;/strong&gt;: The query starts with scanning the index &lt;code&gt;ticket_businessId_assignedto_statusid_idx&lt;/code&gt; to find rows where the &lt;code&gt;businessId&lt;/code&gt; and &lt;code&gt;assignedTo&lt;/code&gt; match, and then the &lt;code&gt;statusId&lt;/code&gt; condition is applied. This index scan is efficient in finding rows but does not directly give the full row data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BitmapOr&lt;/strong&gt;: This combines the results from multiple bitmap index scans. Here, it combines results where &lt;code&gt;statusId&lt;/code&gt; is either greater than 2 or less than 2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bitmap Heap Scan&lt;/strong&gt;: Using the results from the &lt;code&gt;BitmapOr&lt;/code&gt;, it reads the actual rows from the heap and applies the conditions again. This step verifies the rows from the index scan and ensures they meet the &lt;code&gt;statusId&lt;/code&gt; conditions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregate&lt;/strong&gt;: Finally, it aggregates the results from the heap scan, counting the number of rows that satisfy the conditions.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;You might have noticed that the first query returned 1 row while the second returned 0 rows. That's not a mistake: these queries were run against a live database, so the underlying data changed between runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Approach 2: Using Partial Index
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;&lt;em&gt;partial index&lt;/em&gt;&lt;/strong&gt; is an index that only includes rows meeting a specific condition defined in the &lt;code&gt;WHERE&lt;/code&gt; clause of the &lt;code&gt;CREATE INDEX&lt;/code&gt; statement. This allows you to create an index that is smaller and more efficient by excluding rows that don't match the condition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Characteristics of Partial Indexes:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Selective Indexing&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;A partial index is created with a condition, which means it only indexes the rows that satisfy this condition. For example, you might only index rows where a column value is greater than a certain threshold.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Reduced Index Size&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Because it only indexes a subset of the table, a partial index is usually smaller and consumes less disk space than a full index. This can make index scans faster and reduce memory usage.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Improved Performance for Specific Queries&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Partial indexes are beneficial for queries that frequently filter on specific conditions. They can speed up these queries by avoiding the overhead of indexing all rows in the table.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In our case, we need to create an index on (&lt;code&gt;businessId&lt;/code&gt;, &lt;code&gt;assignedTo&lt;/code&gt;) covering only the rows where &lt;code&gt;statusId&lt;/code&gt; ≠ 2. The index creation command is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt;
&lt;span class="n"&gt;ticket_businessId_assignedto_statusid_ne_2_idx&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"businessId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"assignedTo"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"statusId"&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"ticket"&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt; 
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nv"&gt;"statusId"&lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;Aggregate&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;599&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;68&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;599&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;69&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;022&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;023&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="k"&gt;Only&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;ticket_businessId_assignedto_statusid_ne_2_idx&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;598&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;97&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;283&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;019&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;019&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Cond&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;"businessId"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'B00154'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"assignedTo"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sasikumar'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;Heap&lt;/span&gt; &lt;span class="n"&gt;Fetches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;Planning&lt;/span&gt; &lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;span class="n"&gt;Execution&lt;/span&gt; &lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;052&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Execution Plan Breakdown:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Aggregate&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actual Time&lt;/strong&gt;: &lt;code&gt;0.022..0.023 ms&lt;/code&gt; — The actual time taken for the aggregate operation to complete is between 0.022 and 0.023 milliseconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows&lt;/strong&gt;: &lt;code&gt;1&lt;/code&gt; — The aggregate operation returned 1 row.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loops&lt;/strong&gt;: &lt;code&gt;1&lt;/code&gt; — This aggregate operation ran once.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Index Only Scan&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actual Time&lt;/strong&gt;: &lt;code&gt;0.019..0.019 ms&lt;/code&gt; — The actual time taken for the index scan is 0.019 milliseconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows&lt;/strong&gt;: &lt;code&gt;283&lt;/code&gt; — The planner estimates that 283 rows would be returned by the index scan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Width&lt;/strong&gt;: &lt;code&gt;0&lt;/code&gt; — The width of the rows (data returned) is 0, indicating that only the index entries are being scanned, not the table rows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index Cond&lt;/strong&gt;: &lt;code&gt;("businessId" = 'B00154'::text) AND ("assignedTo" = 'sasikumar'::text)&lt;/code&gt; — The condition used to filter rows in the index scan.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The index-only scan uses the &lt;code&gt;ticket_businessId_assignedto_statusid_ne_2_idx&lt;/code&gt; index to locate rows where &lt;code&gt;businessId&lt;/code&gt; and &lt;code&gt;assignedTo&lt;/code&gt; match the specified values. It retrieves the index entries without needing to access the actual table rows because all necessary data is available in the index.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Heap Fetches&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Heap Fetches&lt;/strong&gt;: &lt;code&gt;0&lt;/code&gt; — Indicates that no heap fetches were needed. Since the index contains all required data, PostgreSQL did not need to access the actual table rows (the heap) for additional information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Planning Time&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Planning Time&lt;/strong&gt;: &lt;code&gt;0.198 ms&lt;/code&gt; — The time taken by PostgreSQL to plan the query execution, including generating the execution plan.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execution Time&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Execution Time&lt;/strong&gt;: &lt;code&gt;0.052 ms&lt;/code&gt; — The total time taken to execute the query and retrieve the result.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Originally, the query used a full index &lt;code&gt;ticket_businessId_assignedto_statusid_idx&lt;/code&gt;, which was 727 MB. Now, it's using a partial index that's only 3.17 MB.&lt;br&gt;
In database terms, this means faster index scans, reduced memory usage when loading the index, and improved query performance.&lt;/p&gt;
&lt;/blockquote&gt;
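&lt;p&gt;If you want to measure this yourself, PostgreSQL can report index sizes directly. A quick check (note that unquoted identifiers in &lt;code&gt;CREATE INDEX&lt;/code&gt; are folded to lowercase, so the catalog names are all-lowercase):&lt;/p&gt;

```sql
-- Compare the on-disk sizes of the full index and the partial index
SELECT pg_size_pretty(pg_relation_size('ticket_businessid_assignedto_statusid_idx'))      AS full_index,
       pg_size_pretty(pg_relation_size('ticket_businessid_assignedto_statusid_ne_2_idx')) AS partial_index;
```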

&lt;h2&gt;
  
  
  Conclusion:
&lt;/h2&gt;

&lt;p&gt;In the world of PostgreSQL, optimizing queries can sometimes feel like navigating a maze, especially when dealing with operators like &lt;code&gt;!=&lt;/code&gt;. As we explored, such queries can often lead to performance hurdles due to inefficient index utilization and the need for table scans.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Challenges with &lt;code&gt;!=&lt;/code&gt;&lt;/strong&gt;: Queries that use the &lt;code&gt;!=&lt;/code&gt; operator can be less efficient because they force PostgreSQL to perform broader scans and potentially filter out a significant number of rows. This was evident in our initial example, where the query needed to filter out many rows, leading to a high number of heap fetches and considerable execution time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Rewrite Efficiency&lt;/strong&gt;: By reformulating the query to use &lt;code&gt;statusId &amp;gt; 2 OR statusId &amp;lt; 2&lt;/code&gt;, we leveraged PostgreSQL's ability to optimize these conditions more effectively. This approach helped reduce the execution time significantly, demonstrating that a slight change in query logic can yield substantial performance improvements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Power of Partial Indexes&lt;/strong&gt;: The use of partial indexes can be a game-changer. By creating an index specifically for rows where &lt;code&gt;statusId != 2&lt;/code&gt;, we reduced the index size and optimized the scan process. This specialized index not only sped up the query but also eliminated the need for additional heap fetches, showcasing how targeted indexing can enhance performance.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I’m a backend developer with extensive experience in designing and optimizing scalable backend systems. My expertise includes tackling complex performance challenges. I’ve led numerous database performance initiatives and have also been deeply involved in system design and revamping existing systems. My focus is on enhancing efficiency, ensuring reliability, and delivering robust solutions that scale effectively.&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/sasi-kumar-thangavel/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; to learn more about my professional journey and projects.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>sql</category>
      <category>performance</category>
    </item>
    <item>
      <title>Overcoming Weak Consistency Using a Cache</title>
      <dc:creator>Sasi Kumar T</dc:creator>
      <pubDate>Fri, 25 Nov 2022 10:42:54 +0000</pubDate>
      <link>https://forem.com/sasikumart/overcoming-weak-consistency-using-cache-2i2h</link>
      <guid>https://forem.com/sasikumart/overcoming-weak-consistency-using-cache-2i2h</guid>
      <description>&lt;p&gt;Recently I faced weak consistency for the first time in my professional career (&amp;lt; 2 yrs). I didn’t know that the DB we were using in our PROD env was weakly consistent when reads go to a secondary replica (how silly of me 😓).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Weak/Eventual Consistency&lt;/strong&gt; is a guarantee that when an update is made in a distributed database, that update will eventually be reflected in all nodes that store the data, so that eventually every query for it returns the same response.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What was I doing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider two tables&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Users&lt;/strong&gt; table, having the below columns&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;age&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Relationship&lt;/strong&gt; table, which has the below columns&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;contactId&lt;/th&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;id is the primary key of the &lt;strong&gt;users&lt;/strong&gt; table&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Usually, a relationship is created whenever some info needs to be persisted for a particular user. When no relationship exists yet, a new row is created in the users table, and a new entry mapping to that user is saved in the relationship table.&lt;/p&gt;

&lt;p&gt;When new details are to be saved, the &lt;strong&gt;user's &lt;code&gt;id&lt;/code&gt;&lt;/strong&gt; is fetched from the relationship table using the &lt;code&gt;contactId&lt;/code&gt;, and the details are persisted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What was the problem?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider this scenario: when someone lands on a webpage, details like &lt;code&gt;ip, timezone, email&lt;/code&gt; can be collected automatically.&lt;/p&gt;

&lt;p&gt;When trying to save the first field, &lt;code&gt;email&lt;/code&gt;, we check the &lt;strong&gt;Relationship&lt;/strong&gt; table. If no entry is present, a new row is created in the users table and a new &lt;code&gt;relationship&lt;/code&gt; entry is created as well.&lt;/p&gt;

&lt;p&gt;When the next update, &lt;code&gt;ip&lt;/code&gt;, is to be saved, we check the &lt;strong&gt;Relationship&lt;/strong&gt; table, get the &lt;code&gt;id&lt;/code&gt;, and apply the update.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;updateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;dataToUpdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;forEachUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;dataToUpdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;relationship&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;newUser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createUser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nx"&gt;relationship&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;updateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;dataToUpdate&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// main&lt;/span&gt;
&lt;span class="nf"&gt;forEachUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sasi@hotmail.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sasi@hotmail.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;forEachUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sasi@hotmail.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem happens when we don’t find a &lt;code&gt;relationship&lt;/code&gt; on a check made just a few milliseconds after it was created. &lt;em&gt;But how? A new entry was added previously!&lt;/em&gt; What happened was: writes go to the primary replica, while reads go to a secondary replica. Reads and writes were hitting different servers, which do not hold the exact same data at any given moment. This is called &lt;em&gt;replication lag&lt;/em&gt;. Because of this, duplicate user entries were created in the users table.&lt;/p&gt;
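&lt;p&gt;To make the race concrete, here is a toy, self-contained simulation (plain Maps stand in for the primary and the replica, and replication is applied only when an explicit &lt;code&gt;sync()&lt;/code&gt; runs, mimicking the lag):&lt;/p&gt;

```javascript
// Toy replication-lag simulation: writes land on the primary, reads hit
// the replica, and changes propagate only when sync() is called.
const primary = new Map();
const replica = new Map();
const sync = () => { for (const [k, v] of primary) replica.set(k, v); };

let nextUserId = 1;
const usersCreated = [];

function forEachUpdate(contactId, dataToUpdate) {
  let relationship = replica.get(contactId);      // read goes to the replica
  if (!relationship) {
    const newUser = { id: nextUserId++ };
    usersCreated.push(newUser);
    primary.set(contactId, { contactId, id: newUser.id }); // write goes to the primary
    relationship = primary.get(contactId);
  }
  // updateUser(relationship, dataToUpdate) would run here
}

forEachUpdate('sasi@hotmail.com', { email: 'sasi@hotmail.com' });
forEachUpdate('sasi@hotmail.com', { ip: '127.0.0.1' }); // replica is still stale
console.log(usersCreated.length); // 2: a duplicate user was created
```

&lt;p&gt;Had the write been visible to the second read (i.e. had &lt;code&gt;sync()&lt;/code&gt; run between the two updates), only one user would exist.&lt;/p&gt;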

&lt;p&gt;&lt;strong&gt;What is the intermediate fix?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pointing the read queries to the primary solved the issue. But read replicas exist for a reason: they keep the primary from being overloaded with reads so it can respond quickly to writes. So this was not a permanent fix. A permanent fix would have meant switching to a different database, for which I didn’t have the time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The updates for a given contact are always done sequentially. So, whenever a &lt;code&gt;relationship&lt;/code&gt; is created, a copy of it is also stored in Redis (an in-memory cache).&lt;/p&gt;

&lt;p&gt;The pseudo-code will look something like the below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveInRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveInDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;saveInDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;saveInRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Therefore, before every update, Redis is checked first for an existing relationship; only if there is none does the lookup fall through to the DB.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;getRelationship&lt;/code&gt; function has been transformed as below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getDataFromRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;relationShip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getDataFromRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contactId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;relationShip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// check in DB&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;relationShip&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
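&lt;p&gt;Putting the two pieces together, a minimal runnable sketch of the cache-aside flow might look like this (plain Maps stand in for Redis and the DB; the names are illustrative):&lt;/p&gt;

```javascript
// Cache-aside sketch: writes go to the DB and are mirrored into the cache;
// reads check the cache first and fall back to the DB only on a miss.
const cache = new Map(); // stands in for Redis
const db = new Map();    // stands in for the primary replica

function createRelationship(contactId, user) {
  db.set(contactId, user);    // write to the DB first
  cache.set(contactId, user); // then mirror into the cache
  return user;
}

function getRelationship(contactId) {
  let relationship = cache.get(contactId); // a cache hit avoids the stale replica
  if (!relationship) {
    relationship = db.get(contactId);      // fall back to the DB on a miss
  }
  return relationship;
}

createRelationship('sasi@hotmail.com', { id: 1 });
console.log(getRelationship('sasi@hotmail.com').id); // 1, served from the cache
```

&lt;p&gt;In production you would also want a TTL on the Redis key so stale entries eventually expire instead of living in the cache forever.&lt;/p&gt;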



</description>
      <category>redis</category>
      <category>database</category>
      <category>node</category>
    </item>
  </channel>
</rss>
