<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Swayam Maheshwari</title>
    <description>The latest articles on Forem by Swayam Maheshwari (@swayammaheshwari).</description>
    <link>https://forem.com/swayammaheshwari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2640170%2Fea23a607-97a6-42f0-a76d-81e5c6921e5d.jpg</url>
      <title>Forem: Swayam Maheshwari</title>
      <link>https://forem.com/swayammaheshwari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/swayammaheshwari"/>
    <language>en</language>
    <item>
      <title>From 8 CPUs to Efficiency: How a Single Unicode Character Doubled Our Bill</title>
      <dc:creator>Swayam Maheshwari</dc:creator>
      <pubDate>Fri, 15 May 2026 07:53:01 +0000</pubDate>
      <link>https://forem.com/swayammaheshwari/from-8-cpus-to-efficiency-how-a-single-unicode-character-doubled-our-bill-31ki</link>
      <guid>https://forem.com/swayammaheshwari/from-8-cpus-to-efficiency-how-a-single-unicode-character-doubled-our-bill-31ki</guid>
      <description>&lt;p&gt;In the world of cloud computing, auto-scaling is often viewed as a safety net. It’s the magic that keeps your app alive during a traffic surge. But what happens when your server scales to 8 CPUs not because of a surge in users, but because of a "poison pill" hidden in your database queries?&lt;/p&gt;

&lt;p&gt;Last week, our team faced a production crisis: our CPU utilization hit &lt;strong&gt;100%&lt;/strong&gt;, our cloud costs spiked by &lt;strong&gt;100%&lt;/strong&gt; overnight, and the culprit was a single malformed Unicode character.&lt;/p&gt;

&lt;h2&gt;The Incident: The "Ghost" Traffic Spike&lt;/h2&gt;

&lt;p&gt;It started with an automated alert. Our AWS/GCP instances were hitting their limits, and the auto-scaler was aggressively spinning up 8-core machines.&lt;/p&gt;

&lt;p&gt;When we checked our analytics, the math didn’t add up. We had a very low volume of active users—nowhere near enough to justify that kind of compute power. Yet, the SQL logs showed a different story: the database was gasping for air.&lt;/p&gt;

&lt;h2&gt;The Root Cause: The RegEx CPU Bomb&lt;/h2&gt;

&lt;p&gt;After digging into our PostgreSQL slow query logs, we found the bottleneck. It was a &lt;code&gt;SELECT&lt;/code&gt; query used for our public sidebar data.&lt;/p&gt;

&lt;p&gt;To prevent the app from crashing due to malformed JSON (caused by binary PDF data and null bytes), we were using a SQL-level fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;regexp_replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"publishedEndpoint"&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;u0000'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'g'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Why this killed our performance&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Linear Scans:&lt;/strong&gt; For every single request, the database had to cast large JSON blobs into text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regex Overhead:&lt;/strong&gt; Running a regular-expression engine over large payloads (like 2 MB strings of PDF-polluted JSON) is extremely CPU-intensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequency:&lt;/strong&gt; Because this was a sidebar query, it was being called constantly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Essentially, we were asking our database to perform deep-cleaning surgery on thousands of rows of data every second.&lt;/p&gt;

&lt;h2&gt;How We Fixed It: A Three-Layered Strategy&lt;/h2&gt;

&lt;p&gt;We realized that "fixing it in the query" was a band-aid that had become a liability. We moved to a multi-layered architectural solution.&lt;/p&gt;

&lt;h3&gt;1. Breaking the Query&lt;/h3&gt;

&lt;p&gt;First, we decoupled the monolithic API. Instead of one massive query that fetched and cleaned everything, we broke it into two separate, optimized APIs. This reduced the "surface area" of the data being processed by the database engine at any one time.&lt;/p&gt;
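
&lt;p&gt;As a rough sketch of the split (assuming an Express app and node-postgres; the route paths, table, and column names are illustrative, not our exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const express = require('express');
const { Pool } = require('pg'); // illustrative: any client works here (pg, Sequelize, ...)
const db = new Pool();
const app = express();

// Before: one endpoint fetched AND regex-cleaned every blob inside SQL.
// After: a cheap metadata route for the sidebar that is hit constantly...
app.get('/api/sidebar/meta', async (req, res) =&amp;gt; {
  const { rows } = await db.query(
    'SELECT id, title FROM endpoints WHERE is_public = true'
  );
  res.json(rows);
});

// ...and a heavy content route that is only hit on demand, per item.
app.get('/api/sidebar/content/:id', async (req, res) =&amp;gt; {
  const { rows } = await db.query(
    'SELECT "publishedEndpoint" FROM endpoints WHERE id = $1',
    [req.params.id]
  );
  res.json(rows[0]);
});

app.listen(3000);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;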

&lt;h3&gt;2. The Redis Buffer&lt;/h3&gt;

&lt;p&gt;Why clean the same data twice? We implemented &lt;strong&gt;Redis&lt;/strong&gt; to store the "sanitized" version of the sidebar.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Flow:&lt;/strong&gt; The first time a user requests the data, the server cleans the Unicode, formats the JSON, and stores it in Redis (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; Subsequent requests are served in milliseconds directly from memory. The database never sees the request, and the CPU stays cool.&lt;/li&gt;
&lt;/ul&gt;
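
&lt;p&gt;Here is a minimal cache-aside sketch of that flow, assuming ioredis; the key name, the TTL, and the &lt;code&gt;fetchSidebarTextFromDb&lt;/code&gt; helper are hypothetical stand-ins:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const Redis = require('ioredis');
const redis = new Redis(); // defaults to 127.0.0.1:6379

const SIDEBAR_KEY = 'sidebar:v1'; // hypothetical cache key
const TTL_SECONDS = 300;          // how stale the sidebar is allowed to get

// fetchSidebarTextFromDb() is a hypothetical helper standing in for the real
// query -- which no longer needs regexp_replace(), because cleanup happens here.
async function getSidebar() {
  // Fast path: serve the already-sanitized JSON straight from memory.
  const cached = await redis.get(SIDEBAR_KEY);
  if (cached) return JSON.parse(cached);

  // Slow path: one DB round trip, then strip the null bytes in Node.
  const rawText = await fetchSidebarTextFromDb();
  const cleanText = rawText.replace(/\u0000/g, '');

  // Store the clean version so the DB never sees this request again (until TTL).
  await redis.set(SIDEBAR_KEY, cleanText, 'EX', TTL_SECONDS);
  return JSON.parse(cleanText);
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;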

&lt;h3&gt;3. Edge Caching with Cloudflare&lt;/h3&gt;

&lt;p&gt;For our public APIs, we added a layer of protection at the "Edge." By configuring &lt;strong&gt;Cloudflare Cache&lt;/strong&gt;, we ensured that public data is served from the CDN.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This means a user in London gets their data from a London server without ever hitting our origin database in the first place, as sketched below.&lt;/li&gt;
&lt;/ul&gt;
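
&lt;p&gt;A minimal sketch of the origin side, reusing the &lt;code&gt;getSidebar()&lt;/code&gt; helper from the Redis sketch above, and assuming a Cloudflare Cache Rule is enabled for this path (Cloudflare does not cache JSON responses by default):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// s-maxage: the CDN may serve this for 5 minutes without touching origin.
// max-age: browsers may keep their own copy for 1 minute.
app.get('/api/public/sidebar', async (req, res) =&amp;gt; {
  res.set('Cache-Control', 'public, s-maxage=300, max-age=60');
  res.json(await getSidebar());
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;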

&lt;h2&gt;Final Lessons Learned&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sanitize at the Entry, Not the Exit:&lt;/strong&gt; The best way to handle a &lt;code&gt;\u0000&lt;/code&gt; (null byte) error is to never let it reach the database. Sanitize your inputs in your Node.js/Sequelize logic before the &lt;code&gt;INSERT&lt;/code&gt;, as sketched after this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL is not a Text Editor:&lt;/strong&gt; While SQL can perform RegEx, it is not optimized for it at scale. If you find yourself using &lt;code&gt;regexp_replace&lt;/code&gt; in a high-traffic &lt;code&gt;SELECT&lt;/code&gt;, you are sitting on a performance time bomb.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Costs as a Metric:&lt;/strong&gt; A spike in CPU is a technical issue; a 100% increase in billing is a business crisis.&lt;/li&gt;
&lt;/ol&gt;
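
&lt;p&gt;A minimal sketch of that first lesson, assuming Sequelize: a global &lt;code&gt;beforeSave&lt;/code&gt; hook (attached to your existing &lt;code&gt;sequelize&lt;/code&gt; instance) is one way to guarantee no null byte ever reaches an &lt;code&gt;INSERT&lt;/code&gt; or &lt;code&gt;UPDATE&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Recursively strip U+0000 from strings, arrays, and plain objects.
function stripNullBytes(value) {
  if (typeof value === 'string') return value.replace(/\u0000/g, '');
  if (Array.isArray(value)) return value.map(stripNullBytes);
  if (value === null) return value;
  if (typeof value === 'object') {
    for (const key of Object.keys(value)) value[key] = stripNullBytes(value[key]);
  }
  return value;
}

// Runs before every save on every model, so the database only ever
// receives clean data -- and the SELECT side needs no regexp_replace().
sequelize.addHook('beforeSave', (instance) =&amp;gt; {
  for (const key of Object.keys(instance.dataValues)) {
    instance.set(key, stripNullBytes(instance.get(key)));
  }
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;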

&lt;p&gt;By moving the heavy lifting away from the SQL engine and into Redis and Cloudflare, we were able to scale back down to our standard CPU usage, saving our performance and our budget.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you ever had a "poison pill" query crash your production? Let's discuss in the comments!&lt;/strong&gt; 🚀&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>programming</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why Did My Single-Node Redis Think It Was a Replica? A Debugging Deep Dive</title>
      <dc:creator>Swayam Maheshwari</dc:creator>
      <pubDate>Thu, 14 May 2026 05:46:55 +0000</pubDate>
      <link>https://forem.com/swayammaheshwari/why-did-my-single-node-redis-think-it-was-a-replica-a-debugging-deep-dive-48af</link>
      <guid>https://forem.com/swayammaheshwari/why-did-my-single-node-redis-think-it-was-a-replica-a-debugging-deep-dive-48af</guid>
<description>&lt;p&gt;A companion link to the full write-up: &lt;a href="https://dev.to/swayammaheshwari/the-mystery-of-the-redis-read-only-error-in-a-single-node-setup-519"&gt;The Mystery of the Redis Read-Only Error in a Single-Node Setup&lt;/a&gt; (4 min read).&lt;/p&gt;</description>
      <category>backend</category>
      <category>database</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>The Mystery of the Redis Read-Only Error in a Single-Node Setup</title>
      <dc:creator>Swayam Maheshwari</dc:creator>
      <pubDate>Thu, 14 May 2026 05:45:27 +0000</pubDate>
      <link>https://forem.com/swayammaheshwari/the-mystery-of-the-redis-read-only-error-in-a-single-node-setup-519</link>
      <guid>https://forem.com/swayammaheshwari/the-mystery-of-the-redis-read-only-error-in-a-single-node-setup-519</guid>
      <description>&lt;p&gt;If you manage a realtime application, you know that Redis is often the beating heart of your infrastructure. Recently, our production application—which relies heavily on Redis for both backend caching and realtime collaboration (via Hocuspocus/Yjs)—experienced a bizarre and catastrophic outage.&lt;/p&gt;

&lt;p&gt;Every few months, out of nowhere, Redis would randomly crash our system. The logs were flooded with a single, confusing error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;READONLY You can't write against a read only replica&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The symptoms were severe: writes failed entirely, reads stopped working, and the entire realtime system came to a grinding halt. Restarting the Docker container fixed the issue immediately, but without a root cause, it was only a matter of time before it happened again.&lt;/p&gt;

&lt;p&gt;Here is a step-by-step breakdown of how I investigated, debugged, and ultimately solved this elusive Redis bug.&lt;/p&gt;




&lt;h2&gt;Step 1: Evaluating the Infrastructure&lt;/h2&gt;

&lt;p&gt;Before diving into logs, I needed to confirm exactly what our architecture looked like.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hosting:&lt;/strong&gt; A single Google Cloud Platform (GCP) VM (&lt;code&gt;t2d-standard-1&lt;/code&gt; with Debian 12, 1 vCPU, 4 GB RAM).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Redis running inside a Docker container.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topology:&lt;/strong&gt; A single Redis node. No Redis Cluster. No Sentinel. No intentional replicas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where the mystery deepened. If there was only one Redis node, how could it possibly think it was a "read-only replica"?&lt;/p&gt;

&lt;h2&gt;Step 2: Checking the Current Redis State&lt;/h2&gt;

&lt;p&gt;My first move was to check the current role of the Redis instance. I connected to the server and ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;redis-cli INFO replication

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output was telling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;role:master
connected_slaves:0
master_failover_state:no-failover

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Redis was clearly functioning as a &lt;code&gt;master&lt;/code&gt; with no connected replicas. Whatever had caused the &lt;code&gt;READONLY&lt;/code&gt; error wasn't a permanent state change.&lt;/p&gt;

&lt;h2&gt;Step 3: Ruling Out the Red Herrings&lt;/h2&gt;

&lt;p&gt;When debugging distributed systems, it's easy to go down the wrong rabbit hole. Here is what I evaluated and quickly ruled out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Redis Cluster &amp;amp; Sentinel Failovers:&lt;/strong&gt; I wondered if an automated failover had demoted our primary node. However, since we weren't running Cluster or Sentinel mode, there was no orchestration tool present to trigger a failover or slot migration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redlock / Distributed Lock Split-Brain:&lt;/strong&gt; While distributed locks can cause chaos, they don't change a server's replication role.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Read" Clue:&lt;/strong&gt; If Redis had truly become a standard replica, &lt;em&gt;reads should still have worked&lt;/em&gt;. The fact that reads and writes both failed suggested this wasn't just a simple case of a node functioning as a healthy replica.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Step 4: Investigating Memory and Resources&lt;/h2&gt;

&lt;p&gt;Could the server be buckling under memory pressure? I checked the system and Redis memory stats:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;redis-cli INFO memory

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results were eye-opening, but not in the way I expected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;used_memory_human&lt;/code&gt;: 1.60M&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;used_memory_rss_human&lt;/code&gt;: 15.85M&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;total_system_memory_human&lt;/code&gt;: 3.83G&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our actual dataset was only about 672 KB! Redis was using a fraction of a percent of the VM's RAM. It wasn't an Out-Of-Memory (OOM) crash.&lt;/p&gt;

&lt;p&gt;However, I discovered a &lt;strong&gt;massive production risk&lt;/strong&gt; in our configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;maxmemory:0
maxmemory_policy:noeviction

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;maxmemory&lt;/code&gt; at 0, Redis has no ceiling at all and will keep growing until the OS OOM-killer steps in; and with &lt;code&gt;noeviction&lt;/code&gt; set, the moment a limit is hit Redis refuses all writes instead of evicting old keys. This wasn't the root cause of the current bug, but it was a ticking time bomb that needed immediate fixing.&lt;/p&gt;

&lt;h2&gt;Step 5: Piecing Together the Root Cause&lt;/h2&gt;

&lt;p&gt;With OOM and Cluster failovers ruled out, the evidence pointed toward a few highly probable culprits for a single-node setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accidental &lt;code&gt;REPLICAOF&lt;/code&gt; Execution:&lt;/strong&gt; A rogue script, automation, or network blip might have accidentally sent a &lt;code&gt;REPLICAOF host port&lt;/code&gt; command, temporarily turning the node into a replica.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale Node.js Client Connections:&lt;/strong&gt; Our Node.js backend and Hocuspocus websocket server maintain long-lived TCP connections. If the network dropped or the Docker container glitched, the client connection pool might have ended up in a stale state, misinterpreting the connection status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker/Network Instability:&lt;/strong&gt; Temporary network partitions or disk IO blocks (during AOF/RDB saves) might have forced Redis into a protective mode that the application clients misinterpreted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The temporary nature of the issue, combined with both reads and writes failing, strongly pointed to &lt;strong&gt;stale client connections&lt;/strong&gt; compounded by a transient Docker or network interruption. Restarting the container severed those dead connections and forced a clean reconnect.&lt;/p&gt;
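
&lt;p&gt;Because stale connections topped the suspect list, it is worth hardening the client side against this class of failure. A minimal sketch, assuming ioredis (its &lt;code&gt;reconnectOnError&lt;/code&gt; option is documented for exactly this kind of error; node-redis has equivalent retry settings):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const Redis = require('ioredis');

const redis = new Redis({
  host: '127.0.0.1',
  port: 6379,
  // Retry dead sockets with a capped backoff instead of letting the
  // connection pool sit on a stale TCP connection forever.
  retryStrategy: (times) =&amp;gt; Math.min(times * 200, 2000),
  // If a command fails with READONLY, drop the connection and reconnect
  // rather than keep sending writes into a black hole.
  reconnectOnError: (err) =&amp;gt; err.message.startsWith('READONLY'),
});

redis.on('error', (err) =&amp;gt; console.error('[redis]', err.message));

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;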

&lt;h2&gt;Step 6: The Fix and Future-Proofing&lt;/h2&gt;

&lt;p&gt;To stabilize the system and ensure this doesn't happen again, I implemented a multi-layered fix.&lt;/p&gt;

&lt;h3&gt;1. Hardening the Memory Config&lt;/h3&gt;

&lt;p&gt;First, I patched the memory risk by adding proper limits to &lt;code&gt;/etc/redis/redis.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;maxmemory&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="n"&gt;gb&lt;/span&gt;
&lt;span class="n"&gt;maxmemory&lt;/span&gt;-&lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="n"&gt;allkeys&lt;/span&gt;-&lt;span class="n"&gt;lru&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;2. Disabling Dangerous Commands&lt;/h3&gt;

&lt;p&gt;To prevent any accidental role changes in our single-node setup, I locked down the replication commands in &lt;code&gt;redis.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;rename&lt;/span&gt;-&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="n"&gt;REPLICAOF&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="n"&gt;rename&lt;/span&gt;-&lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="n"&gt;SLAVEOF&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;3. Creating a Debug Playbook&lt;/h3&gt;

&lt;p&gt;I established a strict rule: &lt;strong&gt;Next time it fails, do not restart immediately.&lt;/strong&gt; Instead, run these diagnostics to capture the exact failure state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;redis-cli INFO replication
redis-cli INFO stats
redis-cli CONFIG GET replica-read-only
docker logs &amp;lt;redis-container-name&amp;gt; &lt;span class="nt"&gt;--tail&lt;/span&gt; 200

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;4. Rethinking the Architecture&lt;/h3&gt;

&lt;p&gt;While a single Redis node is fine for basic caching, heavy realtime workloads (like Hocuspocus Pub/Sub) demand high availability. Our long-term fix isn't to overcomplicate things with Redis Cluster, but rather to migrate to a standard &lt;strong&gt;Primary + Replica + Sentinel&lt;/strong&gt; setup. This will give us automatic failover and separate the realtime collaboration load from the standard cache.&lt;/p&gt;
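
&lt;p&gt;On the application side, that migration is mostly transparent. A minimal sketch of a Sentinel-aware connection, assuming ioredis (the hostnames and the primary name are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const Redis = require('ioredis');

// The client asks the Sentinels who the current primary is and follows
// it automatically across failovers -- no application restarts needed.
const redis = new Redis({
  sentinels: [
    { host: 'sentinel-1', port: 26379 },
    { host: 'sentinel-2', port: 26379 },
    { host: 'sentinel-3', port: 26379 },
  ],
  name: 'mymaster', // the primary's name as registered with Sentinel
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;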

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Sometimes the most intimidating errors—like an impossible &lt;code&gt;READONLY&lt;/code&gt; replica state on a single node—are symptoms of deeper infrastructural quirks rather than actual state changes. By methodically checking the actual Redis state, analyzing memory limits, and ruling out red herrings, we not only diagnosed the immediate issue but also uncovered hidden risks, leaving our production environment significantly stronger.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devops</category>
      <category>node</category>
    </item>
  </channel>
</rss>
