<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: GasPriceCheck</title>
    <description>The latest articles on Forem by GasPriceCheck (@gaspricecheck).</description>
    <link>https://forem.com/gaspricecheck</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3905345%2Fc58e9d1e-13fa-471c-8325-0448bbd7e28f.png</url>
      <title>Forem: GasPriceCheck</title>
      <link>https://forem.com/gaspricecheck</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gaspricecheck"/>
    <language>en</language>
    <item>
      <title>Google deindexed half of my Next.js site. Here's the four-phase recovery.</title>
      <dc:creator>GasPriceCheck</dc:creator>
      <pubDate>Thu, 07 May 2026 05:00:00 +0000</pubDate>
      <link>https://forem.com/gaspricecheck/google-deindexed-half-of-my-nextjs-site-heres-the-four-phase-recovery-284j</link>
      <guid>https://forem.com/gaspricecheck/google-deindexed-half-of-my-nextjs-site-heres-the-four-phase-recovery-284j</guid>
      <description>&lt;p&gt;I run a side project: a gas price finder that's mostly programmatic content. Roughly 33,620 ZIP code pages plus a few hundred state and city pages, all built on Next.js 15 with ISR. It runs on Vercel, fronted by Cloudflare for DNS.&lt;/p&gt;

&lt;p&gt;For about eight months it was indexing fine. Traffic was small but growing. Then on April 11 I checked Google Search Console and saw something I'd never seen at this scale: 87 URLs flagged as "not found (404)," 61 flagged as "soft 404," and a chunk of others sitting in "crawled, currently not indexed."&lt;/p&gt;

&lt;p&gt;I'm a data analyst. This is my side project, not my day job. So I had a weekend, a coffee subscription, and the GSC export. Here's the four-phase recovery, what each phase actually fixed, and the things I'd do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the damage actually was
&lt;/h2&gt;

&lt;p&gt;GSC's URL Inspection tool is the only way to figure out what Google thinks of any specific URL. The 87 hard 404s broke down into two groups when I sampled them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stale city slugs.&lt;/strong&gt; URLs like &lt;code&gt;/austin&lt;/code&gt; or &lt;code&gt;/dallas&lt;/code&gt; that I had in my sitemap historically but removed when I migrated to disambiguated slugs (&lt;code&gt;/austin-tx&lt;/code&gt;, &lt;code&gt;/dallas-tx&lt;/code&gt;). Google still had the old slugs cached and was hitting 404s when it tried to refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typo URLs in third-party referrers.&lt;/strong&gt; A handful of blog posts I'd written had link typos to my own site. Google followed those, hit 404s, and now thought those URLs were canonical.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The 61 soft 404s were a different beast. These were real ZIP code pages that returned 200 OK but Google decided didn't have enough content to be "real" pages. Looking at the SSR output, I could see why: when a ZIP wasn't cached in Redis (which was most of the long tail), the page rendered a hero, a search box, and a thin "search results loading" placeholder. About 640 visible words in the SSR HTML. From Google's perspective: this is a glorified 404 wearing a 200 costume.&lt;/p&gt;

&lt;p&gt;The "crawled, currently not indexed" pile was the soft-404 pile's pre-stage. Google had crawled them, decided they weren't worth indexing, but hadn't formally classified them as soft 404s yet.&lt;/p&gt;

&lt;p&gt;That's the diagnosis. Now the fixes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: The boring redirect fix
&lt;/h2&gt;

&lt;p&gt;This phase is unglamorous and important. For each of the 87 hard 404s, I had to decide: does this URL have a clear successor, and if so, where do I redirect it?&lt;/p&gt;

&lt;p&gt;I did this in &lt;code&gt;next.config.mjs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;STALE_CITY_REDIRECTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/austin&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/austin-tx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;permanent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/dallas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/dallas-tx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;permanent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// ...64 more&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;redirects&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;STALE_CITY_REDIRECTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;permanent: true&lt;/code&gt; emits a 308. That tells Google "this URL has moved permanently; transfer signals to the destination." A 301 would also work (Next.js accepts an explicit &lt;code&gt;statusCode&lt;/code&gt; in place of &lt;code&gt;permanent&lt;/code&gt;), but 308 is the framework's default for permanent redirects and I didn't want to fight it.&lt;/p&gt;

&lt;p&gt;The unglamorous part: most of the 87 URLs didn't have an obvious destination. Some pointed to cities I no longer covered. Some pointed to ZIPs that didn't exist. For those, I redirected to the parent state page (e.g. &lt;code&gt;/austin-foo&lt;/code&gt; → &lt;code&gt;/texas&lt;/code&gt;) so the user lands somewhere useful, and the link equity isn't dropped on the floor.&lt;/p&gt;
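
&lt;p&gt;Before a table like this ships, a cheap sanity check is confirming no destination is itself a redirect source, since each such pair reintroduces a hop. A minimal sketch (the helper name is mine, not from the real config, and the third entry is a deliberate chain for the demo):&lt;/p&gt;

```javascript
// Flag destinations that are themselves redirect sources: each such pair
// would cost crawlers an extra hop. Helper name and third entry are
// invented for this sketch.
const STALE_CITY_REDIRECTS = [
  { source: '/austin', destination: '/austin-tx', permanent: true },
  { source: '/dallas', destination: '/dallas-tx', permanent: true },
  { source: '/dallas-tx', destination: '/dallas-texas', permanent: true },
];

function findRedirectChains(redirects) {
  const sources = new Set(redirects.map(r => r.source));
  return redirects.filter(r => sources.has(r.destination));
}

console.log(findRedirectChains(STALE_CITY_REDIRECTS).map(r => r.source));
// logs the one chained source: '/dallas'
```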

&lt;p&gt;Total output of Phase 1: 66 redirects, deployed in one commit. Pushed it. Watched GSC for a week.&lt;/p&gt;

&lt;p&gt;Result: about half the hard 404s validated. The other half stuck. Why?&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1.5: The Cloudflare redirect chain I didn't know I had
&lt;/h2&gt;

&lt;p&gt;When I inspected one of the still-failing URLs in GSC, the URL Inspection tool said "page with redirect" and showed a chain. Google was hitting &lt;code&gt;http://gas-price-check.com/austin&lt;/code&gt;, getting redirected to &lt;code&gt;https://www.gas-price-check.com/austin&lt;/code&gt;, and then redirected again to &lt;code&gt;https://www.gas-price-check.com/austin-tx&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Two hops. Google does not like two hops.&lt;/p&gt;

&lt;p&gt;I had Cloudflare in front of Vercel as a DNS proxy. Cloudflare was handling the http → https and apex → www redirect. Vercel was handling the slug → slug redirect. Each layer worked. Together they made a chain.&lt;/p&gt;

&lt;p&gt;The fix was a single Cloudflare Redirect Rule that does both transforms in one hop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If: hostname matches "gas-price-check.com" AND scheme is http
Then: 308 to https://www.gas-price-check.com/$path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that rule landed, the chain collapsed from 2 hops to 1. Google reprocessed and started clearing the rest of the hard 404s. Lesson I should have known: when you have multiple layers (CDN, edge, framework), each one defaulting to "I'll handle the redirect" stacks. Audit the hop count with &lt;code&gt;curl -IL &amp;lt;url&amp;gt;&lt;/code&gt; and look for chains.&lt;/p&gt;
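
&lt;p&gt;The rule above is paraphrased. In Cloudflare's actual Rules language, a single dynamic Redirect Rule covering it would look roughly like this; the field and function names (&lt;code&gt;http.host&lt;/code&gt;, &lt;code&gt;ssl&lt;/code&gt;, &lt;code&gt;concat&lt;/code&gt;, &lt;code&gt;http.request.uri.path&lt;/code&gt;) are Cloudflare's, while the hostnames and exact condition are mine:&lt;/p&gt;

```plaintext
# Condition (custom filter expression):
http.host eq "gas-price-check.com" and not ssl

# Type: dynamic redirect, status 308, "preserve query string" enabled
# Target URL expression:
concat("https://www.gas-price-check.com", http.request.uri.path)
```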

&lt;h2&gt;
  
  
  Phase 2: The real work (thin SSR content)
&lt;/h2&gt;

&lt;p&gt;The 61 soft 404s couldn't be redirected away. These were real pages I wanted indexed. Google just thought they were thin.&lt;/p&gt;

&lt;p&gt;The diagnosis was straightforward once I started reading my own SSR output. When a ZIP wasn't cached, the page rendered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A hero with the ZIP, city, state&lt;/li&gt;
&lt;li&gt;A search box&lt;/li&gt;
&lt;li&gt;A "loading prices..." placeholder (waiting for client-side fetch)&lt;/li&gt;
&lt;li&gt;A footer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. Maybe 640 visible words, of which 400 were the footer and global nav. The actual page-specific content was a hero header and a placeholder.&lt;/p&gt;

&lt;p&gt;The fix had three sub-components.&lt;/p&gt;

&lt;h3&gt;
  
  
  2a. Server-render the nearby ZIP grid
&lt;/h3&gt;

&lt;p&gt;I had a helper called &lt;code&gt;getNearbyZips(zip, radius)&lt;/code&gt; that returned ZIP codes within a given mile radius. I'd been using it on the client. I moved it to the server component so the SSR HTML included an actual grid of "nearby ZIP codes" with links.&lt;/p&gt;

&lt;p&gt;This added about 80 words of unique-per-ZIP content (different neighbors for each ZIP). More importantly, it added 8-12 internal links per page, which gave Google more signal about the URL's place in the site graph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: client-side, invisible to Google&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nearby&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useNearbyZips&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// After: server-rendered, visible to Google&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nearby&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getNearbyZips&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;section&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h2&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Nearby ZIP codes&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h2&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;ul&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;nearby&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;li&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt; &lt;span class="na"&gt;href&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;`/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;li&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;ul&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;section&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2b. Unconditional save tips
&lt;/h3&gt;

&lt;p&gt;I added a hand-written "How to save on gas in {city}" section that rendered regardless of cache state. About 120 words of static-but-locally-relevant content per city. This is templated, but with enough variable interpolation that no two pages have identical text.&lt;/p&gt;
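
&lt;p&gt;A minimal sketch of what that templating can look like; the function name, fields, and values here are invented for illustration, not lifted from my codebase:&lt;/p&gt;

```javascript
// One template, enough interpolated variables that no two cities render
// identical text. All names and numbers are placeholders for this sketch.
function saveTips({ city, state, avgPrice, cheapestDay }) {
  return [
    `Gas prices in ${city} swing through the week; ${cheapestDay} tends to be the cheapest day to fill up.`,
    `The recent average in ${city}, ${state} is around $${avgPrice.toFixed(2)}, so anything below that is a win.`,
    `Warehouse clubs and grocery-chain stations around ${city} usually undercut highway-exit stations.`,
  ].join(' ');
}

console.log(saveTips({ city: 'Austin', state: 'TX', avgPrice: 2.89, cheapestDay: 'Monday' }));
```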

&lt;h3&gt;
  
  
  2c. State backlink with name fallback
&lt;/h3&gt;

&lt;p&gt;Every ZIP page already had a "Back to {state} state guide" link, but it relied on a state abbreviation lookup that returned &lt;code&gt;null&lt;/code&gt; for some edge cases. So those pages were rendering "Back to undefined state guide" or worse, no link at all. Fixed it with a fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stateName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getStateByAbbr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nf"&gt;getStateByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small fix, but it meant every ZIP page now had a working internal link to its parent state, which closes a major site-graph gap.&lt;/p&gt;

&lt;p&gt;After Phase 2, my SSR word count went from 640 to about 890 on previously-thin pages. Google's soft-404 verdict appears to key on relative content depth rather than any absolute word count, so there's no magic threshold to hit, but more unique, server-rendered depth is always better than less.&lt;/p&gt;
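
&lt;p&gt;The word counts quoted in this post came from eyeballing SSR output; a rough way to automate the same check is below. It's a sketch, not Google's measurement. (The &lt;code&gt;\x3c&lt;/code&gt; escapes stand in for angle brackets so the snippet stays valid inside this feed; at runtime they're ordinary &lt;code&gt;&amp;lt;&lt;/code&gt; characters.)&lt;/p&gt;

```javascript
// Rough visible-word counter for SSR HTML: strip comments and tags, then
// count whitespace-separated tokens. \x3c escapes are '<' at runtime; they
// only exist so this sample survives the feed's XML escaping.
function visibleWordCount(html) {
  const text = html
    .replace(/\x3c!--[\s\S]*?-->/g, ' ')   // React's SSR comment markers
    .replace(/\x3c[^>]*>/g, ' ');          // any remaining tags
  return text.split(/\s+/).filter(Boolean).length;
}

console.log(visibleWordCount('\x3cp>Texas average: $\x3c!-- -->2.97\x3c!-- --> per gallon\x3c/p>'));
// → 6
```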

&lt;h2&gt;
  
  
  Phase 3: The geocoding gap I didn't know I had
&lt;/h2&gt;

&lt;p&gt;While I was at it, I noticed something weird. Some ZIP pages were rendering with lat/lng of (0, 0). This made the distance calculations on the page nonsensical ("nearest gas station: 8,247 miles away"). It also meant the "nearby ZIPs" grid was showing up empty for those pages.&lt;/p&gt;

&lt;p&gt;The cause: my ZIP-to-lat/lng resolver had a single source: &lt;code&gt;zippopotam.us&lt;/code&gt;. It's free, fast, and most of the time correct. But for some valid US ZIPs (75072 in McKinney TX, for one), it returns a 404.&lt;/p&gt;

&lt;p&gt;I rebuilt the resolver as a fallback chain: four real tiers, plus a last-resort placeholder:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;zipContent.json&lt;/code&gt; (a static file with 33,620 ZIPs and pre-resolved coords)&lt;/li&gt;
&lt;li&gt;Redis cache (per-request resolved coords)&lt;/li&gt;
&lt;li&gt;zippopotam.us API&lt;/li&gt;
&lt;li&gt;Nominatim API (slower, but it covers the gaps)&lt;/li&gt;
&lt;li&gt;If all four miss: placeholder (0, 0) with degraded behavior&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'll write up the chain in detail in a separate post. The point for this post: when GSC flagged these pages as soft 404s, the broken geocoding was part of the picture even though it wasn't the headline issue.&lt;/p&gt;
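
&lt;p&gt;Ahead of that post, here's the shape of the chain in miniature. Tier functions are injected so the loop stays testable offline; the stub names and coordinates below are mine, not the real implementation:&lt;/p&gt;

```javascript
// Tiered coordinate resolver: try each source in order, first hit wins,
// with a (0, 0) placeholder as the last resort. Each tier returns
// { lat, lon } or null. Stubs below are invented for this sketch.
async function resolveCoords(zip, tiers) {
  for (const tier of tiers) {
    const coords = await tier(zip);
    if (coords) return coords;
  }
  return { lat: 0, lon: 0, placeholder: true }; // degraded behavior downstream
}

// Stub tiers standing in for the real static-file / Redis / API lookups:
const staticFile = async () => null; // cache miss
const fallbackApi = async (zip) =>
  zip === '75072' ? { lat: 33.19, lon: -96.7 } : null; // covers the gap

resolveCoords('75072', [staticFile, fallbackApi]).then(c => console.log(c));
```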

&lt;h2&gt;
  
  
  Phase 7B: The second sweep
&lt;/h2&gt;

&lt;p&gt;Three days after deploying Phases 1 through 3, I ran the GSC validation again. About 80% of the URLs had cleared. Some hadn't. So I wrote a script (&lt;code&gt;find-redirect-candidates.js&lt;/code&gt;) that programmatically tested every plausible city slug variant against the live site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;variants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;`/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;`/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;`/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;`/cheap-gas-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;variants&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://www.gas-price-check.com&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caught 35 more 404s I hadn't found in the GSC export: stale links in old blog posts I'd written, third-party links from a directory submission I'd forgotten about, and typo URLs in my own social media posts. Each one got a redirect.&lt;/p&gt;

&lt;p&gt;Phase 7B added 35 new redirects, bringing the total to 101. I deployed those, and the second GSC validation came back clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 8: Per-state context (the depth fix that actually moved the needle)
&lt;/h2&gt;

&lt;p&gt;After all the redirects and SSR enrichment, I still had some pages stuck in "crawled, currently not indexed." Word count was up. Internal links were up. But Google was still skeptical.&lt;/p&gt;

&lt;p&gt;The thing I hadn't done: make the templated content actually different across pages. My "save tips" section was different by city, but my page-level content above the fold was nearly identical. A page about ZIPs in California and a page about ZIPs in Maine had no state-specific context.&lt;/p&gt;

&lt;p&gt;I built a &lt;code&gt;stateContext.ts&lt;/code&gt; module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;STATE_CONTEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;CA&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;California's gas prices are shaped by the state's unique CARB...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;TX&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Texas typically has some of the lowest gas prices in the country...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// ...15 hand-written for top-traffic states&lt;/span&gt;
  &lt;span class="c1"&gt;// ...35 generated from a parameterized template for the rest&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getStateContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;STATE_CONTEXT&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nf"&gt;defaultContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 15 hand-written paragraphs are 80 to 120 words each. They explain the state's gas tax, refinery capacity, regulatory regime, and seasonal pricing patterns. These are the things you'd say to a friend if they asked "why are gas prices weird in California?"&lt;/p&gt;

&lt;p&gt;The other 35 states get a templated paragraph with state-specific variables (avg price, neighboring states, gas tax rate). Templated, but with enough variation that each is unique.&lt;/p&gt;
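
&lt;p&gt;A sketch of what that &lt;code&gt;defaultContext&lt;/code&gt; fallback can look like. The facts table, field names, and numbers here are invented placeholders for illustration, not real tax or price data:&lt;/p&gt;

```javascript
// Parameterized paragraph with enough state-specific variables that no two
// states read identically. STATE_FACTS values are invented for this sketch.
const STATE_FACTS = {
  OK: { name: 'Oklahoma', gasTax: 0.20, neighbors: 'Texas and Kansas' },
  VT: { name: 'Vermont', gasTax: 0.32, neighbors: 'New Hampshire and New York' },
};

function defaultContext(abbr) {
  const s = STATE_FACTS[abbr];
  if (!s) return '';
  return `${s.name} adds a $${s.gasTax.toFixed(2)}/gal state gas tax on top of the ` +
    `federal rate, and prices there tend to track its neighbors, ${s.neighbors}.`;
}

console.log(defaultContext('OK'));
```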

&lt;p&gt;After deploying Phase 8, the previously-stuck pages started getting indexed within 10 days. Not all of them. But enough that I stopped worrying about them.&lt;/p&gt;

&lt;p&gt;Final SSR word count on a representative ZIP page (77386, Spring TX), measured post-deploy: 810 visible words, up from 635 just before Phase 8. The whole journey took that page from roughly 640 to 810 words: a +170-word lift, made up mostly of the per-state context plus the earlier SSR enrichment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The diagnostic that lied to me
&lt;/h2&gt;

&lt;p&gt;One quick aside that became its own post. While diagnosing the soft 404s, I wrote a script to grep the SSR output for content markers ("does this page have an EIA average price rendered?"). The script reported zero matches across 14 URLs. I spent an hour debugging the data layer before realizing the script was broken.&lt;/p&gt;

&lt;p&gt;The cause: React inserts HTML comments (&lt;code&gt;&amp;lt;!-- --&amp;gt;&lt;/code&gt;) between adjacent text expressions during SSR. My regex was using &lt;code&gt;[^&amp;lt;]+&lt;/code&gt; which fails immediately when the next character is &lt;code&gt;&amp;lt;&lt;/code&gt;. The data was rendering correctly the whole time. My detector was the bug.&lt;/p&gt;

&lt;p&gt;I wrote that one up separately. The summary: strip HTML comments before any content matching on Next.js SSR output.&lt;/p&gt;
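
&lt;p&gt;For the impatient, the fix is two lines. (The &lt;code&gt;\x3c&lt;/code&gt; escapes stand in for angle brackets so the sample stays valid inside this feed; at runtime the strings contain real HTML.)&lt;/p&gt;

```javascript
// Strip React's SSR comment markers, then match. \x3c escapes are '<' at
// runtime; they only exist to survive the feed's XML escaping.
const html = '\x3cp>Texas average: $\x3c!-- -->2.97\x3c!-- --> per gallon as of \x3c!-- -->4/22/2026\x3c!-- -->.\x3c/p>';
const cleaned = html.replace(/\x3c!--[\s\S]*?-->/g, '');
const match = cleaned.match(/Texas average: \$([^\x3c]+) per gallon/);
console.log(match[1]); // logs "2.97"
```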

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;Before:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;87 hard 404s in GSC&lt;/li&gt;
&lt;li&gt;61 soft 404s in GSC&lt;/li&gt;
&lt;li&gt;~640 SSR words on cached-miss ZIP pages&lt;/li&gt;
&lt;li&gt;1 source of truth for ZIP geocoding (and gaps)&lt;/li&gt;
&lt;li&gt;2-hop redirect chain from &lt;code&gt;http://apex/*&lt;/code&gt; to &lt;code&gt;https://www/*&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0 hard 404s (101 redirects in &lt;code&gt;next.config.mjs&lt;/code&gt;, 1 Cloudflare Redirect Rule)&lt;/li&gt;
&lt;li&gt;0 soft 404s after Phase 8 (per latest GSC validation)&lt;/li&gt;
&lt;li&gt;~810 SSR words on the same pages&lt;/li&gt;
&lt;li&gt;4-tier geocoding chain with 100% coverage of 33,620 ZIPs&lt;/li&gt;
&lt;li&gt;1-hop redirect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total commits: 7. Total deploy waves: 3 (April 17, April 23, April 27). Total weekend hours: I stopped counting somewhere around 14.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things I'd do differently
&lt;/h2&gt;

&lt;p&gt;If I were starting this side project again:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Set up GSC URL Inspection alerts before launch.&lt;/strong&gt; I had GSC connected but wasn't watching it daily. The 404 problem accumulated for weeks before I noticed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a &lt;code&gt;curl -IL&lt;/code&gt; check to my deploy script.&lt;/strong&gt; A redirect chain check would have caught the http+https+slug double-hop before it became a Google problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-render the unique parts of every templated page from day one.&lt;/strong&gt; Anything that varies per page should be in the SSR HTML, not the client bundle. Loading states are the enemy of programmatic SEO.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hand-write the top 10-20% of templated content.&lt;/strong&gt; The remaining 80% can be generated, but the long-tail-of-the-long-tail is where Google will smell templating and dock you. Hand-writing the highest-traffic variants is high leverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a diff log of redirect rules.&lt;/strong&gt; Mine grew to 101 entries in &lt;code&gt;next.config.mjs&lt;/code&gt; and I'd already lost track of which ones I added when. A separate JSON file with timestamps would have been smarter.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What I learned about Google's verdict mechanism
&lt;/h2&gt;

&lt;p&gt;Three things I genuinely didn't know before this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Soft 404 is sticky.&lt;/strong&gt; Once Google decides a page is a soft 404, fixing the page doesn't immediately clear the verdict. You have to ask GSC to re-validate, wait 1-4 weeks for re-crawl, and accept that some of those URLs will not come back even after you fix them. The verdict has memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Crawled, currently not indexed" is the bench.&lt;/strong&gt; Google has a finite indexation budget per site. Pages they don't think are worth indexing go on the bench. You can move pages off the bench by improving them, but it's not automatic and it's not fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal redirect hops add up.&lt;/strong&gt; Each hop is a small signal loss for Google. If your CDN does one redirect and your framework does another, you're leaking signal even though each layer is individually correct. Audit your hop count.&lt;/p&gt;

&lt;p&gt;If you've been through a similar programmatic-SEO recovery, I'd love to hear what your phase breakdown looked like. Mine was four phases over two weekends. The longest phase by far wasn't the bulk redirects (that was a few hours). It was Phase 8: the part where I had to admit my templated content was actually pretty thin and rewrite the per-state context by hand.&lt;/p&gt;

&lt;p&gt;Templating gets you to 33,620 pages fast. Earning the right to keep those pages indexed takes longer.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>seo</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Why your `[^&lt;]+` regex is silently breaking on React SSR output</title>
      <dc:creator>GasPriceCheck</dc:creator>
      <pubDate>Thu, 30 Apr 2026 06:02:34 +0000</pubDate>
      <link>https://forem.com/gaspricecheck/why-your-regex-is-silently-breaking-on-react-ssr-output-2h</link>
      <guid>https://forem.com/gaspricecheck/why-your-regex-is-silently-breaking-on-react-ssr-output-2h</guid>
      <description>&lt;p&gt;Picture this. You've shipped a programmatic SEO site, a few thousand pages of templated content. Google flags 14 URLs as soft 404s. You write a quick diagnostic: hit each URL, fetch the SSR HTML, check for a few content markers (a price string, a state average, a section header). Confirm what's really rendering, fix what's missing, move on.&lt;/p&gt;

&lt;p&gt;That was the plan. Forty minutes in, my script told me 0 of 14 pages had any EIA price data rendered. None. I was about to dig into the data fetching layer when something nagged at me. I curled one of the URLs by hand. The price was right there in the HTML. Plain text, easy to spot.&lt;/p&gt;

&lt;p&gt;The script was lying. The regex was lying.&lt;/p&gt;

&lt;p&gt;It took me longer than I want to admit to figure out what was happening, so here's the writeup so you don't burn the same hour.&lt;/p&gt;

&lt;h2&gt;
  
  
  The regex that "worked"
&lt;/h2&gt;

&lt;p&gt;The diagnostic was straightforward. For each ZIP code page, I wanted to detect whether the EIA state average price was rendered in the SSR HTML. The component looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  Texas average: $&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;eiaAverage&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; per gallon as of &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;eiaDate&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;.
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So I grepped the SSR output for the marker pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Texas average: &lt;/span&gt;&lt;span class="se"&gt;\$([^&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt; per gallon/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The capture group uses &lt;code&gt;[^&amp;lt;]+&lt;/code&gt; to pick up "everything until the next tag." Standard pattern. I've used variations of this in dozens of throwaway scrapers.&lt;/p&gt;

&lt;p&gt;Across all 14 URLs, this regex returned no match. Not "match with empty capture." No match at all.&lt;/p&gt;

&lt;p&gt;Meanwhile, when I curled the same URL and read the response, the rendered text was right there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Texas average: $2.97 per gallon as of 4/22/2026.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How does a regex miss text that's literally in the string?&lt;/p&gt;

&lt;h2&gt;
  
  
  What React is actually putting in your HTML
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody told me about server-rendered React. When you have multiple adjacent text expressions inside a single element, React inserts an HTML comment as a hydration boundary marker. The actual SSR output for that paragraph looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;Texas average: $&lt;span class="c"&gt;&amp;lt;!-- --&amp;gt;&lt;/span&gt;2.97&lt;span class="c"&gt;&amp;lt;!-- --&amp;gt;&lt;/span&gt; per gallon as of &lt;span class="c"&gt;&amp;lt;!-- --&amp;gt;&lt;/span&gt;4/22/2026&lt;span class="c"&gt;&amp;lt;!-- --&amp;gt;&lt;/span&gt;.&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;React separates adjacent text nodes in its SSR output with &lt;code&gt;&amp;lt;!-- --&amp;gt;&lt;/code&gt; markers, so an interpolated &lt;code&gt;{expression}&lt;/code&gt; sitting between static text ends up bracketed by them. This is intentional. React uses the markers during hydration to know where one text node ends and the next begins; without them it couldn't reconcile the SSR text with what its virtual DOM says should be there, because the browser's HTML parser merges adjacent text nodes into one.&lt;/p&gt;

&lt;p&gt;So when my regex hits &lt;code&gt;Texas average: $&lt;/code&gt;, the next character is &lt;code&gt;&amp;lt;&lt;/code&gt; (start of the comment). The &lt;code&gt;[^&amp;lt;]+&lt;/code&gt; capture group requires at least one non-&lt;code&gt;&amp;lt;&lt;/code&gt; character. It fails immediately. The regex moves on, finds no other "Texas average:" anchor in the page, and reports no match.&lt;/p&gt;

&lt;p&gt;The price ($2.97) is in the HTML. It's just not adjacent to the anchor text. There's a comment node between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;Strip the comments before any content matching. One added line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stripped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;!--&lt;/span&gt;&lt;span class="se"&gt;[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;--&amp;gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stripped&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Texas average: &lt;/span&gt;&lt;span class="se"&gt;\$([^&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt; per gallon/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;[\s\S]&lt;/code&gt; rather than &lt;code&gt;.&lt;/code&gt; because comments can span multiple lines, and &lt;code&gt;.&lt;/code&gt; in JavaScript doesn't match newlines without the &lt;code&gt;s&lt;/code&gt; (dotAll) flag. The &lt;code&gt;s&lt;/code&gt; flag has been standard since ES2018, but &lt;code&gt;[\s\S]&lt;/code&gt; works in every engine regardless, so I default to it.&lt;/p&gt;
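&lt;p&gt;A quick way to convince yourself, with an illustrative string (the angle brackets are written as &lt;code&gt;\u003C&lt;/code&gt;/&lt;code&gt;\u003E&lt;/code&gt; escapes, which JavaScript treats as ordinary characters):&lt;/p&gt;

```javascript
// "." stops at newlines by default; "[\s\S]" does not.
// \u003C and \u003E are JavaScript escapes for the angle brackets.
const chunk = 'before \u003C!--\n  spans\n  two lines\n--\u003E after';

// Without the s flag, ".*?" can never bridge the newline inside the comment.
const dotVersion = /\u003C!--.*?--\u003E/.test(chunk);        // false
// "[\s\S]" matches any character, newlines included.
const classVersion = /\u003C!--[\s\S]*?--\u003E/.test(chunk); // true
```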

&lt;p&gt;Once I added that one line, the regex worked on all 14 URLs. The EIA data was rendering correctly the whole time. My "soft 404 root cause" was a bug in my diagnostic, not a real content gap.&lt;/p&gt;
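&lt;p&gt;The whole failure and fix fits in a few lines if you want to see it in isolation. A minimal sketch; the string below mimics React's SSR output rather than the site's real markup, and the angle brackets are written as &lt;code&gt;\u003C&lt;/code&gt;/&lt;code&gt;\u003E&lt;/code&gt; JavaScript escapes:&lt;/p&gt;

```javascript
// Minimal repro of the failure mode. The string mimics React's SSR output;
// \u003C and \u003E are JavaScript escapes for the angle brackets.
const ssrHtml =
  '\u003Cp\u003ETexas average: $\u003C!-- --\u003E2.97\u003C!-- --\u003E per gallon\u003C/p\u003E';

// The original detector: anchor text, then "everything until the next tag".
const naive = ssrHtml.match(/Texas average: \$([^\u003C]+) per gallon/);
// naive is null: the character right after "$" opens a hydration comment.

// Same regex after stripping the hydration comments.
const cleaned = ssrHtml.replace(/\u003C!--[\s\S]*?--\u003E/g, '');
const fixed = cleaned.match(/Texas average: \$([^\u003C]+) per gallon/);
// fixed[1] is "2.97".
```

&lt;p&gt;Run it with &lt;code&gt;node&lt;/code&gt; and log &lt;code&gt;naive&lt;/code&gt; and &lt;code&gt;fixed&lt;/code&gt; to watch the no-match turn into a capture.&lt;/p&gt;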

&lt;h2&gt;
  
  
  Why this gotcha matters more than you think
&lt;/h2&gt;

&lt;p&gt;The reason I'm writing this up: the failure mode is silent and high-impact.&lt;/p&gt;

&lt;p&gt;If your regex returned an obvious garbage value, you'd debug it in five minutes. But mine returned no match at all. That's the same return value as "this content is genuinely missing." Which is the exact thing I was trying to detect. The diagnostic was indistinguishable from the bug it was looking for.&lt;/p&gt;

&lt;p&gt;I trusted the output. I started forming hypotheses based on the output. "The EIA fetch must be failing on the server. Let me check my fallback chain. Let me check Redis." I burned an hour on those wrong paths because the symptom was confirmed by my (broken) detector.&lt;/p&gt;

&lt;p&gt;Two takeaways that generalize beyond this exact case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, when you grep server-rendered React HTML, always strip comments first.&lt;/strong&gt; Any regex with &lt;code&gt;[^&amp;lt;]+&lt;/code&gt;, &lt;code&gt;\w+&lt;/code&gt;, or word-boundary anchors can trip on the hydration markers. The rendered page gives no hint the comments are there, and they're easy to skim past in DevTools, which is part of why this gotcha is invisible. View the raw response with &lt;code&gt;curl -s URL | head -200&lt;/code&gt; and look for the &lt;code&gt;&amp;lt;!-- --&amp;gt;&lt;/code&gt; pattern. You'll see them everywhere, especially on text-heavy pages with lots of variable interpolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, validate your diagnostic before you trust its output.&lt;/strong&gt; Run it against a known-good page. If your "missing content" detector reports content as missing on a page where you can visually confirm the content exists, your detector is broken, not the content. I should have done this sanity check before chasing fix hypotheses. I didn't, because the detector was "obviously simple."&lt;/p&gt;
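&lt;p&gt;That sanity check is cheap to automate. A sketch, assuming a &lt;code&gt;detectPrice&lt;/code&gt; helper like mine; the function name and fixture are illustrative, not from the real site, and the angle brackets are written as &lt;code&gt;\u003C&lt;/code&gt;/&lt;code&gt;\u003E&lt;/code&gt; escapes:&lt;/p&gt;

```javascript
// Validate the detector against a known-good fixture BEFORE trusting it on
// live pages. detectPrice and fixture are illustrative names;
// \u003C and \u003E are JavaScript escapes for the angle brackets.
function detectPrice(html) {
  const cleaned = html.replace(/\u003C!--[\s\S]*?--\u003E/g, '');
  const m = cleaned.match(/Texas average: \$([^\u003C]+) per gallon/);
  return m ? m[1] : null;
}

// Known-good fixture containing the hydration comments React actually emits.
const fixture =
  '\u003Cp\u003ETexas average: $\u003C!-- --\u003E2.97\u003C!-- --\u003E per gallon\u003C/p\u003E';

// If this throws, the detector is broken. Fix it before forming any
// hypotheses about the pages themselves.
if (detectPrice(fixture) !== '2.97') {
  throw new Error('detector failed on known-good fixture');
}
```

&lt;p&gt;Run the check at the top of the crawl script, so a broken detector fails loudly instead of quietly reporting every page as missing its content.&lt;/p&gt;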

&lt;p&gt;This bit me on a Next.js 15 / React 18 project, but it isn't Next-specific. I checked: the markers come from react-dom's server renderer and are a deliberate part of how React hydrates text nodes. It's not going away. If you're parsing SSR HTML programmatically, assume comments are everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  One last thing worth knowing
&lt;/h2&gt;

&lt;p&gt;There's a sister gotcha. If you're using &lt;code&gt;cheerio&lt;/code&gt; or another DOM parser instead of regex, you don't have this problem. The DOM parser walks the tree, and adjacent text nodes are joined when you call &lt;code&gt;.text()&lt;/code&gt;. So &lt;code&gt;$('p').text()&lt;/code&gt; returns "Texas average: $2.97 per gallon as of 4/22/2026." with no comment artifacts.&lt;/p&gt;

&lt;p&gt;But if you're using regex (faster, simpler for one-off scripts), strip first, match second. Or write a one-line helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stripComments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;!--&lt;/span&gt;&lt;span class="se"&gt;[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;--&amp;gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop it at the top of every SSR-parsing script you write. Future-you will thank you.&lt;/p&gt;

&lt;p&gt;That's the whole post. One bug, one fix, one hour I'm not getting back. Hope I saved you the same hour.&lt;/p&gt;

&lt;p&gt;If you've hit a similar silent-failure-mode debugging story, drop it in the comments. I collect these.&lt;/p&gt;

</description>
      <category>react</category>
      <category>nextjs</category>
      <category>javascript</category>
      <category>debugging</category>
    </item>
  </channel>
</rss>
