<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yuki Nakazawa</title>
    <description>The latest articles on Forem by Yuki Nakazawa (@yuki0510).</description>
    <link>https://forem.com/yuki0510</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3812207%2Fb11b59c5-9afe-4d02-9258-3710f6e71b17.jpg</url>
      <title>Forem: Yuki Nakazawa</title>
      <link>https://forem.com/yuki0510</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yuki0510"/>
    <language>en</language>
    <item>
      <title>Logging Googlebot Crawls for Free with Cloudflare Workers + D1</title>
      <dc:creator>Yuki Nakazawa</dc:creator>
      <pubDate>Fri, 20 Mar 2026 07:41:30 +0000</pubDate>
      <link>https://forem.com/yuki0510/logging-googlebot-crawls-for-free-with-cloudflare-workers-d1-35d3</link>
      <guid>https://forem.com/yuki0510/logging-googlebot-crawls-for-free-with-cloudflare-workers-d1-35d3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When doing SEO work, there are times when you need to investigate whether Googlebot is properly crawling your pages.&lt;/p&gt;

&lt;p&gt;Google Search Console has a crawl stats feature, but the sample URLs it surfaces are limited to 1,000 entries. For tracking the crawl status of specific pages over time, it falls a bit short.&lt;/p&gt;

&lt;p&gt;Server access logs are the ideal solution for this kind of investigation.&lt;/p&gt;

&lt;p&gt;I use this setup on &lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt;, a browser-based CSV tool I built on Vercel.&lt;/p&gt;

&lt;p&gt;On a self-managed VPS or on-premise server, Googlebot access is automatically recorded in Nginx or Apache logs.&lt;/p&gt;

&lt;p&gt;However, serverless PaaS platforms like Vercel expose no server to manage, and therefore no raw access logs to inspect.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Cloudflare&lt;/strong&gt; comes in. By routing your domain's DNS through Cloudflare, you can intercept requests with a Cloudflare Worker before they ever reach Vercel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Standard Vercel setup]
Googlebot → Vercel → Response (no logs)

[With Cloudflare]
Googlebot → Cloudflare Worker (logs recorded here) → Vercel → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By saving the logs captured by the Worker into Cloudflare's D1 (a SQLite-based database), you can collect Googlebot crawl logs without touching the Vercel side at all — and it runs entirely within the free tier.&lt;/p&gt;

&lt;p&gt;This article walks through the setup step by step.&lt;/p&gt;

&lt;h3&gt;
  
  
  What you can collect
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Crawl timing per URL (when each page was crawled)&lt;/li&gt;
&lt;li&gt;Status code monitoring (detecting 4xx/5xx crawl errors)&lt;/li&gt;
&lt;li&gt;Cache hit rate (&lt;code&gt;DYNAMIC&lt;/code&gt; vs &lt;code&gt;HIT&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Bot type breakdown (InspectionTool vs Googlebot)&lt;/li&gt;
&lt;/ul&gt;
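&lt;p&gt;As a sketch of the kind of analysis these fields make possible (the rows below are made-up sample data, not real crawl output):&lt;/p&gt;

```javascript
// Toy aggregation over rows shaped like the crawl_logs table (sample data,
// not real crawl logs) showing the questions the log can answer.
const rows = [
  { url: "/", status: 200, cache: "HIT", bot_type: "googlebot" },
  { url: "/pricing", status: 200, cache: "DYNAMIC", bot_type: "googlebot" },
  { url: "/old-page", status: 404, cache: "DYNAMIC", bot_type: "googlebot" },
  { url: "/", status: 200, cache: "HIT", bot_type: "inspection-tool" },
];

// Cache hit rate (DYNAMIC vs HIT)
const hits = rows.filter((r) => r.cache === "HIT").length;
// Crawl errors (4xx/5xx)
const errors = rows.filter((r) => r.status >= 400);
// Bot type breakdown
const byBot = {};
for (const r of rows) byBot[r.bot_type] = (byBot[r.bot_type] || 0) + 1;

console.log(`cache hit rate: ${((hits / rows.length) * 100).toFixed(0)}%`); // 50%
console.log("crawl errors:", errors.map((r) => `${r.url} (${r.status})`));
console.log("bot breakdown:", byBot);
```

&lt;p&gt;In practice you would run the equivalent &lt;code&gt;SELECT&lt;/code&gt; queries against D1 with &lt;code&gt;wrangler d1 execute&lt;/code&gt;.&lt;/p&gt;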




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your domain is managed through Cloudflare&lt;/li&gt;
&lt;li&gt;Node.js and the Wrangler CLI are available&lt;/li&gt;
&lt;li&gt;Estimated time: ~30 minutes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh40upby64jbfqdzn68mr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh40upby64jbfqdzn68mr.png" alt="Architecture diagram showing Googlebot sending a request to a Cloudflare Worker, which intercepts and detects the bot, forwards the request to Vercel, and asynchronously saves the crawl log to a D1 (SQLite) database using ctx.waitUntil." width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Worker intercepts every incoming request and writes crawl data to D1.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ctx.waitUntil&lt;/code&gt; is used to handle log saving asynchronously, so the response to Googlebot is never delayed.&lt;/p&gt;
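&lt;p&gt;The pattern can be sketched outside the Workers runtime (a plain Node simulation with illustrative names; the real &lt;code&gt;ctx&lt;/code&gt; is provided by Cloudflare):&lt;/p&gt;

```javascript
// Minimal simulation of the ctx.waitUntil pattern: the handler returns
// immediately, while the "runtime" keeps the logging promise alive and
// lets it finish in the background.
function makeCtx() {
  const pending = [];
  return {
    waitUntil(promise) { pending.push(promise); }, // runtime keeps this alive
    pending,
  };
}

const log = [];

async function saveLog(entry) {
  // Simulate a slow D1 write
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push(entry);
}

async function handler(ctx) {
  ctx.waitUntil(saveLog("crawl")); // fire-and-forget: response is not delayed
  return "response";               // returned before the log write finishes
}

const ctx = makeCtx();
handler(ctx).then((res) => {
  console.log(res, log.length); // "response 0": log not yet written
  Promise.all(ctx.pending).then(() => console.log(log.length)); // 1
});
```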




&lt;h2&gt;
  
  
  Step 0: Install the Wrangler CLI
&lt;/h2&gt;

&lt;p&gt;Install the Wrangler CLI to manage Cloudflare from your terminal. Once installed, log in to your account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; wrangler
wrangler login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1: Create the D1 Database
&lt;/h2&gt;

&lt;p&gt;Create a D1 database on Cloudflare.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler d1 create googlebot-logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output will include a &lt;code&gt;database_id&lt;/code&gt; — make a note of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;✅ Successfully created DB &lt;span class="s1"&gt;'googlebot-logs'&lt;/span&gt;
database_id &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"&lt;/span&gt;  ← copy this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create the table definition file and apply it to D1.&lt;/p&gt;

&lt;p&gt;Note: without the &lt;code&gt;--remote&lt;/code&gt; flag, the command runs against your local D1 instance instead of the remote one — don't forget it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create schema.sql&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; schema.sql &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
CREATE TABLE IF NOT EXISTS crawl_logs (
  id             INTEGER PRIMARY KEY AUTOINCREMENT,
  ts             TEXT NOT NULL,
  url            TEXT NOT NULL,
  method         TEXT,
  status         INTEGER,
  ua             TEXT,
  ip             TEXT,
  country        TEXT,
  cache          TEXT,
  referer        TEXT,
  bot_type       TEXT,
  content_length INTEGER
);

CREATE INDEX IF NOT EXISTS idx_ts  ON crawl_logs(ts);
CREATE INDEX IF NOT EXISTS idx_url ON crawl_logs(url);
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Apply to D1&lt;/span&gt;
wrangler d1 execute googlebot-logs &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;schema.sql &lt;span class="nt"&gt;--remote&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Create the Worker
&lt;/h2&gt;

&lt;p&gt;Create a Worker project locally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;googlebot-logger &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;googlebot-logger
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;wrangler.toml&lt;/code&gt; with the following content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"googlebot-logger"&lt;/span&gt;
&lt;span class="py"&gt;main&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src/index.js"&lt;/span&gt;
&lt;span class="py"&gt;compatibility_date&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2024-01-01"&lt;/span&gt;

&lt;span class="c"&gt;# Domain configuration&lt;/span&gt;
&lt;span class="nn"&gt;[[routes]]&lt;/span&gt;
&lt;span class="py"&gt;pattern&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"yourdomain.com/*"&lt;/span&gt;  &lt;span class="c"&gt;# enter your domain&lt;/span&gt;
&lt;span class="py"&gt;zone_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"yourdomain.com"&lt;/span&gt;  &lt;span class="c"&gt;# enter your domain&lt;/span&gt;

&lt;span class="c"&gt;# D1 binding&lt;/span&gt;
&lt;span class="nn"&gt;[[d1_databases]]&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"DB"&lt;/span&gt;
&lt;span class="py"&gt;database_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"googlebot-logs"&lt;/span&gt;
&lt;span class="py"&gt;database_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"&lt;/span&gt;  &lt;span class="c"&gt;# ID from Step 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create &lt;code&gt;src/index.js&lt;/code&gt;. Since we only want to track page-level crawls, static resource files under &lt;code&gt;/_next/&lt;/code&gt; (JS, CSS, etc.) are excluded from logging.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Forward the request to the origin first&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Check the User-Agent&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ua&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User-Agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;botType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detectGoogleBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. If Googlebot, save the log asynchronously without delaying the response&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// clone before returning&lt;/span&gt;
      &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;saveLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Identify the type of Googlebot&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;detectGoogleBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot-Image/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot-image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot-Video/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot-video&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot-News/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot-news&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/AdsBot-Google/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;         &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsbot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Google-InspectionTool/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inspection-tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;             &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// not Googlebot&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Save log to D1&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cf&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="c1"&gt;// Exclude static resource files — page URLs only&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/_next/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/_vercel/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/static/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\.(&lt;/span&gt;&lt;span class="sr"&gt;js|css|ico|png|jpg|jpeg|svg|webp|woff|woff2|map|wasm&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// If Content-Length is absent, read the body to measure size&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;contentLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Length&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;contentLength&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cloned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cloned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;contentLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;byteLength&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      INSERT INTO crawl_logs (ts, url, method, status, ua, ip, country, cache, referer, bot_type, content_length)
      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CF-Connecting-IP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;country&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CF-Cache-Status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Referer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;contentLength&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Log failures should never affect site availability&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Log save failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
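&lt;p&gt;Because UAs such as &lt;code&gt;Googlebot-Image&lt;/code&gt; also contain the plain &lt;code&gt;Googlebot&lt;/code&gt; substring, the order of checks in &lt;code&gt;detectGoogleBot&lt;/code&gt; matters: the most specific patterns must come first. A quick standalone check of that ordering (the UA strings are simplified, not the full real ones):&lt;/p&gt;

```javascript
// Same check order as in src/index.js: specific bots before plain Googlebot.
function detectGoogleBot(ua) {
  if (/Googlebot-Image/i.test(ua))       return "googlebot-image";
  if (/Googlebot-Video/i.test(ua))       return "googlebot-video";
  if (/Googlebot-News/i.test(ua))        return "googlebot-news";
  if (/AdsBot-Google/i.test(ua))         return "adsbot";
  if (/Google-InspectionTool/i.test(ua)) return "inspection-tool";
  if (/Googlebot/i.test(ua))             return "googlebot";
  return null;
}

console.log(detectGoogleBot("Googlebot-Image/1.0"));                     // googlebot-image
console.log(detectGoogleBot("Mozilla/5.0 (compatible; Googlebot/2.1)")); // googlebot
console.log(detectGoogleBot("Google-InspectionTool/1.0"));               // inspection-tool
console.log(detectGoogleBot("Mozilla/5.0"));                             // null
```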






&lt;h2&gt;
  
  
  Step 3: Cloudflare DNS Configuration
&lt;/h2&gt;

&lt;p&gt;Configure Cloudflare to route traffic through the Worker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verify SSL/TLS encryption mode
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;SSL/TLS → Overview&lt;/strong&gt; in the Cloudflare dashboard and confirm the encryption mode is set to &lt;strong&gt;Full&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Leaving it on &lt;strong&gt;Flexible&lt;/strong&gt; and then enabling the proxy can cause an HTTPS redirect loop that takes your site down — worth checking first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enable proxy on your DNS record
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;DNS → Records&lt;/strong&gt;, find the A record for your domain, and click &lt;strong&gt;Edit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82zvmfjst07epj1exxkm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82zvmfjst07epj1exxkm.png" alt="Cloudflare DNS Records page showing an A record for leaprows.com with Proxy status set to " width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enable the &lt;strong&gt;Proxy status&lt;/strong&gt; toggle and save. The icon turns into an orange cloud, meaning requests now pass through Cloudflare's proxy, where the Worker route configured in &lt;code&gt;wrangler.toml&lt;/code&gt; applies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri7ak9t0xam0z51p8lre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri7ak9t0xam0z51p8lre.png" alt="Cloudflare DNS record edit form showing the Proxy status toggle being switched to " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Deploy
&lt;/h2&gt;

&lt;p&gt;With Cloudflare configured, deploy the Worker from your local project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's everything needed to start collecting logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: Verify
&lt;/h2&gt;

&lt;p&gt;To confirm logs are being recorded, run a live test from &lt;strong&gt;Google Search Console → URL Inspection → Test Live URL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt4ouoyxecmicf7gqn01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt4ouoyxecmicf7gqn01.png" alt="Google Search Console URL Inspection tool showing " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Search Console's live test uses the &lt;a href="https://developers.google.com/crawling/docs/crawlers-fetchers/google-common-crawlers?hl=en#google-inspectiontool" rel="noopener noreferrer"&gt;&lt;code&gt;Google-InspectionTool&lt;/code&gt;&lt;/a&gt; User-Agent, so in our setup it will be recorded with &lt;code&gt;bot_type = inspection-tool&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After the test completes, check D1 with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler d1 execute googlebot-logs &lt;span class="nt"&gt;--remote&lt;/span&gt; &lt;span class="nt"&gt;--command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"SELECT * FROM crawl_logs ORDER BY ts DESC LIMIT 5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see a row with &lt;code&gt;inspection-tool&lt;/code&gt; in the &lt;code&gt;bot_type&lt;/code&gt; column, everything is working correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a4vxlgbl9ro77mi2lsx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a4vxlgbl9ro77mi2lsx.png" alt="Terminal output of a wrangler d1 execute command showing crawl log records in D1, including rows with bot_type values of " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Free Tier
&lt;/h2&gt;

&lt;p&gt;At roughly 500 bytes per record, the 5 GB free tier holds approximately 10 million records. For an indie SaaS or personal site, you're unlikely to come close to the limit.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Free tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Workers&lt;/td&gt;
&lt;td&gt;100,000 requests / day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1 rows written&lt;/td&gt;
&lt;td&gt;100,000 rows / day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1 storage&lt;/td&gt;
&lt;td&gt;5 GB (total across all databases)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
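&lt;p&gt;The capacity estimate is easy to sanity-check (assuming the rough 500-bytes-per-record figure):&lt;/p&gt;

```javascript
// Back-of-the-envelope capacity check for the D1 free tier.
const storageBytes = 5 * 1024 ** 3;   // 5 GB free tier
const bytesPerRecord = 500;           // rough average per crawl log record
const capacity = Math.floor(storageBytes / bytesPerRecord);
console.log(capacity); // → 10737418, i.e. roughly 10.7 million records
```

&lt;p&gt;Even at the free tier's 100,000-writes-per-day cap, filling 5 GB would take over three months of maximal logging.&lt;/p&gt;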

&lt;p&gt;If you'd like to keep things tidy, you can add a cron job to automatically delete old logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Append to wrangler.toml&lt;/span&gt;
&lt;span class="nn"&gt;[triggers]&lt;/span&gt;
&lt;span class="py"&gt;crons&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"0 0 * * 0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c"&gt;# runs every Sunday at midnight&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Append to src/index.js&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    DELETE FROM crawl_logs
    WHERE ts &amp;lt; datetime('now', '-90 days')
  `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ... existing fetch handler code ...&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Serverless PaaS platforms like Vercel don't expose server access logs, but by using Cloudflare as a DNS proxy you can collect Googlebot crawl logs without any changes to your server-side code.&lt;/p&gt;

&lt;p&gt;The D1 free tier is more than generous enough for small to mid-sized sites, making this essentially free to run.&lt;/p&gt;

&lt;p&gt;As a next step, you could join this data with Google Search Console exports to analyze the relationship between crawl frequency and indexing status.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>buildinpublic</category>
      <category>cloudflare</category>
      <category>workers</category>
    </item>
    <item>
      <title>When JS Libraries Fail at 1M Rows: Generating XLSX via DuckDB SQL</title>
      <dc:creator>Yuki Nakazawa</dc:creator>
      <pubDate>Tue, 10 Mar 2026 22:03:41 +0000</pubDate>
      <link>https://forem.com/yuki0510/when-js-libraries-fail-at-1m-rows-generating-xlsx-via-duckdb-sql-2gkk</link>
      <guid>https://forem.com/yuki0510/when-js-libraries-fail-at-1m-rows-generating-xlsx-via-duckdb-sql-2gkk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I've been building &lt;strong&gt;&lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt;&lt;/strong&gt; — a browser-based CSV analysis tool that runs entirely client-side using DuckDB-WASM, with no server involved.&lt;/p&gt;

&lt;p&gt;At some point I needed to add XLSX export. But every existing approach I tried fell apart at any meaningful scale.&lt;/p&gt;

&lt;p&gt;I eventually landed on a solution: &lt;strong&gt;generate Excel-compatible XML directly via DuckDB SQL, then compress it into an XLSX file using JSZip.&lt;/strong&gt; This post covers why I went that route and how it works.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: My day job is SEO, not software engineering — so please forgive any imprecise terminology. 🙏&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Exporting 1 Million Rows to XLSX in the Browser
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DuckDB-WASM's Built-in XLSX Export Was Unreliable
&lt;/h3&gt;

&lt;p&gt;My first attempt was DuckDB-WASM's native XLSX output (the &lt;code&gt;COPY TO&lt;/code&gt; command via the Excel extension). It was unstable — Excel would throw &lt;em&gt;"The file format or file extension is not valid"&lt;/em&gt; on some outputs and refuse to open the file.&lt;/p&gt;

&lt;p&gt;I ruled it out for production use and started looking for alternatives.&lt;/p&gt;

&lt;h3&gt;
  
  
  JS Libraries Ran Out of Memory
&lt;/h3&gt;

&lt;p&gt;Next I tried SheetJS and ExcelJS. Both worked fine on small datasets, but memory usage exploded as file size grew, and the conversion itself became painfully slow.&lt;/p&gt;

&lt;p&gt;Testing with 1 million rows, progress would reach 95%... and then the browser would freeze for nearly 20 seconds. After the conversion finally finished, the browser stayed sluggish — and sometimes just crashed entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  I Still Wanted 1 Million Rows to Work
&lt;/h3&gt;

&lt;p&gt;LeapRows is built around the promise of handling 1M+ rows quickly. I didn't want to quietly cap the export at some lower limit. If Excel itself caps out at 1,048,576 rows, I wanted LeapRows to be able to fill every one of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Insight: XLSX Is Just ZIP + XML
&lt;/h2&gt;

&lt;p&gt;Something that might not be obvious: an XLSX file is actually a ZIP archive containing XML files. You can verify this yourself by renaming any &lt;code&gt;.xlsx&lt;/code&gt; file to &lt;code&gt;.zip&lt;/code&gt; — you'll see the contents directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbsayknhxlb9nhu52tel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbsayknhxlb9nhu52tel.png" alt="Windows File Explorer showing the contents of a renamed XLSX file as a ZIP archive, &amp;lt;br&amp;gt;
revealing folders: _rels, docProps, xl, and a [Content_Types].xml file." width="636" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Those XML files contain a lot of styling information (fonts, colors, borders, conditional formatting, etc.), but the actual data lives in just one file: &lt;code&gt;xl/worksheets/sheet1.xml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsoarf15722vo0fuklmok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsoarf15722vo0fuklmok.png" alt="A code editor displaying the XML content of xl/worksheets/sheet1.xml inside an XLSX file, &amp;lt;br&amp;gt;
showing font styling definitions including Japanese font names like 游ゴシック." width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since LeapRows only needs to export raw data from a DuckDB table — no styling required — I realized: &lt;em&gt;the only file I need to generate dynamically is &lt;code&gt;sheet1.xml&lt;/code&gt;. The other 4–5 files in the archive can be static strings.&lt;/em&gt;&lt;/p&gt;
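&lt;p&gt;For illustration, here is a minimal sketch of what those static files can look like as plain string constants (production files may carry additional attributes; treat this as an assumption, not a spec-complete listing):&lt;/p&gt;

```javascript
// Minimal static XLSX parts (sketch). Only xl/worksheets/sheet1.xml is
// generated dynamically; everything below never changes between exports.
const STATIC_PARTS = {
  '[Content_Types].xml':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">' +
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>' +
    '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>' +
    '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>' +
    '</Types>',
  '_rels/.rels':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>' +
    '</Relationships>',
  'xl/workbook.xml':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ' +
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">' +
    '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>',
  'xl/_rels/workbook.xml.rels':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>' +
    '</Relationships>',
};
```

&lt;p&gt;Note that because the generated cells use &lt;code&gt;t="inlineStr"&lt;/code&gt;, no &lt;code&gt;xl/sharedStrings.xml&lt;/code&gt; part is needed at all.&lt;/p&gt;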
&lt;h2&gt;
  
  
  Generating XML Directly with DuckDB SQL
&lt;/h2&gt;

&lt;p&gt;The data is already in a DuckDB table. And DuckDB SQL is extremely fast at string operations — &lt;code&gt;CONCAT&lt;/code&gt;, &lt;code&gt;REPLACE&lt;/code&gt;, and so on.&lt;/p&gt;

&lt;p&gt;That led to the idea: &lt;em&gt;"What if I use a SQL query to output XML cell strings directly, then ZIP them up with JSZip?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DuckDB → JS objects → library → XML → ZIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DuckDB → XML strings (via SQL) → JSZip → ZIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skipping the intermediate JS object conversion means no memory pressure from large object graphs, and no GC pauses that were causing the browser to freeze.&lt;/p&gt;

&lt;p&gt;The approach uses a &lt;code&gt;CASE&lt;/code&gt; expression to handle each column type appropriately — numbers pass through as-is, dates get converted to Excel serial values, and strings get XML-escaped.&lt;/p&gt;
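&lt;p&gt;The date conversion is plain arithmetic: Excel stores a date as the number of days since 1899-12-30 (the epoch sits two days before 1900-01-01 to compensate for Excel's historical 1900 leap-year bug). A JavaScript equivalent of the SQL's &lt;code&gt;date_diff&lt;/code&gt; would be:&lt;/p&gt;

```javascript
// Days since 1899-12-30, matching date_diff('day', DATE '1899-12-30', d)
// in the SQL below. Correct for any date after 1900-02-28.
const excelSerial = (isoDate) =>
  (Date.parse(isoDate + 'T00:00:00Z') - Date.parse('1899-12-30T00:00:00Z')) / 86_400_000;

console.log(excelSerial('2024-01-11')); // → 45302
```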

&lt;h3&gt;
  
  
  What the SQL Looks Like
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="s1"&gt;'&amp;lt;row r="'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'"&amp;gt;'&lt;/span&gt;
    &lt;span class="c1"&gt;-- String column: XML-escaped, output as inline string&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nv"&gt;"name"&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;c r="A'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'" t="inlineStr"&amp;gt;&amp;lt;is&amp;gt;&amp;lt;t&amp;gt;'&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;amp;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;lt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/t&amp;gt;&amp;lt;/is&amp;gt;&amp;lt;/c&amp;gt;'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="c1"&gt;-- Numeric column: output value directly&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nv"&gt;"age"&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;c r="B'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'"&amp;gt;&amp;lt;v&amp;gt;'&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"age"&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="c1"&gt;-- Date column: convert to Excel serial value (days since 1899-12-30)&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nv"&gt;"created_at"&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;c r="C'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'" s="1"&amp;gt;&amp;lt;v&amp;gt;'&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'1899-12-30'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/row&amp;gt;'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;xml_row&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rn&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each row in the result set becomes one XML element:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;row&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"A2"&lt;/span&gt; &lt;span class="na"&gt;t=&lt;/span&gt;&lt;span class="s"&gt;"inlineStr"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;is&amp;gt;&amp;lt;t&amp;gt;&lt;/span&gt;John Doe&lt;span class="nt"&gt;&amp;lt;/t&amp;gt;&amp;lt;/is&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"B2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;30&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"C2"&lt;/span&gt; &lt;span class="na"&gt;s=&lt;/span&gt;&lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;45302&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;/row&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;row&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"3"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"A3"&lt;/span&gt; &lt;span class="na"&gt;t=&lt;/span&gt;&lt;span class="s"&gt;"inlineStr"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;is&amp;gt;&amp;lt;t&amp;gt;&lt;/span&gt;Jane Smith&lt;span class="nt"&gt;&amp;lt;/t&amp;gt;&amp;lt;/is&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"B3"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;25&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"C3"&lt;/span&gt; &lt;span class="na"&gt;s=&lt;/span&gt;&lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;45150&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;/row&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine these rows with a header row and the static XML files, compress with JSZip, and you have a valid XLSX file.&lt;/p&gt;
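&lt;p&gt;That assembly step can be sketched like this (a minimal sketch: &lt;code&gt;xmlRows&lt;/code&gt; stands in for the strings returned by the SQL query, and the JSZip call is shown only as a comment):&lt;/p&gt;

```javascript
// Sketch: wrap the SQL-generated rows in the worksheet skeleton.
// `xmlRows` is assumed to be the array of xml_row strings from the query.
const xmlRows = [
  '<row r="2"><c r="A2" t="inlineStr"><is><t>John Doe</t></is></c></row>',
];
const headerRow =
  '<row r="1"><c r="A1" t="inlineStr"><is><t>name</t></is></c></row>';
const sheet1Xml =
  '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
  '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">' +
  '<sheetData>' + headerRow + xmlRows.join('') + '</sheetData>' +
  '</worksheet>';

// With JSZip (indicative, not run here):
//   const zip = new JSZip();
//   zip.file('xl/worksheets/sheet1.xml', sheet1Xml);
//   then add the static parts and call zip.generateAsync({ type: 'blob' })
```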

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;With ExcelJS, exporting 1 million rows took around 20 seconds and consumed over 1 GB of memory (unfortunately I didn't capture a "before" screenshot for comparison).&lt;/p&gt;

&lt;p&gt;After switching to the SQL-based XML approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML generation: &lt;strong&gt;5.5s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;ZIP compression: &lt;strong&gt;7.3s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Memory usage: significantly reduced&lt;/li&gt;
&lt;li&gt;Post-export browser freeze: &lt;strong&gt;gone&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reduction in GC-eligible objects was what solved the post-conversion sluggishness.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rc0f7bt1138gykogd61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rc0f7bt1138gykogd61.png" alt="Processing time breakdown for 1 million row XLSX export: DuckDB Init 4ms, File Load 1ms, Table Creation 439ms, Schema Analysis 1ms, Preview Generation 4ms, Build XML 5.5s (41%), ZIP Compress 7.3s (55%) ." width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Side note: compression could probably be moved to the WASM layer for even better performance, but that felt out of scope for now. 😇&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;This approach doesn't support cell styling — no colors, borders, or Excel-specific formatting. For LeapRows that's fine, since the goal is pure data export. But it's worth keeping in mind if you need rich formatting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Once I let go of the assumption that &lt;em&gt;"XLSX is a complex format,"&lt;/em&gt; it became clear that for plain data export, a minimal XML structure is completely sufficient.&lt;/p&gt;

&lt;p&gt;DuckDB's string processing is fast enough to make "generate XML via SQL" a genuinely practical approach — and I suspect the same idea could be applied in other creative ways beyond XLSX export.&lt;/p&gt;

</description>
      <category>duckdb</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How a Non-Engineer Built a 1-Million-Row CSV Analyzer with Claude Code and DuckDB-WASM</title>
      <dc:creator>Yuki Nakazawa</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:45:28 +0000</pubDate>
      <link>https://forem.com/yuki0510/how-a-non-engineer-built-a-1-million-row-csv-analyzer-with-claude-code-and-duckdb-wasm-4mpp</link>
      <guid>https://forem.com/yuki0510/how-a-non-engineer-built-a-1-million-row-csv-analyzer-with-claude-code-and-duckdb-wasm-4mpp</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I built a tool called &lt;strong&gt;&lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt;&lt;/strong&gt; — a browser-based CSV analyzer that handles 1 million rows without breaking a sweat. 🎉&lt;/p&gt;

&lt;p&gt;The key feature is that everything runs entirely inside the browser using DuckDB-WASM and OPFS. Your data never leaves your machine.&lt;/p&gt;

&lt;p&gt;I'm not an engineer. I wrote a tiny bit of code for minor tweaks and debugging, but 95%+ of the codebase was written by Claude Code.&lt;/p&gt;

&lt;p&gt;I can't claim to understand every single line — and that's exactly why I didn't want to just blindly ship AI-generated code. I put quality controls in place as best I could as a non-engineer: defining implementation rules in CLAUDE.md, building security audit Skills based on OWASP Top 10, and setting up pre-commit hooks for lightweight checks.&lt;/p&gt;

&lt;p&gt;It may not be perfect, but I at least wanted to avoid "pray and deploy."&lt;/p&gt;

&lt;p&gt;The tool is still in Beta, but I want to document what it took for a non-engineer to ship a real product in collaboration with Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built It: "The Python Sharing Problem" and Server Costs
&lt;/h2&gt;

&lt;p&gt;My day job is SEO. I regularly deal with CSVs containing hundreds of thousands of rows — exports from Ahrefs, Google Search Console, BigQuery, and similar tools.&lt;/p&gt;

&lt;p&gt;For heavy data work, I'd reach for Python (Polars) to transform and aggregate data. But Python has a high barrier to entry: environment setup, code adjustments — it's just not something you can easily hand off to non-engineer teammates.&lt;/p&gt;

&lt;p&gt;Even for myself, I'd often think, &lt;em&gt;"Do I really have to write Python just for this small transformation?"&lt;/em&gt; And then there were frustrating moments like: &lt;em&gt;"Why is the type inference different for the same CSV from the same tool?!"&lt;/em&gt; (causing join errors).&lt;/p&gt;

&lt;p&gt;I'd wanted a tool that made handling hundreds of thousands to millions of rows as easy as using a spreadsheet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pain points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excel and Google Sheets struggle badly with CSVs over ~100k rows&lt;/li&gt;
&lt;li&gt;Python is hard to share with non-technical teammates&lt;/li&gt;
&lt;li&gt;Writing Python for small one-off tasks feels like overkill&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My First Attempt: Running Polars Server-Side (Quickly Abandoned)
&lt;/h2&gt;

&lt;p&gt;My first idea was: &lt;em&gt;"What if I run Polars on the server? It'd be blazing fast for aggregation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I started building that, but reality hit quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uploading and downloading large CSVs was just too slow to be usable&lt;/li&gt;
&lt;li&gt;If user numbers grew, server costs could spiral out of control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I gave up on the server-side Polars approach almost immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turning Point: DuckDB-WASM × OPFS
&lt;/h2&gt;

&lt;p&gt;Just as I was about to abandon the whole idea, I came across an article by Shiguredo about handling 1TB of log data offline using DuckDB-WASM and OPFS.&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://zenn.dev/shiguredo/articles/duckdb-wasm-s3-parquet-opfs" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fres.cloudinary.com%2Fzenn%2Fimage%2Fupload%2Fs--3ehooQVp--%2Fc_fit%252Cg_north_west%252Cl_text%3Anotosansjp-medium.otf_55%3ADuckDB%25252FDuckDB-Wasm%252520%2525E3%252582%252592%2525E5%252588%2525A9%2525E7%252594%2525A8%2525E3%252581%252597%2525E3%252581%25259F%2525E4%2525BD%25258E%2525E3%252582%2525B3%2525E3%252582%2525B9%2525E3%252583%252588%2525E3%252581%2525A7%2525E3%252581%2525AE%2525E5%25258F%2525AF%2525E8%2525A6%252596%2525E5%25258C%252596%252Cw_1010%252Cx_90%252Cy_100%2Fg_south_west%252Cl_text%3Anotosansjp-medium.otf_34%3Avoluntas%252Cx_220%252Cy_108%2Fbo_3px_solid_rgb%3Ad6e3ed%252Cg_south_west%252Ch_90%252Cl_fetch%3AaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29tL3plbm4tdXNlci11cGxvYWQvYXZhdGFyLzAwYzA1YTI0OWUuanBlZw%3D%3D%252Cr_20%252Cw_90%252Cx_92%252Cy_102%2Fco_rgb%3A6e7b85%252Cg_south_west%252Cl_text%3Anotosansjp-medium.otf_30%3A%2525E6%252599%252582%2525E9%25259B%2525A8%2525E5%2525A0%252582%2525E3%252583%25258E%2525E3%252583%2525BC%2525E3%252583%252588%252Cx_220%252Cy_160%2Fbo_4px_solid_white%252Cg_south_west%252Ch_50%252Cl_fetch%3AaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29tL3plbm4tdXNlci11cGxvYWQvYXZhdGFyL2RkMTU4OGE4NzUuanBlZw%3D%3D%252Cr_max%252Cw_50%252Cx_139%252Cy_84%2Fv1627283836%2Fdefault%2Fog-base-w1200-v2.png%3F_a%3DBACAGSGT" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://zenn.dev/shiguredo/articles/duckdb-wasm-s3-parquet-opfs" rel="noopener noreferrer" class="c-link"&gt;
            DuckDB/DuckDB-Wasm を利用した低コストでの可視化
          &lt;/a&gt;
        &lt;/h2&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.zenn.studio%2Fimages%2Flogo-transparent.png"&gt;
          zenn.dev
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;I'm embarrassed to admit this was my first time hearing about the Parquet format, but the query speed shown in their demo blew me away.&lt;/p&gt;

&lt;p&gt;That's when it clicked: &lt;em&gt;"What if I instantly convert uploaded CSVs to Parquet and store them in OPFS? I could build a blazing-fast data processing tool with zero server involvement."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Looking at the documentation, I felt like I could probably write the basic code to load and display data with DuckDB-WASM myself.&lt;/p&gt;

&lt;p&gt;But if I wanted something &lt;em&gt;anyone&lt;/em&gt; could use, it would need a polished GUI — and building a proper GUI while raising two kids was simply not realistic.&lt;/p&gt;

&lt;p&gt;That's when I decided to try Claude Code, which was getting a lot of buzz at the time (this was around June 2025).&lt;/p&gt;

&lt;h2&gt;
  
  
  Is It Really "Zero Network Traffic"?
&lt;/h2&gt;

&lt;p&gt;As a side note — I know some people might be skeptical when a non-engineer claims their tool doesn't send data to a server.&lt;/p&gt;

&lt;p&gt;That's exactly why I was committed to an architecture where data &lt;em&gt;physically cannot leave the browser&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's a screenshot of the browser's Network tab while LeapRows processes a large CSV. The only POST request is to Vercel Analytics (page view tracking), and that's only enabled on the landing page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u5vlpq6ce2mrbg75o37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u5vlpq6ce2mrbg75o37.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmspbjxknpr9k2lwds5o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmspbjxknpr9k2lwds5o7.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even inspecting the payload, you'll see that no file names or data contents are being sent anywhere.&lt;/p&gt;

&lt;p&gt;This gave me the best of both worlds: zero server costs, and a tool users can trust with sensitive data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Battle with a Runaway Claude Code
&lt;/h2&gt;

&lt;p&gt;I installed Claude Code, brimming with excitement, and typed: &lt;em&gt;"Using DuckDB-WASM, build a tool that lets users upload a CSV, convert it to Parquet, store it in OPFS, and view the data."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Code poured out almost instantly. A minimal working tool took shape.&lt;/p&gt;

&lt;p&gt;Energized, I kept adding features: pivot tables, filters, column operations... whatever came to mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bugs Multiplied → Time to Start Over
&lt;/h3&gt;

&lt;p&gt;After adding a few features, things started to fall apart. Claude Code fell into a loop: &lt;em&gt;"Fixed it!" → I check → still broken → I report → "Fixed!" → still broken.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The number of back-and-forth exchanges to implement a single feature shot up, and bugs started appearing in parts I hadn't even touched.&lt;/p&gt;

&lt;p&gt;At that point I was using a CLAUDE.md with basic principles I'd picked up from posts on X about improving Claude Code's output. But without a solid spec for the tool and with ad-hoc requests flying in randomly, CLAUDE.md wasn't doing much. Eventually, nothing worked reliably and bugs could appear anywhere. Pure chaos.&lt;/p&gt;

&lt;p&gt;So after about a month of development, &lt;strong&gt;I made a hard call: scrap everything and start over&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This time, I consulted with Gemini to structure the project properly: design philosophy, DuckDB and OPFS connection conventions, shared UI rules. After that, implementation became dramatically smoother.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's in My CLAUDE.md Now
&lt;/h3&gt;

&lt;p&gt;Here's a condensed excerpt of the rules I've accumulated (it's grown long over time):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Development Philosophy
* Incremental progress: small, composable changes
* Learning from existing code: study patterns before implementing
* Pragmatic over dogmatic: adapt to reality
* Clear intent over clever code: always prioritize clarity

# Bug Fix Methodology
* Follow investigate → test → fix order (never guess)
* Write a failing reproduction test before attempting a fix
* Cap fix attempts at 3 iterations; escalate if not resolved
* Audit the full impact scope (grep all usages)
* Enforce immutable patterns

# Skills (Reusable Implementation Guides)
* Rules must call the relevant Skill before writing any code
* Coverage: DuckDB operations, Zustand state management, security audits, E2E tests, UI patterns
* Plan agents must also reference Skills (read SKILL.md and cite it in the plan)

# Architecture Principles
* DuckDB Singleton — centralized connection management, close() is forbidden
* Zustand state — selective subscriptions via useShallow to prevent over-rendering
* SQL escaping — all queries go through a dedicated utility function
* Single source of truth for fileId — managed exclusively in file-context-store
* Query cancellation — AbortController + debounce for logical cancellation
* HTML sanitization — two-layer defense: escapeHtml + DOMPurify
* Input validation — file size limits, ReDoS prevention, regex pattern validation
* API rate limiting — IP-based brute-force protection

# Troubleshooting
* 30+ error patterns with documented resolutions
* Serves as a knowledge base to prevent recurring issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
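&lt;p&gt;The "SQL escaping" rule above can be sketched roughly like this. This is a hypothetical illustration, not the actual LeapRows utility; it relies only on standard SQL quoting rules (double quotes for identifiers, single quotes for string literals, each escaped by doubling), which DuckDB follows:&lt;/p&gt;

```typescript
// Hypothetical sketch of a dedicated SQL escaping utility.
// Standard SQL quoting: double quotes for identifiers, single
// quotes for string literals, each escaped by doubling the quote.

export function escapeIdentifier(name: string): string {
  // "col""name" is a literal double quote inside an identifier
  return '"' + name.replace(/"/g, '""') + '"';
}

export function escapeLiteral(value: string): string {
  // 'it''s' is a literal single quote inside a string
  return "'" + value.replace(/'/g, "''") + "'";
}

// Usage: build a filter clause without splicing raw input into SQL
const clause =
  "SELECT * FROM " + escapeIdentifier("source_data") +
  " WHERE " + escapeIdentifier("category") +
  " = " + escapeLiteral("A");
console.log(clause);
```

&lt;p&gt;Routing every query through one utility like this is what makes the "all queries go through a dedicated function" rule enforceable by grep.&lt;/p&gt;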



&lt;h2&gt;
  
  
  Hitting the "1 Million Row Wall"
&lt;/h2&gt;

&lt;p&gt;Even with a cleaner architecture, development was full of challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  DuckDB Crashes (Multiple Connection Problem)
&lt;/h3&gt;

&lt;p&gt;Since I was new to DuckDB-WASM, I didn't know you can't open multiple connections to a single instance. Claude Code, of course, had no idea either and happily generated code that did exactly that.&lt;/p&gt;

&lt;p&gt;Frequent errors included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queries running before DuckDB finished initializing&lt;/li&gt;
&lt;li&gt;SQL executing before CSV→Parquet conversion was complete&lt;/li&gt;
&lt;li&gt;New operations launching before previous queries had finished&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For engineers, this is probably basic stuff — but for me, it was here that I first learned what a &lt;strong&gt;Singleton pattern&lt;/strong&gt; is, after consulting with Gemini.&lt;/p&gt;

&lt;p&gt;Once I added DuckDB instance management through Zustand and explicitly documented in CLAUDE.md that all DuckDB connections must use the singleton instance, the error rate dropped dramatically.&lt;/p&gt;
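&lt;p&gt;The singleton idea can be sketched like this (&lt;code&gt;DbSingleton&lt;/code&gt; and the names inside are illustrative, not the actual LeapRows code). The two properties that fixed my errors: there is exactly one instance, and every query awaits a shared init promise, so nothing runs before the engine is ready:&lt;/p&gt;

```typescript
// Hypothetical sketch of a singleton connection manager.
// One instance, one shared init promise: queries cannot start
// before initialization completes, and no second connection exists.

class DbSingleton {
  private static instance: DbSingleton | null = null;

  // Simulated async engine startup (DuckDB-WASM init in the real app)
  private readyPromise = new Promise(function (resolve) {
    setTimeout(resolve, 10);
  });

  private constructor() {}

  static get(): DbSingleton {
    if (DbSingleton.instance === null) {
      DbSingleton.instance = new DbSingleton();
    }
    return DbSingleton.instance;
  }

  async query(sql: string) {
    await this.readyPromise; // every query waits for init to finish
    return "ok: " + sql;
  }
}
```

&lt;p&gt;The private constructor is the point: code that tries to create a second connection simply won't compile, which is exactly the kind of guardrail an AI assistant respects better than a comment.&lt;/p&gt;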

&lt;h3&gt;
  
  
  The &lt;code&gt;_setThrew is not defined&lt;/code&gt; Error Storm
&lt;/h3&gt;

&lt;p&gt;Working with DuckDB-WASM, I ran into &lt;code&gt;_setThrew is not defined&lt;/code&gt; an absurd number of times.&lt;/p&gt;

&lt;p&gt;I had no idea what it meant. Neither Gemini, Claude, nor Google searches gave me a clear answer at first. Eventually I realized it was a WASM-level error, and once I had Claude Code build a mechanism to catch and log those errors to the console, debugging finally became possible.&lt;/p&gt;
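&lt;p&gt;The catch-and-log mechanism was conceptually simple. A hypothetical sketch (names are illustrative): wrap every query so that when a low-level failure bubbles up, the console shows which SQL the engine was running before the error is rethrown:&lt;/p&gt;

```typescript
// Hypothetical sketch of the error-surfacing wrapper: log the SQL
// and the raw error with context before rethrowing, so opaque
// WASM-level failures at least point at the query that triggered them.

async function runLogged(sql: string, run: (q: string) => any) {
  try {
    return await run(sql);
  } catch (err) {
    console.error("[duckdb] query failed:", sql);
    console.error("[duckdb] raw error:", err);
    throw err; // rethrow so callers still see the failure
  }
}
```

&lt;p&gt;With every query going through one wrapper like this, an otherwise meaningless WASM error at least arrives with the SQL that caused it.&lt;/p&gt;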

&lt;p&gt;Most of the root causes turned out to be the same issues as before: multiple connections, premature initialization, and data consistency mismatches — all pretty fundamental mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero Wait Time: "Dynamic CTEs" and the Birth of the Recipe Feature
&lt;/h2&gt;

&lt;p&gt;Early on, every column operation or filter would execute a query and overwrite the Parquet file in OPFS.&lt;/p&gt;

&lt;p&gt;DuckDB-WASM is fast, with sub-second writes. But for users expecting a spreadsheet-like experience, even a one-second wait per action makes for poor UX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Every action triggered a physical write, creating a noticeable delay.&lt;/p&gt;

&lt;p&gt;Then I had an idea: &lt;em&gt;"What if I stop writing to disk after every action, and instead store the operation history as JSON? Then, right before rendering, chain everything together with SQL CTEs and execute it in one shot."&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Conceptual image of the dynamic CTE built internally&lt;/span&gt;

&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;step1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;source_data&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
     &lt;span class="n"&gt;step2&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;step1&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;step2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This eliminated the per-action wait entirely.&lt;/p&gt;
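&lt;p&gt;The query-building step can be sketched like this. The operation shape and names are illustrative, not the actual LeapRows recipe format; the idea is just folding a JSON history into one chained CTE:&lt;/p&gt;

```typescript
// Hypothetical sketch: fold a JSON operation history into a single
// dynamic CTE, matching the conceptual SQL shown above.

interface Operation {
  type: string;   // e.g. "filter"
  clause: string; // pre-escaped SQL fragment, e.g. "price > 1000"
}

function buildQuery(table: string, ops: Operation[]): string {
  if (ops.length === 0) {
    return "SELECT * FROM " + table + ";";
  }
  const ctes: string[] = [];
  let prev = table;
  ops.forEach(function (op, i) {
    const name = "step" + (i + 1);
    ctes.push(name + " AS (SELECT * FROM " + prev + " WHERE " + op.clause + ")");
    prev = name; // each step reads from the previous one
  });
  return "WITH " + ctes.join(",\n     ") + "\nSELECT * FROM " + prev + ";";
}
```

&lt;p&gt;Feeding it the two filters from the example (&lt;code&gt;category = 'A'&lt;/code&gt;, then &lt;code&gt;price &amp;gt; 1000&lt;/code&gt;) reproduces the conceptual SQL above, and because the input is plain JSON, the history is trivially serializable.&lt;/p&gt;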

&lt;p&gt;And then I realized: &lt;em&gt;"If the operation history is cleanly stored as JSON, I can save it and replay entire workflows automatically."&lt;/em&gt; That insight became the foundation of LeapRows' &lt;strong&gt;Recipe feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wait time only occurs on initial file load&lt;/li&gt;
&lt;li&gt;Entire workflows can be re-executed from saved JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For complex cases (heavy regex, nested calculations), the dynamic CTE itself could get slow. Since I can't do advanced query tuning by reading EXPLAIN output, I implemented a caching layer that physically saves intermediate results to keep the UI responsive.&lt;/p&gt;
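&lt;p&gt;The caching idea is simple to sketch (names are illustrative, not the actual implementation): key a materialized intermediate result by the serialized operation-history prefix, so an expensive step is computed once and reused:&lt;/p&gt;

```typescript
// Hypothetical sketch of the intermediate-result cache: the JSON of
// the operation-history prefix is the cache key, so replaying the
// same steps skips recomputation entirely.

const cache = new Map();

function runWithCache(steps: string[], compute: (s: string[]) => string): string {
  const key = JSON.stringify(steps);
  if (cache.has(key)) {
    return cache.get(key); // cache hit: reuse the materialized result
  }
  const result = compute(steps);
  cache.set(key, result);
  return result;
}
```

&lt;p&gt;In the real tool the cached value would be a physically saved Parquet intermediate rather than a string, but the key design is the same: identical history prefix, identical result.&lt;/p&gt;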

&lt;h2&gt;
  
  
  The Ultimate Debugging Weapon: Lots of &lt;code&gt;console.log&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Even with all of this, there were still cases where Claude Code would insist &lt;em&gt;"It's fixed!"&lt;/em&gt; while the error kept looping.&lt;/p&gt;

&lt;p&gt;My final weapon: &lt;strong&gt;flooding the suspicious code with &lt;code&gt;console.log&lt;/code&gt; statements.&lt;/strong&gt; This let me trace things like &lt;em&gt;"The data becomes undefined here"&lt;/em&gt; or &lt;em&gt;"The schema changes between these two lines."&lt;/em&gt; Then I could tell Claude Code specifically: &lt;em&gt;"The behavior here seems wrong because X."&lt;/em&gt; That dramatically improved the odds of getting a correct fix.&lt;/p&gt;
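&lt;p&gt;A minimal version of that technique (hypothetical, not LeapRows code): a pass-through helper that logs the shape of the data at each suspicious point, so you can see exactly where it stops looking the way you expect:&lt;/p&gt;

```typescript
// Hypothetical trace helper: logs type, array-ness, and keys at a
// labeled point, then returns the data unchanged so it can be
// dropped into the middle of any pipeline.

function trace(label: string, data: unknown) {
  console.log("[trace] " + label, {
    type: typeof data,
    isArray: Array.isArray(data),
    keys: data === null || data === undefined ? [] : Object.keys(Object(data)),
  });
  return data; // pass-through
}
```

&lt;p&gt;Comparing the logged shapes between two points is often enough to say "the schema changes between these two lines" with confidence.&lt;/p&gt;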

&lt;p&gt;It might sound like a crude approach, but for a non-engineer, being able to isolate &lt;em&gt;where&lt;/em&gt; the problem is before handing it off to AI turned out to be a genuine winning strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned from Building with Claude Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  You Can Build as a Parent with Limited Time
&lt;/h3&gt;

&lt;p&gt;Before Claude Code, I'd start a side project, run out of time, drop it, start again months later having forgotten everything... and never actually ship anything. The cycle repeated many times.&lt;/p&gt;

&lt;p&gt;With Claude Code, as long as I have a clear enough mental image of what I want to build and can write it up as a spec, the implementation moves forward on its own.&lt;/p&gt;

&lt;p&gt;Including the rebuild phase, it took 8 months — but getting to Beta with just 90 minutes a day of work was a huge deal for me personally. For years I'd convinced myself that &lt;em&gt;"there's nothing I can do during the parenting years; I just have to accept falling behind."&lt;/em&gt; This project proved that wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-Engineers Still Need to Understand Development Fundamentals
&lt;/h3&gt;

&lt;p&gt;The main reason it took 8 months despite using Claude Code was that I lacked the foundational knowledge engineers take for granted — things like application architecture, design patterns, and the conceptual building blocks behind good software.&lt;/p&gt;

&lt;p&gt;(Singleton patterns, storing operation steps as state, etc.)&lt;/p&gt;

&lt;p&gt;Claude Code is genuinely impressive, and I do believe we're in an era where non-engineers can ship real tools. But having even a basic understanding of architecture and design patterns saves enormous amounts of time and leads to far better outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills and Hooks Are Worth the Investment
&lt;/h3&gt;

&lt;p&gt;I underutilized Skills early on, but gradually building them out made a real difference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;build-in-public&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generate X posts (#BuildInPublic) from git commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-md-organizer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prevent CLAUDE.md bloat (move completed specs to docs/)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;documentation-update&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Update docs after implementation changes (explicit trigger only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;duckdb-singleton-safe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DuckDB connection operations, &lt;code&gt;_setThrew&lt;/code&gt; error prevention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;duckdb-sql-standards&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQL query construction, column name escaping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;e2e-scenario-creator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;E2E test scenario generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;e2e-test-fixer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structured E2E failure diagnosis and repair (4 phases, max 3 iterations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-api-security&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API auth, rate limiting, CSRF audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-data-exposure&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Data leakage audit (logs, error messages)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-dependency&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dependency vulnerability audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-headers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Security header audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-input-validation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Input validation vulnerability detection (ReDoS, file size)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-sql-injection&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQL injection vulnerability detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-xss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;XSS vulnerability detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-vulnerability-checker&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full app security audit (OWASP Top 10)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tailwind-ui-patterns&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UI component creation (buttons, tables, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test-first-bug-fix&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TDD bug fixing (reproduce → fix → verify loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't a substitute for a professional security audit, but they're guardrails that reduce the risk of AI-introduced vulnerabilities — non-engineer style.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts: AI Isn't Magic, But It Does Expand What's Possible
&lt;/h2&gt;

&lt;p&gt;When I started, I wasn't sure how production-ready something built this way could actually be. Claude Code turned out to be more capable than I imagined, and it genuinely expanded what I thought I could build.&lt;/p&gt;

&lt;p&gt;The biggest benefit is the ability to move forward on things I "kind of knew about but couldn't actually do myself." The more clearly you can picture what you want, the better the output.&lt;/p&gt;

&lt;p&gt;I've barely reached Beta — there's still a lot of marketing work ahead. But if any of this resonates with you, I'd love for you to try &lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt; and send me feedback. It would genuinely make my day. 🙏&lt;/p&gt;

</description>
      <category>duckdb</category>
      <category>webdev</category>
      <category>claudecode</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
