<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yuki Nakazawa</title>
    <description>The latest articles on Forem by Yuki Nakazawa (@yuki0510).</description>
    <link>https://forem.com/yuki0510</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3812207%2Fb11b59c5-9afe-4d02-9258-3710f6e71b17.jpg</url>
      <title>Forem: Yuki Nakazawa</title>
      <link>https://forem.com/yuki0510</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yuki0510"/>
    <language>en</language>
    <item>
      <title>Logging Googlebot Crawls for Free with Cloudflare Workers + D1</title>
      <dc:creator>Yuki Nakazawa</dc:creator>
      <pubDate>Fri, 20 Mar 2026 07:41:30 +0000</pubDate>
      <link>https://forem.com/yuki0510/logging-googlebot-crawls-for-free-with-cloudflare-workers-d1-35d3</link>
      <guid>https://forem.com/yuki0510/logging-googlebot-crawls-for-free-with-cloudflare-workers-d1-35d3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When doing SEO work, there are times when you need to investigate whether Googlebot is properly crawling your pages.&lt;/p&gt;

&lt;p&gt;Google Search Console has a crawl stats feature, but the sample URLs it surfaces are limited to 1,000 entries. For tracking the crawl status of specific pages over time, it falls a bit short.&lt;/p&gt;

&lt;p&gt;Server access logs are the ideal solution for this kind of investigation.&lt;/p&gt;

&lt;p&gt;I use this setup on &lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt;, a browser-based CSV tool I built on Vercel.&lt;/p&gt;

&lt;p&gt;On a self-managed VPS or on-premise server, Googlebot access is automatically recorded in Nginx or Apache logs.&lt;/p&gt;

&lt;p&gt;However, serverless PaaS platforms like Vercel expose no server to manage, and therefore no raw access logs to inspect.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Cloudflare&lt;/strong&gt; comes in. By routing your domain's DNS through Cloudflare, you can intercept requests with a Cloudflare Worker before they ever reach Vercel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Standard Vercel setup]
Googlebot → Vercel → Response (no logs)

[With Cloudflare]
Googlebot → Cloudflare Worker (logs recorded here) → Vercel → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By saving the logs captured by the Worker into Cloudflare's D1 (a SQLite-based database), you can collect Googlebot crawl logs without touching the Vercel side at all — and it runs entirely within the free tier.&lt;/p&gt;

&lt;p&gt;This article walks through the setup step by step.&lt;/p&gt;

&lt;h3&gt;
  
  
  What you can collect
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Crawl timing per URL (when each page was crawled)&lt;/li&gt;
&lt;li&gt;Status code monitoring (detecting 4xx/5xx crawl errors)&lt;/li&gt;
&lt;li&gt;Cache hit rate (&lt;code&gt;DYNAMIC&lt;/code&gt; vs &lt;code&gt;HIT&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Bot type breakdown (InspectionTool vs Googlebot)&lt;/li&gt;
&lt;/ul&gt;
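&lt;p&gt;As a sketch of the kind of analysis these fields make possible (the rows below are made-up sample data, not real crawl output):&lt;/p&gt;

```javascript
// Toy aggregation over rows shaped like the crawl_logs table (sample data,
// not real crawl logs) showing the questions the log can answer.
const rows = [
  { url: "/", status: 200, cache: "HIT", bot_type: "googlebot" },
  { url: "/pricing", status: 200, cache: "DYNAMIC", bot_type: "googlebot" },
  { url: "/old-page", status: 404, cache: "DYNAMIC", bot_type: "googlebot" },
  { url: "/", status: 200, cache: "HIT", bot_type: "inspection-tool" },
];

// Cache hit rate (DYNAMIC vs HIT)
const hits = rows.filter((r) => r.cache === "HIT").length;
// Crawl errors (4xx/5xx)
const errors = rows.filter((r) => r.status >= 400);
// Bot type breakdown
const byBot = {};
for (const r of rows) byBot[r.bot_type] = (byBot[r.bot_type] || 0) + 1;

console.log(`cache hit rate: ${((hits / rows.length) * 100).toFixed(0)}%`); // 50%
console.log("crawl errors:", errors.map((r) => `${r.url} (${r.status})`));
console.log("bot breakdown:", byBot);
```

&lt;p&gt;In practice you would run the equivalent &lt;code&gt;SELECT&lt;/code&gt; queries against D1 with &lt;code&gt;wrangler d1 execute&lt;/code&gt;.&lt;/p&gt;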




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your domain is managed through Cloudflare&lt;/li&gt;
&lt;li&gt;Node.js and the Wrangler CLI are available&lt;/li&gt;
&lt;li&gt;Estimated time: ~30 minutes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh40upby64jbfqdzn68mr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh40upby64jbfqdzn68mr.png" alt="Architecture diagram showing Googlebot sending a request to a Cloudflare Worker, which intercepts and detects the bot, forwards the request to Vercel, and asynchronously saves the crawl log to a D1 (SQLite) database using ctx.waitUntil." width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Worker intercepts every incoming request and writes crawl data to D1.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ctx.waitUntil&lt;/code&gt; is used to handle log saving asynchronously, so the response to Googlebot is never delayed.&lt;/p&gt;
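&lt;p&gt;The pattern can be sketched outside the Workers runtime (a plain Node simulation with illustrative names; the real &lt;code&gt;ctx&lt;/code&gt; is provided by Cloudflare):&lt;/p&gt;

```javascript
// Minimal simulation of the ctx.waitUntil pattern: the handler returns
// immediately, while the "runtime" keeps the logging promise alive and
// lets it finish in the background.
function makeCtx() {
  const pending = [];
  return {
    waitUntil(promise) { pending.push(promise); }, // runtime keeps this alive
    pending,
  };
}

const log = [];

async function saveLog(entry) {
  // Simulate a slow D1 write
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push(entry);
}

async function handler(ctx) {
  ctx.waitUntil(saveLog("crawl")); // fire-and-forget: response is not delayed
  return "response";               // returned before the log write finishes
}

const ctx = makeCtx();
handler(ctx).then((res) => {
  console.log(res, log.length); // "response 0": log not yet written
  Promise.all(ctx.pending).then(() => console.log(log.length)); // 1
});
```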




&lt;h2&gt;
  
  
  Step 0: Install the Wrangler CLI
&lt;/h2&gt;

&lt;p&gt;Install the Wrangler CLI to manage Cloudflare from your terminal. Once installed, log in to your account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; wrangler
wrangler login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1: Create the D1 Database
&lt;/h2&gt;

&lt;p&gt;Create a D1 database on Cloudflare.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler d1 create googlebot-logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output will include a &lt;code&gt;database_id&lt;/code&gt; — make a note of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;✅ Successfully created DB &lt;span class="s1"&gt;'googlebot-logs'&lt;/span&gt;
database_id &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"&lt;/span&gt;  ← copy this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create the table definition file and apply it to D1.&lt;/p&gt;

&lt;p&gt;Note: without the &lt;code&gt;--remote&lt;/code&gt; flag, the command runs against your local D1 instance instead of the remote one — don't forget it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create schema.sql&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; schema.sql &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
CREATE TABLE IF NOT EXISTS crawl_logs (
  id             INTEGER PRIMARY KEY AUTOINCREMENT,
  ts             TEXT NOT NULL,
  url            TEXT NOT NULL,
  method         TEXT,
  status         INTEGER,
  ua             TEXT,
  ip             TEXT,
  country        TEXT,
  cache          TEXT,
  referer        TEXT,
  bot_type       TEXT,
  content_length INTEGER
);

CREATE INDEX IF NOT EXISTS idx_ts  ON crawl_logs(ts);
CREATE INDEX IF NOT EXISTS idx_url ON crawl_logs(url);
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Apply to D1&lt;/span&gt;
wrangler d1 execute googlebot-logs &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;schema.sql &lt;span class="nt"&gt;--remote&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Create the Worker
&lt;/h2&gt;

&lt;p&gt;Create a Worker project locally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;googlebot-logger &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;googlebot-logger
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;wrangler.toml&lt;/code&gt; with the following content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"googlebot-logger"&lt;/span&gt;
&lt;span class="py"&gt;main&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src/index.js"&lt;/span&gt;
&lt;span class="py"&gt;compatibility_date&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2024-01-01"&lt;/span&gt;

&lt;span class="c"&gt;# Domain configuration&lt;/span&gt;
&lt;span class="nn"&gt;[[routes]]&lt;/span&gt;
&lt;span class="py"&gt;pattern&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"yourdomain.com/*"&lt;/span&gt;  &lt;span class="c"&gt;# enter your domain&lt;/span&gt;
&lt;span class="py"&gt;zone_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"yourdomain.com"&lt;/span&gt;  &lt;span class="c"&gt;# enter your domain&lt;/span&gt;

&lt;span class="c"&gt;# D1 binding&lt;/span&gt;
&lt;span class="nn"&gt;[[d1_databases]]&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"DB"&lt;/span&gt;
&lt;span class="py"&gt;database_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"googlebot-logs"&lt;/span&gt;
&lt;span class="py"&gt;database_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"&lt;/span&gt;  &lt;span class="c"&gt;# ID from Step 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create &lt;code&gt;src/index.js&lt;/code&gt;. Since we only want to track page-level crawls, static resource files under &lt;code&gt;/_next/&lt;/code&gt; (JS, CSS, etc.) are excluded from logging.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Forward the request to the origin first&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Check the User-Agent&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ua&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User-Agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;botType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detectGoogleBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. If Googlebot, save the log asynchronously without delaying the response&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// clone before returning&lt;/span&gt;
      &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;saveLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Identify the type of Googlebot&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;detectGoogleBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot-Image/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot-image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot-Video/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot-video&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot-News/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot-news&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/AdsBot-Google/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;         &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;adsbot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Google-InspectionTool/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inspection-tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Googlebot/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;             &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googlebot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// not Googlebot&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Save log to D1&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cf&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

  &lt;span class="c1"&gt;// Exclude static resource files — page URLs only&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/_next/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/_vercel/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/static/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\.(&lt;/span&gt;&lt;span class="sr"&gt;js|css|ico|png|jpg|jpeg|svg|webp|woff|woff2|map|wasm&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// If Content-Length is absent, read the body to measure size&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;contentLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Length&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;contentLength&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cloned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cloned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;contentLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;byteLength&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
      INSERT INTO crawl_logs (ts, url, method, status, ua, ip, country, cache, referer, bot_type, content_length)
      VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;ua&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CF-Connecting-IP&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;country&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CF-Cache-Status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Referer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;botType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;contentLength&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Log failures should never affect site availability&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Log save failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
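&lt;p&gt;Because UAs such as &lt;code&gt;Googlebot-Image&lt;/code&gt; also contain the plain &lt;code&gt;Googlebot&lt;/code&gt; substring, the order of checks in &lt;code&gt;detectGoogleBot&lt;/code&gt; matters: the most specific patterns must come first. A quick standalone check of that ordering (the UA strings are simplified, not the full real ones):&lt;/p&gt;

```javascript
// Same check order as in src/index.js: specific bots before plain Googlebot.
function detectGoogleBot(ua) {
  if (/Googlebot-Image/i.test(ua))       return "googlebot-image";
  if (/Googlebot-Video/i.test(ua))       return "googlebot-video";
  if (/Googlebot-News/i.test(ua))        return "googlebot-news";
  if (/AdsBot-Google/i.test(ua))         return "adsbot";
  if (/Google-InspectionTool/i.test(ua)) return "inspection-tool";
  if (/Googlebot/i.test(ua))             return "googlebot";
  return null;
}

console.log(detectGoogleBot("Googlebot-Image/1.0"));                     // googlebot-image
console.log(detectGoogleBot("Mozilla/5.0 (compatible; Googlebot/2.1)")); // googlebot
console.log(detectGoogleBot("Google-InspectionTool/1.0"));               // inspection-tool
console.log(detectGoogleBot("Mozilla/5.0"));                             // null
```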






&lt;h2&gt;
  
  
  Step 3: Cloudflare DNS Configuration
&lt;/h2&gt;

&lt;p&gt;Configure Cloudflare to route traffic through the Worker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verify SSL/TLS encryption mode
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;SSL/TLS → Overview&lt;/strong&gt; in the Cloudflare dashboard and confirm the encryption mode is set to &lt;strong&gt;Full&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Leaving it on &lt;strong&gt;Flexible&lt;/strong&gt; and then enabling the proxy can cause an HTTPS redirect loop that takes your site down — worth checking first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enable proxy on your DNS record
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;DNS → Records&lt;/strong&gt;, find the A record for your domain, and click &lt;strong&gt;Edit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82zvmfjst07epj1exxkm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82zvmfjst07epj1exxkm.png" alt="Cloudflare DNS Records page showing an A record for leaprows.com with Proxy status set to " width="800" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enable the &lt;strong&gt;Proxy status&lt;/strong&gt; toggle and save. The icon turns into an orange cloud, meaning requests now pass through Cloudflare's proxy, where the Worker route configured in &lt;code&gt;wrangler.toml&lt;/code&gt; applies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri7ak9t0xam0z51p8lre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri7ak9t0xam0z51p8lre.png" alt="Cloudflare DNS record edit form showing the Proxy status toggle being switched to " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Deploy
&lt;/h2&gt;

&lt;p&gt;With Cloudflare configured, deploy the Worker from your local project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's everything needed to start collecting logs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: Verify
&lt;/h2&gt;

&lt;p&gt;To confirm logs are being recorded, run a live test from &lt;strong&gt;Google Search Console → URL Inspection → Test Live URL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt4ouoyxecmicf7gqn01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt4ouoyxecmicf7gqn01.png" alt="Google Search Console URL Inspection tool showing " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Search Console's live test uses the &lt;a href="https://developers.google.com/crawling/docs/crawlers-fetchers/google-common-crawlers?hl=en#google-inspectiontool" rel="noopener noreferrer"&gt;&lt;code&gt;Google-InspectionTool&lt;/code&gt;&lt;/a&gt; User-Agent, so in our setup it will be recorded with &lt;code&gt;bot_type = inspection-tool&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After the test completes, check D1 with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler d1 execute googlebot-logs &lt;span class="nt"&gt;--remote&lt;/span&gt; &lt;span class="nt"&gt;--command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"SELECT * FROM crawl_logs ORDER BY ts DESC LIMIT 5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see a row with &lt;code&gt;inspection-tool&lt;/code&gt; in the &lt;code&gt;bot_type&lt;/code&gt; column, everything is working correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a4vxlgbl9ro77mi2lsx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a4vxlgbl9ro77mi2lsx.png" alt="Terminal output of a wrangler d1 execute command showing crawl log records in D1, including rows with bot_type values of " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Free Tier
&lt;/h2&gt;

&lt;p&gt;At roughly 500 bytes per record, the 5 GB free tier holds approximately 10 million records. For an indie SaaS or personal site, you're unlikely to come close to the limit.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Free tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Workers&lt;/td&gt;
&lt;td&gt;100,000 requests / day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1 rows written&lt;/td&gt;
&lt;td&gt;100,000 rows / day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1 storage&lt;/td&gt;
&lt;td&gt;5 GB (total across all databases)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
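&lt;p&gt;The capacity estimate is easy to sanity-check (assuming the rough 500-bytes-per-record figure):&lt;/p&gt;

```javascript
// Back-of-the-envelope capacity check for the D1 free tier.
const storageBytes = 5 * 1024 ** 3;   // 5 GB free tier
const bytesPerRecord = 500;           // rough average per crawl log record
const capacity = Math.floor(storageBytes / bytesPerRecord);
console.log(capacity); // → 10737418, i.e. roughly 10.7 million records
```

&lt;p&gt;Even at the free tier's 100,000-writes-per-day cap, filling 5 GB would take over three months of maximal logging.&lt;/p&gt;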

&lt;p&gt;If you'd like to keep things tidy, you can add a cron job to automatically delete old logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Append to wrangler.toml&lt;/span&gt;
&lt;span class="nn"&gt;[triggers]&lt;/span&gt;
&lt;span class="py"&gt;crons&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"0 0 * * 0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c"&gt;# runs every Sunday at midnight&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Append to src/index.js&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    DELETE FROM crawl_logs
    WHERE ts &amp;lt; datetime('now', '-90 days')
  `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ... existing fetch handler code ...&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Serverless PaaS platforms like Vercel don't expose server access logs, but by using Cloudflare as a DNS proxy you can collect Googlebot crawl logs without any changes to your server-side code.&lt;/p&gt;

&lt;p&gt;The D1 free tier is more than generous enough for small to mid-sized sites, making this essentially free to run.&lt;/p&gt;

&lt;p&gt;As a next step, you could join this data with Google Search Console exports to analyze the relationship between crawl frequency and indexing status.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>buildinpublic</category>
      <category>cloudflare</category>
      <category>workers</category>
    </item>
    <item>
      <title>When JS Libraries Fail at 1M Rows: Generating XLSX via DuckDB SQL</title>
      <dc:creator>Yuki Nakazawa</dc:creator>
      <pubDate>Tue, 10 Mar 2026 22:03:41 +0000</pubDate>
      <link>https://forem.com/yuki0510/when-js-libraries-fail-at-1m-rows-generating-xlsx-via-duckdb-sql-2gkk</link>
      <guid>https://forem.com/yuki0510/when-js-libraries-fail-at-1m-rows-generating-xlsx-via-duckdb-sql-2gkk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I've been building &lt;strong&gt;&lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt;&lt;/strong&gt; — a browser-based CSV analysis tool that runs entirely client-side using DuckDB-WASM, with no server involved.&lt;/p&gt;

&lt;p&gt;At some point I needed to add XLSX export. But every existing approach I tried fell apart at any meaningful scale.&lt;/p&gt;

&lt;p&gt;I eventually landed on a solution: &lt;strong&gt;generate Excel-compatible XML directly via DuckDB SQL, then compress it into an XLSX file using JSZip.&lt;/strong&gt; This post covers why I went that route and how it works.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: My day job is SEO, not software engineering — so please forgive any imprecise terminology. 🙏&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Exporting 1 Million Rows to XLSX in the Browser
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DuckDB-WASM's Built-in XLSX Export Was Unreliable
&lt;/h3&gt;

&lt;p&gt;My first attempt was DuckDB-WASM's native XLSX output (the &lt;code&gt;COPY TO&lt;/code&gt; command via the Excel extension). It was unstable — Excel would throw &lt;em&gt;"The file format or file extension is not valid"&lt;/em&gt; on some outputs and refuse to open the file.&lt;/p&gt;

&lt;p&gt;I ruled it out for production use and started looking for alternatives.&lt;/p&gt;

&lt;h3&gt;
  
  
  JS Libraries Ran Out of Memory
&lt;/h3&gt;

&lt;p&gt;Next I tried SheetJS and ExcelJS. Both worked fine on small datasets, but memory usage exploded as file size grew, and the conversion itself became painfully slow.&lt;/p&gt;

&lt;p&gt;Testing with 1 million rows, progress would reach 95%... and then the browser would freeze for nearly 20 seconds. After the conversion finally finished, the browser stayed sluggish — and sometimes just crashed entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  I Still Wanted 1 Million Rows to Work
&lt;/h3&gt;

&lt;p&gt;LeapRows is built around the promise of handling 1M+ rows quickly. I didn't want to quietly cap the export at some lower limit. If Excel itself caps out at 1,048,576 rows, I wanted LeapRows to be able to fill every one of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Insight: XLSX Is Just ZIP + XML
&lt;/h2&gt;

&lt;p&gt;Something that might not be obvious: an XLSX file is actually a ZIP archive containing XML files. You can verify this yourself by renaming any &lt;code&gt;.xlsx&lt;/code&gt; file to &lt;code&gt;.zip&lt;/code&gt; — you'll see the contents directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbsayknhxlb9nhu52tel.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbsayknhxlb9nhu52tel.png" alt="Windows File Explorer showing the contents of a renamed XLSX file as a ZIP archive, &amp;lt;br&amp;gt;
revealing folders: _rels, docProps, xl, and a [Content_Types].xml file." width="636" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Those XML files contain a lot of styling information (fonts, colors, borders, conditional formatting, etc.), but the actual data lives in just one file: &lt;code&gt;xl/worksheets/sheet1.xml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsoarf15722vo0fuklmok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsoarf15722vo0fuklmok.png" alt="A code editor displaying the XML content of xl/worksheets/sheet1.xml inside an XLSX file, &amp;lt;br&amp;gt;
showing font styling definitions including Japanese font names like 游ゴシック." width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since LeapRows only needs to export raw data from a DuckDB table — no styling required — I realized: &lt;em&gt;the only file I need to generate dynamically is &lt;code&gt;sheet1.xml&lt;/code&gt;. The other 4–5 files in the archive can be static strings.&lt;/em&gt;&lt;/p&gt;
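&lt;p&gt;For illustration, here is a minimal sketch of what those static files can look like as plain string constants (production files may carry additional attributes; treat this as an assumption, not a spec-complete listing):&lt;/p&gt;

```javascript
// Minimal static XLSX parts (sketch). Only xl/worksheets/sheet1.xml is
// generated dynamically; everything below never changes between exports.
const STATIC_PARTS = {
  '[Content_Types].xml':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">' +
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>' +
    '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>' +
    '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>' +
    '</Types>',
  '_rels/.rels':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>' +
    '</Relationships>',
  'xl/workbook.xml':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ' +
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">' +
    '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>',
  'xl/_rels/workbook.xml.rels':
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>' +
    '</Relationships>',
};
```

&lt;p&gt;Note that because the generated cells use &lt;code&gt;t="inlineStr"&lt;/code&gt;, no &lt;code&gt;xl/sharedStrings.xml&lt;/code&gt; part is needed at all.&lt;/p&gt;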
&lt;h2&gt;
  
  
  Generating XML Directly with DuckDB SQL
&lt;/h2&gt;

&lt;p&gt;The data is already in a DuckDB table. And DuckDB SQL is extremely fast at string operations — &lt;code&gt;CONCAT&lt;/code&gt;, &lt;code&gt;REPLACE&lt;/code&gt;, and so on.&lt;/p&gt;

&lt;p&gt;That led to the idea: &lt;em&gt;"What if I use a SQL query to output XML cell strings directly, then ZIP them up with JSZip?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DuckDB → JS objects → library → XML → ZIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DuckDB → XML strings (via SQL) → JSZip → ZIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skipping the intermediate JS object conversion means no memory pressure from large object graphs, and no GC pauses that were causing the browser to freeze.&lt;/p&gt;

&lt;p&gt;The approach uses a &lt;code&gt;CASE&lt;/code&gt; expression to handle each column type appropriately — numbers pass through as-is, dates get converted to Excel serial values, and strings get XML-escaped.&lt;/p&gt;
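&lt;p&gt;The date conversion is plain arithmetic: Excel stores a date as the number of days since 1899-12-30 (the epoch sits two days before 1900-01-01 to compensate for Excel's historical 1900 leap-year bug). A JavaScript equivalent of the SQL's &lt;code&gt;date_diff&lt;/code&gt; would be:&lt;/p&gt;

```javascript
// Days since 1899-12-30, matching date_diff('day', DATE '1899-12-30', d)
// in the SQL below. Correct for any date after 1900-02-28.
const excelSerial = (isoDate) =>
  (Date.parse(isoDate + 'T00:00:00Z') - Date.parse('1899-12-30T00:00:00Z')) / 86_400_000;

console.log(excelSerial('2024-01-11')); // → 45302
```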

&lt;h3&gt;
  
  
  What the SQL Looks Like
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="s1"&gt;'&amp;lt;row r="'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'"&amp;gt;'&lt;/span&gt;
    &lt;span class="c1"&gt;-- String column: XML-escaped, output as inline string&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nv"&gt;"name"&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;c r="A'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'" t="inlineStr"&amp;gt;&amp;lt;is&amp;gt;&amp;lt;t&amp;gt;'&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;amp;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;lt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/t&amp;gt;&amp;lt;/is&amp;gt;&amp;lt;/c&amp;gt;'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="c1"&gt;-- Numeric column: output value directly&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nv"&gt;"age"&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;c r="B'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'"&amp;gt;&amp;lt;v&amp;gt;'&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"age"&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="c1"&gt;-- Date column: convert to Excel serial value (days since 1899-12-30)&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="nv"&gt;"created_at"&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
       &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;c r="C'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'" s="1"&amp;gt;&amp;lt;v&amp;gt;'&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'1899-12-30'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;'&lt;/span&gt;
       &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;/row&amp;gt;'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;xml_row&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rn&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each row in the result set becomes one XML element:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;row&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"A2"&lt;/span&gt; &lt;span class="na"&gt;t=&lt;/span&gt;&lt;span class="s"&gt;"inlineStr"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;is&amp;gt;&amp;lt;t&amp;gt;&lt;/span&gt;John Doe&lt;span class="nt"&gt;&amp;lt;/t&amp;gt;&amp;lt;/is&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"B2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;30&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"C2"&lt;/span&gt; &lt;span class="na"&gt;s=&lt;/span&gt;&lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;45302&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;/row&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;row&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"3"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"A3"&lt;/span&gt; &lt;span class="na"&gt;t=&lt;/span&gt;&lt;span class="s"&gt;"inlineStr"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;is&amp;gt;&amp;lt;t&amp;gt;&lt;/span&gt;Jane Smith&lt;span class="nt"&gt;&amp;lt;/t&amp;gt;&amp;lt;/is&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"B3"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;25&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;c&lt;/span&gt; &lt;span class="na"&gt;r=&lt;/span&gt;&lt;span class="s"&gt;"C3"&lt;/span&gt; &lt;span class="na"&gt;s=&lt;/span&gt;&lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;v&amp;gt;&lt;/span&gt;45150&lt;span class="nt"&gt;&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&amp;lt;/row&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine these rows with a header row and the static XML files, compress with JSZip, and you have a valid XLSX file.&lt;/p&gt;
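&lt;p&gt;That assembly step can be sketched like this (a minimal sketch: &lt;code&gt;xmlRows&lt;/code&gt; stands in for the strings returned by the SQL query, and the JSZip call is shown only as a comment):&lt;/p&gt;

```javascript
// Sketch: wrap the SQL-generated rows in the worksheet skeleton.
// `xmlRows` is assumed to be the array of xml_row strings from the query.
const xmlRows = [
  '<row r="2"><c r="A2" t="inlineStr"><is><t>John Doe</t></is></c></row>',
];
const headerRow =
  '<row r="1"><c r="A1" t="inlineStr"><is><t>name</t></is></c></row>';
const sheet1Xml =
  '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
  '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">' +
  '<sheetData>' + headerRow + xmlRows.join('') + '</sheetData>' +
  '</worksheet>';

// With JSZip (indicative, not run here):
//   const zip = new JSZip();
//   zip.file('xl/worksheets/sheet1.xml', sheet1Xml);
//   then add the static parts and call zip.generateAsync({ type: 'blob' })
```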

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;With ExcelJS, exporting 1 million rows took around 20 seconds and consumed over 1 GB of memory (unfortunately I didn't capture a "before" screenshot for comparison).&lt;/p&gt;

&lt;p&gt;After switching to the SQL-based XML approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XML generation: &lt;strong&gt;5.5s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;ZIP compression: &lt;strong&gt;7.3s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Memory usage: significantly reduced&lt;/li&gt;
&lt;li&gt;Post-export browser freeze: &lt;strong&gt;gone&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reduction in GC-eligible objects was what solved the post-conversion sluggishness.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rc0f7bt1138gykogd61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rc0f7bt1138gykogd61.png" alt="Processing time breakdown for 1 million row XLSX export: DuckDB Init 4ms, File Load 1ms, Table Creation 439ms, Schema Analysis 1ms, Preview Generation 4ms, Build XML 5.5s (41%), ZIP Compress 7.3s (55%) ." width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Side note: compression could probably be moved to the WASM layer for even better performance, but that felt out of scope for now. 😇&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;This approach doesn't support cell styling — no colors, borders, or Excel-specific formatting. For LeapRows that's fine, since the goal is pure data export. But it's worth keeping in mind if you need rich formatting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Once I let go of the assumption that &lt;em&gt;"XLSX is a complex format,"&lt;/em&gt; it became clear that for plain data export, a minimal XML structure is completely sufficient.&lt;/p&gt;

&lt;p&gt;DuckDB's string processing is fast enough to make "generate XML via SQL" a genuinely practical approach — and I suspect the same idea could be applied in other creative ways beyond XLSX export.&lt;/p&gt;

</description>
      <category>duckdb</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How a Non-Engineer Built a 1-Million-Row CSV Analyzer with Claude Code and DuckDB-WASM</title>
      <dc:creator>Yuki Nakazawa</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:45:28 +0000</pubDate>
      <link>https://forem.com/yuki0510/how-a-non-engineer-built-a-1-million-row-csv-analyzer-with-claude-code-and-duckdb-wasm-4mpp</link>
      <guid>https://forem.com/yuki0510/how-a-non-engineer-built-a-1-million-row-csv-analyzer-with-claude-code-and-duckdb-wasm-4mpp</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I built a tool called &lt;strong&gt;&lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt;&lt;/strong&gt; — a browser-based CSV analyzer that handles 1 million rows without breaking a sweat. 🎉&lt;/p&gt;

&lt;p&gt;The key feature is that everything runs entirely inside the browser using DuckDB-WASM and OPFS. Your data never leaves your machine.&lt;/p&gt;

&lt;p&gt;I'm not an engineer. I wrote a tiny bit of code for minor tweaks and debugging, but 95%+ of the codebase was written by Claude Code.&lt;/p&gt;

&lt;p&gt;I can't claim to understand every single line — and that's exactly why I didn't want to just blindly ship AI-generated code. I put quality controls in place as best I could as a non-engineer: defining implementation rules in CLAUDE.md, building security audit Skills based on OWASP Top 10, and setting up pre-commit hooks for lightweight checks.&lt;/p&gt;

&lt;p&gt;It may not be perfect, but I at least wanted to avoid "pray and deploy."&lt;/p&gt;

&lt;p&gt;The tool is still in Beta, but I want to document what it took for a non-engineer to ship a real product in collaboration with Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built It: "The Python Sharing Problem" and Server Costs
&lt;/h2&gt;

&lt;p&gt;My day job is SEO. I regularly deal with CSVs containing hundreds of thousands of rows — exports from Ahrefs, Google Search Console, BigQuery, and similar tools.&lt;/p&gt;

&lt;p&gt;For heavy data work, I'd reach for Python (Polars) to transform and aggregate data. But Python has a high barrier to entry: environment setup, code adjustments — it's just not something you can easily hand off to non-engineer teammates.&lt;/p&gt;

&lt;p&gt;Even for myself, I'd often think, &lt;em&gt;"Do I really have to write Python just for this small transformation?"&lt;/em&gt; And then there were frustrating moments like: &lt;em&gt;"Why is the type inference different for the same CSV from the same tool?!"&lt;/em&gt; (causing join errors).&lt;/p&gt;

&lt;p&gt;I'd wanted a tool that made handling hundreds of thousands to millions of rows as easy as using a spreadsheet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pain points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excel and Google Sheets struggle badly with CSVs over ~100k rows&lt;/li&gt;
&lt;li&gt;Python is hard to share with non-technical teammates&lt;/li&gt;
&lt;li&gt;Writing Python for small one-off tasks feels like overkill&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My First Attempt: Running Polars Server-Side (Quickly Abandoned)
&lt;/h2&gt;

&lt;p&gt;My first idea was: &lt;em&gt;"What if I run Polars on the server? It'd be blazing fast for aggregation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I started building that, but reality hit quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uploading and downloading large CSVs was just too slow to be usable&lt;/li&gt;
&lt;li&gt;If user numbers grew, server costs could spiral out of control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I gave up on the server-side Polars approach almost immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turning Point: DuckDB-WASM × OPFS
&lt;/h2&gt;

&lt;p&gt;Just as I was about to abandon the whole idea, I came across an article by Shiguredo about handling 1TB of log data offline using DuckDB-WASM and OPFS.&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://zenn.dev/shiguredo/articles/duckdb-wasm-s3-parquet-opfs" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fres.cloudinary.com%2Fzenn%2Fimage%2Fupload%2Fs--3ehooQVp--%2Fc_fit%252Cg_north_west%252Cl_text%3Anotosansjp-medium.otf_55%3ADuckDB%25252FDuckDB-Wasm%252520%2525E3%252582%252592%2525E5%252588%2525A9%2525E7%252594%2525A8%2525E3%252581%252597%2525E3%252581%25259F%2525E4%2525BD%25258E%2525E3%252582%2525B3%2525E3%252582%2525B9%2525E3%252583%252588%2525E3%252581%2525A7%2525E3%252581%2525AE%2525E5%25258F%2525AF%2525E8%2525A6%252596%2525E5%25258C%252596%252Cw_1010%252Cx_90%252Cy_100%2Fg_south_west%252Cl_text%3Anotosansjp-medium.otf_34%3Avoluntas%252Cx_220%252Cy_108%2Fbo_3px_solid_rgb%3Ad6e3ed%252Cg_south_west%252Ch_90%252Cl_fetch%3AaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29tL3plbm4tdXNlci11cGxvYWQvYXZhdGFyLzAwYzA1YTI0OWUuanBlZw%3D%3D%252Cr_20%252Cw_90%252Cx_92%252Cy_102%2Fco_rgb%3A6e7b85%252Cg_south_west%252Cl_text%3Anotosansjp-medium.otf_30%3A%2525E6%252599%252582%2525E9%25259B%2525A8%2525E5%2525A0%252582%2525E3%252583%25258E%2525E3%252583%2525BC%2525E3%252583%252588%252Cx_220%252Cy_160%2Fbo_4px_solid_white%252Cg_south_west%252Ch_50%252Cl_fetch%3AaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29tL3plbm4tdXNlci11cGxvYWQvYXZhdGFyL2RkMTU4OGE4NzUuanBlZw%3D%3D%252Cr_max%252Cw_50%252Cx_139%252Cy_84%2Fv1627283836%2Fdefault%2Fog-base-w1200-v2.png%3F_a%3DBACAGSGT" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://zenn.dev/shiguredo/articles/duckdb-wasm-s3-parquet-opfs" rel="noopener noreferrer" class="c-link"&gt;
            DuckDB/DuckDB-Wasm を利用した低コストでの可視化
          &lt;/a&gt;
        &lt;/h2&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.zenn.studio%2Fimages%2Flogo-transparent.png"&gt;
          zenn.dev
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;I'm embarrassed to admit this was my first time hearing about the Parquet format, but the query speed shown in their demo blew me away.&lt;/p&gt;

&lt;p&gt;That's when it clicked: &lt;em&gt;"What if I instantly convert uploaded CSVs to Parquet and store them in OPFS? I could build a blazing-fast data processing tool with zero server involvement."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Looking at the documentation, I felt like I could probably write the basic code to load and display data with DuckDB-WASM myself.&lt;/p&gt;

&lt;p&gt;But if I wanted something &lt;em&gt;anyone&lt;/em&gt; could use, it would need a polished GUI — and building a proper GUI while raising two kids was simply not realistic.&lt;/p&gt;

&lt;p&gt;That's when I decided to try Claude Code, which was getting a lot of buzz at the time (this was around June 2025).&lt;/p&gt;

&lt;h2&gt;
  
  
  Is It Really "Zero Network Traffic"?
&lt;/h2&gt;

&lt;p&gt;As a side note — I know some people might be skeptical when a non-engineer claims their tool doesn't send data to a server.&lt;/p&gt;

&lt;p&gt;That's exactly why I was committed to an architecture where data &lt;em&gt;physically cannot leave the browser&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's a screenshot of the browser's Network tab while LeapRows processes a large CSV. The only POST request is to Vercel Analytics (page view tracking), and that's only enabled on the landing page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u5vlpq6ce2mrbg75o37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u5vlpq6ce2mrbg75o37.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmspbjxknpr9k2lwds5o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmspbjxknpr9k2lwds5o7.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even inspecting the payload, you'll see that no file names or data contents are being sent anywhere.&lt;/p&gt;

&lt;p&gt;This gave me the best of both worlds: zero server costs, and a tool users can trust with sensitive data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Battle with a Runaway Claude Code
&lt;/h2&gt;

&lt;p&gt;I installed Claude Code, brimming with excitement, and typed: &lt;em&gt;"Using DuckDB-WASM, build a tool that lets users upload a CSV, convert it to Parquet, store it in OPFS, and view the data."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Code poured out almost instantly. A minimal working tool took shape.&lt;/p&gt;

&lt;p&gt;Energized, I kept adding features: pivot tables, filters, column operations... whatever came to mind.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bugs Multiplied → Time to Start Over
&lt;/h3&gt;

&lt;p&gt;After adding a few features, things started to fall apart. Claude Code fell into a loop: &lt;em&gt;"Fixed it!" → I check → still broken → I report → "Fixed!" → still broken.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The number of back-and-forth exchanges to implement a single feature shot up, and bugs started appearing in parts I hadn't even touched.&lt;/p&gt;

&lt;p&gt;At that point I was using a CLAUDE.md with basic principles I'd picked up from posts on X about improving Claude Code's output. But without a solid spec for the tool and with ad-hoc requests flying in randomly, CLAUDE.md wasn't doing much. Eventually, nothing worked reliably and bugs could appear anywhere. Pure chaos.&lt;/p&gt;

&lt;p&gt;So after about a month of development, &lt;strong&gt;I made a hard call: scrap everything and start over&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This time, I consulted with Gemini to structure the project properly: design philosophy, DuckDB and OPFS connection conventions, shared UI rules. After that, implementation became dramatically smoother.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's in My CLAUDE.md Now
&lt;/h3&gt;

&lt;p&gt;Here's a condensed excerpt of the rules I've accumulated (it's grown long over time):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Development Philosophy
* Incremental progress: small, composable changes
* Learning from existing code: study patterns before implementing
* Pragmatic over dogmatic: adapt to reality
* Clear intent over clever code: always prioritize clarity

# Bug Fix Methodology
* Follow investigate → test → fix order (never guess)
* Write a failing reproduction test before attempting a fix
* Cap fix attempts at 3 iterations; escalate if not resolved
* Audit the full impact scope (grep all usages)
* Enforce immutable patterns

# Skills (Reusable Implementation Guides)
* Rules must call the relevant Skill before writing any code
* Coverage: DuckDB operations, Zustand state management, security audits, E2E tests, UI patterns
* Plan agents must also reference Skills (read SKILL.md and cite it in the plan)

# Architecture Principles
* DuckDB Singleton — centralized connection management, close() is forbidden
* Zustand state — selective subscriptions via useShallow to prevent over-rendering
* SQL escaping — all queries go through a dedicated utility function
* Single source of truth for fileId — managed exclusively in file-context-store
* Query cancellation — AbortController + debounce for logical cancellation
* HTML sanitization — two-layer defense: escapeHtml + DOMPurify
* Input validation — file size limits, ReDoS prevention, regex pattern validation
* API rate limiting — IP-based brute-force protection

# Troubleshooting
* 30+ error patterns with documented resolutions
* Serves as a knowledge base to prevent recurring issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
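&lt;p&gt;The "SQL escaping" rule above can be sketched roughly like this. This is a hypothetical illustration, not the actual LeapRows utility; it relies only on standard SQL quoting rules (double quotes for identifiers, single quotes for string literals, each escaped by doubling), which DuckDB follows:&lt;/p&gt;

```typescript
// Hypothetical sketch of a dedicated SQL escaping utility.
// Standard SQL quoting: double quotes for identifiers, single
// quotes for string literals, each escaped by doubling the quote.

export function escapeIdentifier(name: string): string {
  // "col""name" is a literal double quote inside an identifier
  return '"' + name.replace(/"/g, '""') + '"';
}

export function escapeLiteral(value: string): string {
  // 'it''s' is a literal single quote inside a string
  return "'" + value.replace(/'/g, "''") + "'";
}

// Usage: build a filter clause without splicing raw input into SQL
const clause =
  "SELECT * FROM " + escapeIdentifier("source_data") +
  " WHERE " + escapeIdentifier("category") +
  " = " + escapeLiteral("A");
console.log(clause);
```

&lt;p&gt;Routing every query through one utility like this is what makes the "all queries go through a dedicated function" rule enforceable by grep.&lt;/p&gt;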



&lt;h2&gt;
  
  
  Hitting the "1 Million Row Wall"
&lt;/h2&gt;

&lt;p&gt;Even with a cleaner architecture, development was full of challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  DuckDB Crashes (Multiple Connection Problem)
&lt;/h3&gt;

&lt;p&gt;Since I was new to DuckDB-WASM, I didn't know you can't open multiple connections to a single instance. Claude Code, of course, had no idea either and happily generated code that did exactly that.&lt;/p&gt;

&lt;p&gt;Frequent errors included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queries running before DuckDB finished initializing&lt;/li&gt;
&lt;li&gt;SQL executing before CSV→Parquet conversion was complete&lt;/li&gt;
&lt;li&gt;New operations launching before previous queries had finished&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For engineers, this is probably basic stuff — but for me, it was here that I first learned what a &lt;strong&gt;Singleton pattern&lt;/strong&gt; is, after consulting with Gemini.&lt;/p&gt;

&lt;p&gt;Once I added DuckDB instance management through Zustand and explicitly documented in CLAUDE.md that all DuckDB connections must use the singleton instance, the error rate dropped dramatically.&lt;/p&gt;
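&lt;p&gt;The singleton idea can be sketched like this (&lt;code&gt;DbSingleton&lt;/code&gt; and the names inside are illustrative, not the actual LeapRows code). The two properties that fixed my errors: there is exactly one instance, and every query awaits a shared init promise, so nothing runs before the engine is ready:&lt;/p&gt;

```typescript
// Hypothetical sketch of a singleton connection manager.
// One instance, one shared init promise: queries cannot start
// before initialization completes, and no second connection exists.

class DbSingleton {
  private static instance: DbSingleton | null = null;

  // Simulated async engine startup (DuckDB-WASM init in the real app)
  private readyPromise = new Promise(function (resolve) {
    setTimeout(resolve, 10);
  });

  private constructor() {}

  static get(): DbSingleton {
    if (DbSingleton.instance === null) {
      DbSingleton.instance = new DbSingleton();
    }
    return DbSingleton.instance;
  }

  async query(sql: string) {
    await this.readyPromise; // every query waits for init to finish
    return "ok: " + sql;
  }
}
```

&lt;p&gt;The private constructor is the point: code that tries to create a second connection simply won't compile, which is exactly the kind of guardrail an AI assistant respects better than a comment.&lt;/p&gt;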

&lt;h3&gt;
  
  
  The &lt;code&gt;_setThrew is not defined&lt;/code&gt; Error Storm
&lt;/h3&gt;

&lt;p&gt;Working with DuckDB-WASM, I ran into &lt;code&gt;_setThrew is not defined&lt;/code&gt; an absurd number of times.&lt;/p&gt;

&lt;p&gt;I had no idea what it meant. Neither Gemini, Claude, nor Google searches gave me a clear answer at first. Eventually I realized it was a WASM-level error, and once I had Claude Code build a mechanism to catch and log those errors to the console, debugging finally became possible.&lt;/p&gt;
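&lt;p&gt;The catch-and-log mechanism was conceptually simple. A hypothetical sketch (names are illustrative): wrap every query so that when a low-level failure bubbles up, the console shows which SQL the engine was running before the error is rethrown:&lt;/p&gt;

```typescript
// Hypothetical sketch of the error-surfacing wrapper: log the SQL
// and the raw error with context before rethrowing, so opaque
// WASM-level failures at least point at the query that triggered them.

async function runLogged(sql: string, run: (q: string) => any) {
  try {
    return await run(sql);
  } catch (err) {
    console.error("[duckdb] query failed:", sql);
    console.error("[duckdb] raw error:", err);
    throw err; // rethrow so callers still see the failure
  }
}
```

&lt;p&gt;With every query going through one wrapper like this, an otherwise meaningless WASM error at least arrives with the SQL that caused it.&lt;/p&gt;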

&lt;p&gt;Most of the root causes turned out to be the same issues as before: multiple connections, premature initialization, and data consistency mismatches — all pretty fundamental mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero Wait Time: "Dynamic CTEs" and the Birth of the Recipe Feature
&lt;/h2&gt;

&lt;p&gt;Early on, every column operation or filter would execute a query and overwrite the Parquet file in OPFS.&lt;/p&gt;

&lt;p&gt;DuckDB-WASM is fast, with sub-second writes. But for users expecting a spreadsheet-like experience, even a one-second wait per action makes for poor UX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Every action triggered a physical write, creating a noticeable delay.&lt;/p&gt;

&lt;p&gt;Then I had an idea: &lt;em&gt;"What if I stop writing to disk after every action, and instead store the operation history as JSON? Then, right before rendering, chain everything together with SQL CTEs and execute it in one shot."&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Conceptual image of the dynamic CTE built internally&lt;/span&gt;

&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;step1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;source_data&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
     &lt;span class="n"&gt;step2&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;step1&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;step2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This eliminated the per-action wait entirely.&lt;/p&gt;
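&lt;p&gt;The query-building step can be sketched like this. The operation shape and names are illustrative, not the actual LeapRows recipe format; the idea is just folding a JSON history into one chained CTE:&lt;/p&gt;

```typescript
// Hypothetical sketch: fold a JSON operation history into a single
// dynamic CTE, matching the conceptual SQL shown above.

interface Operation {
  type: string;   // e.g. "filter"
  clause: string; // pre-escaped SQL fragment, e.g. "price > 1000"
}

function buildQuery(table: string, ops: Operation[]): string {
  if (ops.length === 0) {
    return "SELECT * FROM " + table + ";";
  }
  const ctes: string[] = [];
  let prev = table;
  ops.forEach(function (op, i) {
    const name = "step" + (i + 1);
    ctes.push(name + " AS (SELECT * FROM " + prev + " WHERE " + op.clause + ")");
    prev = name; // each step reads from the previous one
  });
  return "WITH " + ctes.join(",\n     ") + "\nSELECT * FROM " + prev + ";";
}
```

&lt;p&gt;Feeding it the two filters from the example (&lt;code&gt;category = 'A'&lt;/code&gt;, then &lt;code&gt;price &amp;gt; 1000&lt;/code&gt;) reproduces the conceptual SQL above, and because the input is plain JSON, the history is trivially serializable.&lt;/p&gt;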

&lt;p&gt;And then I realized: &lt;em&gt;"If the operation history is cleanly stored as JSON, I can save it and replay entire workflows automatically."&lt;/em&gt; That insight became the foundation of LeapRows' &lt;strong&gt;Recipe feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wait time only occurs on initial file load&lt;/li&gt;
&lt;li&gt;Entire workflows can be re-executed from saved JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For complex cases (heavy regex, nested calculations), the dynamic CTE itself could get slow. Since I can't do advanced query tuning by reading EXPLAIN output, I implemented a caching layer that physically saves intermediate results to keep the UI responsive.&lt;/p&gt;
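&lt;p&gt;The caching idea is simple to sketch (names are illustrative, not the actual implementation): key a materialized intermediate result by the serialized operation-history prefix, so an expensive step is computed once and reused:&lt;/p&gt;

```typescript
// Hypothetical sketch of the intermediate-result cache: the JSON of
// the operation-history prefix is the cache key, so replaying the
// same steps skips recomputation entirely.

const cache = new Map();

function runWithCache(steps: string[], compute: (s: string[]) => string): string {
  const key = JSON.stringify(steps);
  if (cache.has(key)) {
    return cache.get(key); // cache hit: reuse the materialized result
  }
  const result = compute(steps);
  cache.set(key, result);
  return result;
}
```

&lt;p&gt;In the real tool the cached value would be a physically saved Parquet intermediate rather than a string, but the key design is the same: identical history prefix, identical result.&lt;/p&gt;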

&lt;h2&gt;
  
  
  The Ultimate Debugging Weapon: Lots of &lt;code&gt;console.log&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Even with all of this, there were still cases where Claude Code would insist &lt;em&gt;"It's fixed!"&lt;/em&gt; while the error kept looping.&lt;/p&gt;

&lt;p&gt;My final weapon: &lt;strong&gt;flooding the suspicious code with &lt;code&gt;console.log&lt;/code&gt; statements.&lt;/strong&gt; This let me trace things like &lt;em&gt;"The data becomes undefined here"&lt;/em&gt; or &lt;em&gt;"The schema changes between these two lines."&lt;/em&gt; Then I could tell Claude Code specifically: &lt;em&gt;"The behavior here seems wrong because X."&lt;/em&gt; That dramatically improved the odds of getting a correct fix.&lt;/p&gt;
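&lt;p&gt;A minimal version of that technique (hypothetical, not LeapRows code): a pass-through helper that logs the shape of the data at each suspicious point, so you can see exactly where it stops looking the way you expect:&lt;/p&gt;

```typescript
// Hypothetical trace helper: logs type, array-ness, and keys at a
// labeled point, then returns the data unchanged so it can be
// dropped into the middle of any pipeline.

function trace(label: string, data: unknown) {
  console.log("[trace] " + label, {
    type: typeof data,
    isArray: Array.isArray(data),
    keys: data === null || data === undefined ? [] : Object.keys(Object(data)),
  });
  return data; // pass-through
}
```

&lt;p&gt;Comparing the logged shapes between two points is often enough to say "the schema changes between these two lines" with confidence.&lt;/p&gt;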

&lt;p&gt;It might sound like a crude approach, but for a non-engineer, being able to isolate &lt;em&gt;where&lt;/em&gt; the problem is before handing it off to AI turned out to be a genuine winning strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned from Building with Claude Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  You Can Build as a Parent with Limited Time
&lt;/h3&gt;

&lt;p&gt;Before Claude Code, I'd start a side project, run out of time, drop it, start again months later having forgotten everything... and never actually ship anything. The cycle repeated many times.&lt;/p&gt;

&lt;p&gt;With Claude Code, as long as I have a clear enough mental image of what I want to build and can write it up as a spec, the implementation moves forward on its own.&lt;/p&gt;

&lt;p&gt;Including the rebuild phase, it took 8 months — but getting to Beta with just 90 minutes a day of work was a huge deal for me personally. For years I'd convinced myself that &lt;em&gt;"there's nothing I can do during the parenting years; I just have to accept falling behind."&lt;/em&gt; This project proved that wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Non-Engineers Still Need to Understand Development Fundamentals
&lt;/h3&gt;

&lt;p&gt;The main reason it took 8 months despite using Claude Code was that I lacked the foundational knowledge engineers take for granted — things like application architecture, design patterns, and the conceptual building blocks behind good software.&lt;/p&gt;

&lt;p&gt;(Singleton patterns, storing operation steps as state, etc.)&lt;/p&gt;

&lt;p&gt;Claude Code is genuinely impressive, and I do believe we're in an era where non-engineers can ship real tools. But having even a basic understanding of architecture and design patterns saves enormous amounts of time and leads to far better outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills and Hooks Are Worth the Investment
&lt;/h3&gt;

&lt;p&gt;I underutilized Skills early on, but gradually building them out made a real difference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;build-in-public&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generate X posts (#BuildInPublic) from git commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-md-organizer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prevent CLAUDE.md bloat (move completed specs to docs/)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;documentation-update&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Update docs after implementation changes (explicit trigger only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;duckdb-singleton-safe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DuckDB connection operations, &lt;code&gt;_setThrew&lt;/code&gt; error prevention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;duckdb-sql-standards&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQL query construction, column name escaping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;e2e-scenario-creator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;E2E test scenario generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;e2e-test-fixer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structured E2E failure diagnosis and repair (4 phases, max 3 iterations)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-api-security&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API auth, rate limiting, CSRF audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-data-exposure&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Data leakage audit (logs, error messages)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-dependency&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dependency vulnerability audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-headers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Security header audit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-input-validation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Input validation vulnerability detection (ReDoS, file size)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-sql-injection&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQL injection vulnerability detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-audit-xss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;XSS vulnerability detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;security-vulnerability-checker&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full app security audit (OWASP Top 10)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tailwind-ui-patterns&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;UI component creation (buttons, tables, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test-first-bug-fix&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TDD bug fixing (reproduce → fix → verify loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't a substitute for a professional security audit, but they're guardrails that reduce the risk of AI-introduced vulnerabilities — non-engineer style.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts: AI Isn't Magic, But It Does Expand What's Possible
&lt;/h2&gt;

&lt;p&gt;When I started, I wasn't sure how production-ready something built this way could actually be. Claude Code turned out to be more capable than I imagined, and it genuinely expanded what I thought I could build.&lt;/p&gt;

&lt;p&gt;The biggest benefit is the ability to move forward on things I "kind of knew about but couldn't actually do myself." The more clearly you can picture what you want, the better the output.&lt;/p&gt;

&lt;p&gt;I've barely reached Beta — there's still a lot of marketing work ahead. But if any of this resonates with you, I'd love for you to try &lt;a href="https://leaprows.com/en" rel="noopener noreferrer"&gt;LeapRows&lt;/a&gt; and send me feedback. It would genuinely make my day. 🙏&lt;/p&gt;

</description>
      <category>duckdb</category>
      <category>webdev</category>
      <category>claudecode</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
