<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: J.S_Falcon</title>
    <description>The latest articles on Forem by J.S_Falcon (@_d3709cf9e80fc6babbff).</description>
    <link>https://forem.com/_d3709cf9e80fc6babbff</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898219%2F048d871b-c38b-4948-87a5-cc7602c5b123.webp</url>
      <title>Forem: J.S_Falcon</title>
      <link>https://forem.com/_d3709cf9e80fc6babbff</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/_d3709cf9e80fc6babbff"/>
    <language>en</language>
    <item>
      <title>What Operations Discipline Brings to AI-Assisted Coding: A Cross-Domain Field Guide</title>
      <dc:creator>J.S_Falcon</dc:creator>
      <pubDate>Wed, 29 Apr 2026 13:13:13 +0000</pubDate>
      <link>https://forem.com/_d3709cf9e80fc6babbff/what-operations-discipline-brings-to-ai-assisted-coding-a-cross-domain-field-guide-2067</link>
      <guid>https://forem.com/_d3709cf9e80fc6babbff/what-operations-discipline-brings-to-ai-assisted-coding-a-cross-domain-field-guide-2067</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I moved from operations / systems engineering into the software side via AI collaboration. Part 1 of this series (&lt;a href="https://dev.to/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-42n4"&gt;the entity resolution case study&lt;/a&gt;) is the build; this is the methodology.&lt;/li&gt;
&lt;li&gt;Five practices and five anti-patterns, filtered through an ops lens — but the lessons generalize.&lt;/li&gt;
&lt;li&gt;Not "AI tips you've heard." Patterns that fall out naturally if you treat AI sessions like config reviews, runbooks, and validation procedures.&lt;/li&gt;
&lt;li&gt;Each piece is paired with a real misstep I made building Part 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each part of this series stands alone.&lt;/strong&gt; Read in any order.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why an "Operations Discipline" Lens
&lt;/h2&gt;

&lt;p&gt;Operations engineers spend their careers internalizing four habits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plan before you build&lt;/strong&gt; — designs, runbooks, change requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify before you declare done&lt;/strong&gt; — validation procedures, post-change checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document state&lt;/strong&gt; — configs, design docs, postmortems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suspect numbers&lt;/strong&gt; — every monitoring datapoint hides an artifact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These habits transfer directly to working with AI coding assistants. The disciplines you learned debugging routers, filing change requests, and reviewing configs are the same ones that prevent AI sessions from sliding off the rails.&lt;/p&gt;

&lt;p&gt;I'm framing this through ops because that's the lens I learned from. Most of these patterns generalize beyond ops — software engineers, data engineers, and SREs will recognize them. The ops version just happens to package them tightly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Five Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Practice 1 — Treat your CLAUDE.md (or system prompt) as a design-spec preamble
&lt;/h3&gt;

&lt;p&gt;In ops, every change procedure has a preamble: prerequisites, scope, rollback steps, validation checks. Same energy in AI work.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is Claude Code's persistent instruction file. (Other assistants have equivalents — system prompts, custom instructions, etc.) Use it the way you'd use a runbook preamble:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Operating principles&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always plan before implementing.
&lt;span class="p"&gt;-&lt;/span&gt; Confirm ambiguous instructions before coding.
&lt;span class="p"&gt;-&lt;/span&gt; Always provide a counter-argument when proposing a design.
&lt;span class="p"&gt;-&lt;/span&gt; Never report a metric without showing how it was measured.
&lt;span class="p"&gt;-&lt;/span&gt; Distinguish "should work" from "actually verified to work."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once written, every future session inherits these rules. You stop re-explaining yourself. This is the same template-then-reuse pattern that saves you from rewriting a runbook for every change window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practice 2 — Demand a Devil's Advocate, every time
&lt;/h3&gt;

&lt;p&gt;Design reviews exist because group-think kills production systems. Force the AI to argue against itself in every proposal.&lt;/p&gt;

&lt;p&gt;Three asks I bake into every meaningful design conversation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;What's the worst-case failure mode of this design?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What use case did you not consider?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Give me three reasons to reject this design.&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Bake this requirement into your &lt;code&gt;CLAUDE.md&lt;/code&gt; and you stop seeing pure agreement. An AI that only agrees with you is a single point of failure.&lt;/p&gt;
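
&lt;p&gt;A minimal &lt;code&gt;CLAUDE.md&lt;/code&gt; fragment for this practice might read as follows (the wording is illustrative, adapt to taste):&lt;/p&gt;

```markdown
## Devil's advocate

For every design proposal, before I accept it:
- State the worst-case failure mode of the design.
- Name at least one use case the design does not cover.
- Give three reasons to reject the design.
```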

&lt;h3&gt;
  
  
  Practice 3 — Force ambiguous instructions to be confirmed before implementation
&lt;/h3&gt;

&lt;p&gt;In ops requirements gathering, "implement loose specs" is a known disaster pattern. The same is true for AI sessions, where ambiguity gets resolved silently — and usually wrong.&lt;/p&gt;

&lt;p&gt;Real example from Part 1: I said "treat the ID and the display name as a pair, match if either is present." The AI interpreted that as two independent search keys. Half the matcher had to be rebuilt.&lt;/p&gt;

&lt;p&gt;Lesson, written into &lt;code&gt;CLAUDE.md&lt;/code&gt;: &lt;em&gt;if an instruction has two valid readings, ask which one I mean before writing code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the same habit as a senior network engineer asking "do you mean inbound or outbound?" before touching the firewall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practice 4 — Separate "theoretical evaluation" from "real-world evaluation"
&lt;/h3&gt;

&lt;p&gt;Ops engineers know the gap between "the spec says it works" and "I've watched the LED light up." The same gap exists in AI work, and it's wider than you'd think.&lt;/p&gt;

&lt;p&gt;Real example from Part 1: the AI claimed about 99.2% recall based on past-data pattern analysis. I asked for an actual run on the real dataset. The actual recall came back at 55%.&lt;/p&gt;

&lt;p&gt;The lesson is not "the AI lied." The lesson is that &lt;em&gt;pattern-analysis predictions are not the same as a real execution result.&lt;/em&gt; Every claim that sounds like a measurement deserves the question: &lt;em&gt;was this measured, or estimated?&lt;/em&gt; If estimated, label it that way and move on; if measured, show the run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practice 5 — Have the AI write its own verification scripts
&lt;/h3&gt;

&lt;p&gt;If the AI says "this code achieves 99% recall," ask it to write the script that measures that recall. Then run it.&lt;/p&gt;

&lt;p&gt;This converts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A claim → a script.&lt;/li&gt;
&lt;li&gt;A script → an audit trail.&lt;/li&gt;
&lt;li&gt;An audit trail → reproducibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is the same pattern as runbooks: a change procedure and a validation procedure, always paired. The validation script becomes a permanent artifact you can hand to the next person — or to your future self when something regresses.&lt;/p&gt;
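
&lt;p&gt;As a sketch of what such a verification script can look like (the labeled-pair format and field names here are my illustration, not the article's actual script):&lt;/p&gt;

```python
# Sketch of Practice 5: turn a recall claim into a script you can rerun.
# The (master_id, export_id) pair format is an assumption for illustration.

def measure_recall(predicted_pairs, human_pairs):
    """Recall = fraction of human-confirmed matches the tool also found."""
    if not human_pairs:
        return 0.0
    predicted = set(predicted_pairs)
    found = sum(1 for pair in human_pairs if pair in predicted)
    return found / len(human_pairs)

# Toy ground truth: matches a veteran confirmed by hand.
human_pairs = [("M1", "E1"), ("M2", "E2"), ("M3", "E3"), ("M4", "E4")]
# Toy tool output: two true matches found, plus one false positive.
predicted_pairs = [("M1", "E1"), ("M2", "E2"), ("M5", "E9")]

recall = measure_recall(predicted_pairs, human_pairs)
print(f"recall = {recall:.0%}")  # 2 of 4 -> recall = 50%
```

&lt;p&gt;The script, not the claim, is what gets committed.&lt;/p&gt;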

&lt;h2&gt;
  
  
  Part 2: Five Anti-Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Anti-Pattern 1 — "Just build me a tool"
&lt;/h3&gt;

&lt;p&gt;The AI equivalent of "fix the network." Without a scope, the AI invents one, then pursues that invented scope confidently, so the wrong thing gets built quickly and convincingly.&lt;/p&gt;

&lt;p&gt;Treat session start like requirements gathering: rough goal, key constraints, what's explicitly out of scope. Five minutes of scoping saves five hours of rework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Anti-Pattern 2 — Trusting headline numbers without verifying composition
&lt;/h3&gt;

&lt;p&gt;"99% recall" sounds great until you discover it was measured on cherry-picked rows, with the test set leaking into training data, on a metric that doesn't reflect the actual user experience.&lt;/p&gt;

&lt;p&gt;Before reporting any number, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How was this measured?&lt;/li&gt;
&lt;li&gt;On what data?&lt;/li&gt;
&lt;li&gt;Under what conditions?&lt;/li&gt;
&lt;li&gt;With what biases?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same suspicion you apply to a monitoring dashboard reporting zero alerts: &lt;em&gt;is the agent actually reporting, or is it dead?&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Anti-Pattern 3 — Throwing raw error text at the AI without context
&lt;/h3&gt;

&lt;p&gt;"It doesn't work" → "Why?"&lt;/p&gt;

&lt;p&gt;In ops you'd never debug a router by saying "it's down." You'd attach: configuration, status output, syslog excerpts, behavior of connected devices.&lt;/p&gt;

&lt;p&gt;Same here. The AI cannot infer your environment. Show the command, the actual output, the expected behavior, and the deviation. Treat each interaction like a bug report you'd file with a vendor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Anti-Pattern 4 — Sending business data to an AI without compliance review
&lt;/h3&gt;

&lt;p&gt;Default assumption: any data you put into a prompt may be retained, indexed, or used in training, regardless of what the vendor's marketing copy says.&lt;/p&gt;

&lt;p&gt;The operational habit is straightforward — redact, mask, or synthesize. The same instinct that keeps you from posting customer IPs to Stack Overflow should stop you from pasting customer rows into a prompt.&lt;/p&gt;

&lt;p&gt;(Part 1 covers this pattern in depth as it applied to the entity resolution build. The short version: deterministic logic touches the data; the AI touches only code, design notes, and synthetic samples.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Anti-Pattern 5 — Stopping at "it works"
&lt;/h3&gt;

&lt;p&gt;"The code runs" is not the same as "I understand why it runs."&lt;/p&gt;

&lt;p&gt;The ops version of this is: &lt;em&gt;a configuration that worked once but I can't explain is a future incident.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Make the AI explain why the working solution actually works. If neither of you can defend the design after one cycle of follow-up questions, treat it as a yellow flag — not a green light. Ship explainable code; the unexplained kind owns you on the day it breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;The pattern across all ten:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply ops discipline to AI sessions.&lt;/li&gt;
&lt;li&gt;Treat AI claims like vendor claims — verify them in your environment.&lt;/li&gt;
&lt;li&gt;Treat AI conversations like change windows — preamble, scope, verification, postmortem.&lt;/li&gt;
&lt;li&gt;Treat AI outputs like config diffs — explain them or reject them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I'm explicitly &lt;strong&gt;not&lt;/strong&gt; claiming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These are not unique to operations engineers. They generalize. They just happen to package tightly through the ops lens because the discipline is already there.&lt;/li&gt;
&lt;li&gt;These are not the only practices. Ten items is a lossy compression; the set you'd build for your environment may differ in detail.&lt;/li&gt;
&lt;li&gt;These cover the &lt;strong&gt;build phase&lt;/strong&gt; of AI-assisted work — the session-time discipline. &lt;strong&gt;Day 2 operations&lt;/strong&gt; (monitoring AI-generated code in production, detecting silent drift, incident response when AI-assisted changes break) is its own discipline and deserves its own article. The patterns here are necessary but not sufficient for production AI usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The big idea: AI doesn't replace engineering judgment — it amplifies it. Amplifying lazy judgment produces more bad code, faster. Amplifying disciplined judgment produces clear, audited, defensible work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;A future part of this series will cover &lt;strong&gt;how the design review for these articles actually happened&lt;/strong&gt; — a Multi-AI Adversarial Review (MAAR) loop where Claude and a second AI argued against each other under human routing. That's the meta-process behind both Part 1 and this one.&lt;/p&gt;

&lt;p&gt;If you came in via this article, &lt;a href="https://dev.to/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-42n4"&gt;Part 1&lt;/a&gt; is the concrete build that produced these lessons.&lt;/p&gt;

&lt;p&gt;Comments welcome — particularly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The five practices or anti-patterns you'd add.&lt;/li&gt;
&lt;li&gt;Cross-domain engineering experiences (any technical background → another).&lt;/li&gt;
&lt;li&gt;Cases where ops discipline did &lt;em&gt;not&lt;/em&gt; transfer cleanly to AI work.&lt;/li&gt;
&lt;li&gt;Rollback strategies when an AI-assisted change corrupts your codebase or repo state.&lt;/li&gt;
&lt;li&gt;Day 2 operations practices for AI-generated code in production (monitoring, drift detection, incident response).&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>watercooler</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Beating 250,000 Mental Comparisons: A Cross-Domain Engineer's Entity Resolution Case Study</title>
      <dc:creator>J.S_Falcon</dc:creator>
      <pubDate>Wed, 29 Apr 2026 11:13:41 +0000</pubDate>
      <link>https://forem.com/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-42n4</link>
      <guid>https://forem.com/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-42n4</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Operations/Systems engineer recently moved to the software side via AI collaboration.&lt;/li&gt;
&lt;li&gt;Built a domain-specific entity resolution tool in a handful of evening sessions with Claude Code.&lt;/li&gt;
&lt;li&gt;Caught about 99.2% of human-detected reconciliation errors when replayed against 8 weeks of historical data.&lt;/li&gt;
&lt;li&gt;Turned a "skilled-veterans-only" weekly task into something anyone on the team can run.&lt;/li&gt;
&lt;li&gt;Design retrofitted unexpectedly well to dual process theory, Gestalt psychology, and anchoring-bias defense.&lt;/li&gt;
&lt;li&gt;Source business records never reached an LLM. Deterministic pipeline + human review only.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. The Hidden Problem: When 500 × 500 Becomes a Cognitive Wall
&lt;/h2&gt;

&lt;p&gt;Many companies maintain the same business entities across multiple systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A retailer tracks SKUs in an internal master AND on Amazon / Rakuten / Shopify exports.&lt;/li&gt;
&lt;li&gt;A clinic carries patient records in both an EMR and an insurance billing system.&lt;/li&gt;
&lt;li&gt;A manufacturer holds internal inventory but also receives partner inventory feeds.&lt;/li&gt;
&lt;li&gt;An accounting team reconciles general ledger entries against bank statements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These pairs need periodic reconciliation. In the technical literature this is &lt;strong&gt;Entity Resolution&lt;/strong&gt; or &lt;strong&gt;Data Reconciliation&lt;/strong&gt; — a universal problem that nearly every mid-to-large business hits eventually.&lt;/p&gt;

&lt;p&gt;The case study here uses the &lt;strong&gt;retail SKU vs marketplace listing&lt;/strong&gt; framing. (The actual industry I work in is intentionally abstracted, but the structure transfers cleanly.) Two systems, ~500 rows each, weekly reconciliation. Skilled humans needed about 3 hours per week. Newcomers, half a day to a full day. Hidden detail: the small row count masks the real difficulty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is 500 × 500 hard?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The 250,000 problem
&lt;/h4&gt;

&lt;p&gt;Manually reconciling 500 × 500 pairs forces a person to evaluate up to &lt;strong&gt;250,000 combinations&lt;/strong&gt; in their head. Not 1,000 — 250,000. Plus typo tolerance, format variation (full-width vs half-width, mixed scripts, abbreviations, punctuation), and partial matches. Each pairwise judgment is not O(1).&lt;/p&gt;

&lt;p&gt;Brute-forcing this is the jump from a flat 1,000-node liveness check to a 1,000-node full-mesh ping check: same node count, orders of magnitude more load.&lt;/p&gt;
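
&lt;p&gt;The standard escape from the 250,000 problem is blocking: partition both sides by an exact key and only compare within a block. A minimal sketch, with hypothetical field names:&lt;/p&gt;

```python
from collections import defaultdict
from itertools import product

# Naive pairwise comparison of two 500-row lists is 500 x 500 pairs.
n = 500
print(n * n)  # 250000

# Blocking: only compare rows that agree on an exact key first,
# e.g. a region hard gate. Field names here are hypothetical.
def candidate_pairs(rows_a, rows_b, key):
    blocks = defaultdict(lambda: ([], []))
    for row in rows_a:
        blocks[row[key]][0].append(row)
    for row in rows_b:
        blocks[row[key]][1].append(row)
    pairs = []
    for side_a, side_b in blocks.values():
        pairs.extend(product(side_a, side_b))
    return pairs

# Toy data: 500 rows per side, spread evenly over 10 regions.
rows_a = [{"id": f"A{i}", "region": i % 10} for i in range(500)]
rows_b = [{"id": f"B{i}", "region": i % 10} for i in range(500)]
pairs = candidate_pairs(rows_a, rows_b, "region")
print(len(pairs))  # 10 blocks of 50 x 50 = 25000, a 10x reduction
```

&lt;p&gt;Humans do this intuitively ("only check the same region"); the machine just does it without forgetting.&lt;/p&gt;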

&lt;h4&gt;
  
  
  Working memory overflow
&lt;/h4&gt;

&lt;p&gt;Miller's "magical number" puts our short-term memory at 7 ± 2 chunks (Miller, 1956). Hunting matches across 1,000 candidates with format drift continuously overflows working memory and pegs System 2 (slow thinking) for the entire session. The 3-hour exhaustion experienced by veterans isn't a complaint — it's a neurological inevitability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Short to do" doesn't equal "easy to do"&lt;/strong&gt; for cognitive labor.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reproducibility decay
&lt;/h4&gt;

&lt;p&gt;A one-off reconciliation can be brute-forced. But when the task repeats weekly across 10+ weeks, judgment drift becomes unavoidable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Last week I matched 'A Co.' and 'A. Company' as the same entity. This week I treated them as different."&lt;/li&gt;
&lt;li&gt;"Last week I tolerated typo X. This week I rejected it."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This drift is what really breaks data quality long-term. It's the same structural failure mode as "config review standards differ by reviewer" in infrastructure operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The actual target
&lt;/h3&gt;

&lt;p&gt;So the real problem the tool solved was not "shorten 3 hours per week" but:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;250,000 judgments × 10 weeks of consistent reproducibility — a quality bar humans can't physically sustain — backed by a deterministic machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Plus removing the skill dependency. "Only one veteran can do this in 3 hours" is a single point of failure. After the tool: anyone could run it with consistent quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Background: Who I Am and What I Was Solving
&lt;/h2&gt;

&lt;p&gt;I'm an Operations/Systems engineer. Configuration, validation, runbook authoring, monitoring, troubleshooting — that side of the house. Software development was not my primary craft, though scripting was always part of the job.&lt;/p&gt;

&lt;p&gt;I'd recently moved into a new business domain (about 2 months in) and the tooling target system was something I'd only been touching for ~1 month. From the user side I'd seen the workflow longer, but not as a developer.&lt;/p&gt;

&lt;p&gt;Translation: design / validation / runbook discipline solid. Python and application development essentially unfamiliar.&lt;/p&gt;

&lt;p&gt;This article is &lt;strong&gt;not a "look what I shipped" piece&lt;/strong&gt;. It's a record of how operations-side disciplines transferred unchanged into AI-assisted software work in an unfamiliar domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who this article is for
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reader&lt;/th&gt;
&lt;th&gt;Useful sections&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operations / SRE engineers exploring AI assistance&lt;/td&gt;
&lt;td&gt;Everything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-career engineers moving across technical domains&lt;/td&gt;
&lt;td&gt;Background, Architecture, Cognitive Design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineers new to AI-assisted development&lt;/td&gt;
&lt;td&gt;Architecture, Cognitive Design, PII&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managers thinking about AI for their teams&lt;/td&gt;
&lt;td&gt;Results and the cognitive-load argument&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  3. PII / Compliance Considerations
&lt;/h2&gt;

&lt;p&gt;A question that always comes up in comments on entity-resolution articles: &lt;strong&gt;where does the data go?&lt;/strong&gt; Worth answering up front.&lt;/p&gt;

&lt;p&gt;In this implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source business records never reach any LLM.&lt;/strong&gt; Both input files (internal master + external system export) are read locally by a Python script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Matching is fully deterministic.&lt;/strong&gt; Pandas, openpyxl, and &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; for similarity. No embedding API. No remote inference at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The LLM's role is code-side, not data-side.&lt;/strong&gt; Claude Code helped write the matching logic, the validation scripts, the design review, and the documentation. None of the actual records were ever sent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For testing only&lt;/strong&gt;, masked synthetic data was used in prompts. Real names, amounts, and addresses were replaced with synthetic equivalents before any prompt left the local environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases stay with humans.&lt;/strong&gt; When the deterministic pipeline can't decide, it surfaces a flagged row for human review — not for LLM second opinion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation is intentional. The matching task is well-suited to deterministic logic. LLMs would only add cost, latency, and compliance exposure for no quality gain.&lt;/p&gt;

&lt;p&gt;If your team has even a soft "no business data into external AI" policy, this pattern is fully compatible.&lt;/p&gt;
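
&lt;p&gt;The masking step itself can stay deterministic and local. A minimal sketch, assuming a known list of names to redact (patterns and placeholders are illustrative, not the actual masking code):&lt;/p&gt;

```python
import re

def mask(text, names):
    """Mask digits first, then known proper nouns, so only the *shape*
    of a record can leave the local environment."""
    text = re.sub(r"\d", "9", text)  # every digit becomes 9
    for i, name in enumerate(names, start=1):
        text = text.replace(name, f"ENTITY_{i}")
    return text

row = "Acme Trading Co. invoice 34,900,000 due 2026-04-29"
print(mask(row, ["Acme Trading Co."]))
# ENTITY_1 invoice 99,999,999 due 9999-99-99
```

&lt;p&gt;Digits are masked before the placeholders are inserted so the placeholder labels themselves survive intact.&lt;/p&gt;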

&lt;h2&gt;
  
  
  4. Architecture: Two-Stage Matching + Cognitive Gates
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;pandas + openpyxl (Excel I/O, color-coded output)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; for fuzzy similarity&lt;/li&gt;
&lt;li&gt;Rule-based throughout. No machine learning.&lt;/li&gt;
&lt;li&gt;~1,100 lines, single script.&lt;/li&gt;
&lt;/ul&gt;
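
&lt;p&gt;The fuzzy layer is standard library only: &lt;code&gt;difflib.SequenceMatcher.ratio()&lt;/code&gt; yields a similarity between 0.0 and 1.0, which is what the 0.6 and 0.8 thresholds in the phase list gate on. A quick feel for the numbers:&lt;/p&gt;

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

exact = similarity("iPhone15", "iPhone15")
fuzzy = similarity("iPhone15", "iPhone 15 Pro Max")
typo = similarity("A Company", "A Comapny")

print(f"exact={exact:.2f} fuzzy={fuzzy:.2f} typo={typo:.2f}")
assert exact == 1.0
assert fuzzy >= 0.6   # related listing clears a 0.6 rescue threshold
assert typo > 0.8     # a single transposition clears a 0.8 threshold
```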

&lt;h3&gt;
  
  
  Phases
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Match by exact stakeholder name (or alias group)
Phase 2: Cross-match by name similarity ≥ 0.6 (rescue typos)
Phase 3: Last-name-only + structural match (single-typo tolerance)
Phase 4: Duplicate-registration detection (same stakeholder + similarity ≥ 0.8)
Phase 5: Rescue rows with no stakeholder name (attribute match)
Phase 5.5: Attribute-mismatch pair rescue (identifier similarity ≥ 0.7, stage 2)
Phase 6: Row generation + color decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The score function (key gates)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard gate: region must match — kills cross-region false positives
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;region_a&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;region_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard gate: numeric attribute must be close enough
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value_a&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;value_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Identifier gate: row_b's identifier must be embeddable in row_a's identifier
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;is_identifier_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Sub-identifier gate: anchoring-bias defense
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sub_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Soft scoring (only after every hard gate passed)
&lt;/span&gt;    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;identifier_match_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value_fallback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this shape?
&lt;/h3&gt;

&lt;p&gt;The retail SKU framing helps here. The same product on a marketplace might appear as &lt;code&gt;iPhone15&lt;/code&gt; in your master and &lt;code&gt;iPhone 15 Pro Max&lt;/code&gt; on the marketplace. Same item family, different surface form. Two key insights:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hard gates first.&lt;/strong&gt; "Different region" or "value difference &amp;gt; N" are absolute disqualifiers. Run them before any expensive similarity computation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Soft scoring last.&lt;/strong&gt; Once hard gates pass, compute similarity — but cap below 0.6 as "uncertain, surface to human."&lt;/li&gt;
&lt;/ol&gt;
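
&lt;p&gt;Those two insights reduce to plain control flow: cheap disqualifiers first, expensive similarity last. A self-contained sketch (field names, thresholds, and data are illustrative stand-ins for the real ~1,100-line script):&lt;/p&gt;

```python
from difflib import SequenceMatcher

THRESHOLD = 1000  # hypothetical max tolerated numeric difference

def compute_score(row_a, row_b):
    # Hard gate: absolute disqualifier, checked before any similarity work.
    if row_a["region"] != row_b["region"]:
        return 0.0
    # Hard gate: numeric attribute must be close enough.
    if abs(row_a["value"] - row_b["value"]) > THRESHOLD:
        return 0.0
    # Soft scoring, only after every hard gate passed.
    score = SequenceMatcher(None, row_a["name"], row_b["name"]).ratio()
    # Under 0.6 the pair is not auto-matched; it surfaces for human review.
    return score if score >= 0.6 else 0.0

a = {"region": "east", "value": 34900, "name": "iPhone15"}
b = {"region": "east", "value": 34900, "name": "iPhone 15 Pro Max"}
c = {"region": "west", "value": 34900, "name": "iPhone15"}

print(compute_score(a, b))  # gates pass, fuzzy score 0.64
print(compute_score(a, c))  # region gate fires: 0.0
```

&lt;p&gt;The ordering matters for cost as well as correctness: the similarity computation never runs on pairs a hard gate can reject for free.&lt;/p&gt;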

&lt;h3&gt;
  
  
  Why not ML / Vector DB / embeddings?
&lt;/h3&gt;

&lt;p&gt;Deterministic rule-based was chosen on purpose. Auditability was the requirement. When a flagged row is wrong, the operations team has to be able to trace exactly which gate fired and why. A black-box similarity score of 0.81 with no explanation cannot be reviewed, cannot be unit-tested, and cannot be defended in a compliance audit.&lt;/p&gt;

&lt;p&gt;ML is a fine choice when you have labeled training data, training infrastructure, and a continuous evaluation pipeline. None of these applied here. The operating constraint was: "anyone on the team should be able to read the code and know why it decided what it decided." That constraint forces deterministic logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Abstracted structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain-specific term&lt;/th&gt;
&lt;th&gt;Abstract concept&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Item / SKU&lt;/td&gt;
&lt;td&gt;Entity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stakeholder (vendor / agent)&lt;/td&gt;
&lt;td&gt;Stakeholder attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price / Amount&lt;/td&gt;
&lt;td&gt;Primary numeric attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address / Location&lt;/td&gt;
&lt;td&gt;Identifier (multi-attribute)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building / SKU name&lt;/td&gt;
&lt;td&gt;Auxiliary identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detail number / barcode&lt;/td&gt;
&lt;td&gt;Sub-identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format variation (kana/latin/case)&lt;/td&gt;
&lt;td&gt;Data quality issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain judgment&lt;/td&gt;
&lt;td&gt;Tacit knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is a universal "match entities across two systems with format drift" problem. The pattern reappears in EC, healthcare, HR, accounting, manufacturing, publishing — anywhere two systems represent the same business object differently.&lt;/p&gt;
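
&lt;p&gt;One deterministic answer to the format-variation row in the table above: Unicode NFKC normalization collapses full-width/half-width and script-width differences before any similarity is computed. A standard-library sketch (the actual script may normalize differently):&lt;/p&gt;

```python
import unicodedata

def canon(s):
    """Collapse full-width/half-width variants and case before matching."""
    return unicodedata.normalize("NFKC", s).lower()

# Full-width "ＡＢＣ１２３" and half-width "abc123" become identical.
assert canon("ＡＢＣ１２３") == canon("abc123")
# Half-width katakana folds to full-width katakana.
assert canon("ｱｲｳ") == canon("アイウ")
print(canon("ＡＢＣ１２３"))  # abc123
```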

&lt;h2&gt;
  
  
  5. Cognitive-Science Design Principles (the Twist)
&lt;/h2&gt;

&lt;p&gt;I didn't design this thinking about cognitive science. I built it, it worked, and only afterwards in a structured Gemini conversation did the underlying principles surface. The retrofit fits unsettlingly well.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Dual process theory (Daniel Kahneman)
&lt;/h3&gt;

&lt;p&gt;The two phases map onto two thinking modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System 1 (fast) = Phases 1–5.&lt;/strong&gt; Fuzzy "is this roughly the same thing?" — similarity scores, identifier matching, attribute closeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System 2 (slow) = &lt;code&gt;determine_color()&lt;/code&gt;.&lt;/strong&gt; Strict checks for value mismatch, format inconsistency, identifier mixing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Color-coded human review gets the System 1 fuzzy pass plus the System 2 strictness annotation, which is exactly the input shape humans need to make a final call.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Gestalt psychology
&lt;/h3&gt;

&lt;p&gt;Humans recognize "wholes," not character sequences. &lt;code&gt;iPhone15&lt;/code&gt; and &lt;code&gt;iPhone 15 Pro Max&lt;/code&gt; feel like the same product family even though strict string equality fails. So:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_identifier_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Recognize chunked identity even with mixed scripts and separators.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[A-Za-z0-9\s\-_]+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;addr_a&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Matching by chunks survives whitespace, separator, and script variation.&lt;/p&gt;
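&lt;p&gt;As a self-contained sketch of that chunking (splitting on separators is my reading of the approach; the helper name is mine):&lt;/p&gt;

```python
import re

def chunks_of(identifier):
    """Split an identifier into chunks on whitespace, hyphens, and underscores."""
    return [c for c in re.split(r'[\s\-_]+', identifier) if len(c) >= 2]

# Different separators yield the same chunks, so chunk-level matching still succeeds.
print(chunks_of("ABC-123"))
print(chunks_of("ABC 123"))
print(chunks_of("iPhone 15 Pro Max"))
```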

&lt;h3&gt;
  
  
  5.3 Anchoring &amp;amp; confirmation bias defenses
&lt;/h3&gt;

&lt;p&gt;Hard gates exist to deny human-style intuitive shortcuts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Same price, must be the same item" — rejected by sub-identifier gate.&lt;/li&gt;
&lt;li&gt;"Same name, must be the same person" — rejected by region gate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The machine's job is to be coldly skeptical exactly where humans get over-confident.&lt;/p&gt;
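&lt;p&gt;A hard gate is just an early, absolute disqualifier. A minimal sketch (gate names follow the article; the fields and the tolerance are hypothetical):&lt;/p&gt;

```python
def passes_hard_gates(a, b, value_tolerance=0.10):
    """Sequential absolute disqualifiers, cheapest first. Any 'no' drops the pair."""
    if a["region"] != b["region"]:
        return False  # region gate: same name, different region is not the same entity
    lo, hi = sorted([a["value"], b["value"]])
    if hi > 0 and (hi - lo) / hi > value_tolerance:
        return False  # numeric value gate: values too far apart to be one entity
    if a["sub_id"] != b["sub_id"]:
        return False  # sub-identifier gate: same price alone never forces a match
    return True

# "Same price, must be the same item" loses to the region gate here.
print(passes_hard_gates({"region": "east", "value": 100, "sub_id": "X1"},
                        {"region": "west", "value": 100, "sub_id": "X1"}))
```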

&lt;h3&gt;
  
  
  5.4 Reducing human cognitive load (Human-in-the-Loop)
&lt;/h3&gt;

&lt;p&gt;When a human is asked to confirm a flagged row, they don't get an opaque "match score 0.62". They get a one-line annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same entity matched | [Value mismatch] diff ¥2,000,000 (5.4%)
(A: ¥34,900,000 / B: ¥36,900,000) · identifier format inconsistent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The human doesn't waste cycles re-deriving why the row was flagged. Cognitive load drops sharply.&lt;/p&gt;
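&lt;p&gt;That annotation line is cheap to generate. A sketch of the formatter (the output shape is from the article; the helper itself is hypothetical):&lt;/p&gt;

```python
def annotate(value_a, value_b, id_consistent):
    """Build the one-line human-review annotation for a matched-but-flagged pair."""
    line = "Same entity matched"
    if value_a != value_b:
        diff = abs(value_a - value_b)
        pct = diff / max(value_a, value_b) * 100
        line += f" | [Value mismatch] diff ¥{diff:,} ({pct:.1f}%)"
        line += f" (A: ¥{value_a:,} / B: ¥{value_b:,})"
    if not id_consistent:
        line += " · identifier format inconsistent"
    return line

print(annotate(34_900_000, 36_900_000, id_consistent=False))
```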

&lt;h3&gt;
  
  
  5.5 Don't automate the ghost
&lt;/h3&gt;

&lt;p&gt;This part borrows from &lt;em&gt;Ghost in the Shell&lt;/em&gt;. Some judgments depend on tacit business knowledge that can't be reduced to rules. Don't build heuristics that pretend to encode them. Surface the row as a &lt;strong&gt;caution signal&lt;/strong&gt; and let a human apply the tacit layer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tightening the logic isn't a path to recreating the ghost.&lt;br&gt;
It's a path to revealing where the ghost is needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Mapping summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cognitive concept&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System 1 (fast)&lt;/td&gt;
&lt;td&gt;Phases 1–5 (fuzzy matching)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System 2 (slow)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;determine_color()&lt;/code&gt; strict checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-stage / dual-pass&lt;/td&gt;
&lt;td&gt;Stage 1 + Stage 2 (Phase 5.5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gestalt grouping&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;similarity&lt;/code&gt; / &lt;code&gt;is_identifier_match&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anchoring defense&lt;/td&gt;
&lt;td&gt;Sub-identifier gate, identifier gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cognitive load reduction&lt;/td&gt;
&lt;td&gt;Aggregated &lt;code&gt;[reason] diff X&lt;/code&gt; annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-Loop&lt;/td&gt;
&lt;td&gt;Caution signals for tacit-knowledge zones&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  6. Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Recall on 8 weeks of historical data
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Errors flagged by humans (excluding outlier weeks)&lt;/td&gt;
&lt;td&gt;~130&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors caught by the tool&lt;/td&gt;
&lt;td&gt;~129&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~99.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single missed case was annotated by the human reviewer as "even a human couldn't decide here." Effectively the tool catches every case where a human commits a confident verdict.&lt;/p&gt;

&lt;p&gt;(Caveat: this is recall against 8 weeks of one team's data, not a benchmark claim. Different domains will need their own measurement.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Time and skill load
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skilled veteran time per run&lt;/td&gt;
&lt;td&gt;~3 hrs/week&lt;/td&gt;
&lt;td&gt;~30 min/week (review only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcomer time per run&lt;/td&gt;
&lt;td&gt;half a day to full day&lt;/td&gt;
&lt;td&gt;~30 min/week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill dependency&lt;/td&gt;
&lt;td&gt;Yes (single point of failure)&lt;/td&gt;
&lt;td&gt;No (anyone can run it)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The time number understates the value. The real shift is &lt;strong&gt;breaking the skill SPOF&lt;/strong&gt;. Veteran out sick, leaves, or buried in another priority — work continues at the same quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on false positives
&lt;/h3&gt;

&lt;p&gt;Recall is ~99.2%, and the tool is intentionally tuned to favor recall over precision. False positives — pairs flagged for human review that turn out to be fine — are the accepted trade-off. The ~30 min/week of human review absorbs them without strain.&lt;/p&gt;

&lt;p&gt;In a no-human-in-the-loop deployment this trade-off would be very different. Here, false positives are cheap (a glance from a human reviewer) and false negatives (missed reconciliation errors) are expensive (data drift propagates into business reports).&lt;/p&gt;

&lt;h2&gt;
  
  
  7. The Flowchart
&lt;/h2&gt;

&lt;p&gt;Drawing the judgment flow as diagrams surfaced things the code review didn't. Below are the four phases as separate figures, in execution order.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 Phase 1: Hard Gates (sequential disqualifiers)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dyd67f8r7e2kwvxzcf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dyd67f8r7e2kwvxzcf.png" alt="Phase 1: Hard Gates - region, value, auxiliary identifier, sub-identifier sequential disqualifiers" width="800" height="1143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Region → numeric value → auxiliary identifier → sub-identifier. Each gate is an absolute disqualifier: any "No" drops the pair. The order matters — cheapest disqualifiers run first.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.2 Phase 2: Soft Match
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlft07vl32heaowckcsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlft07vl32heaowckcsg.png" alt="Phase 2: Soft Match - compute_score threshold and lock" width="542" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once a pair clears all hard gates, &lt;code&gt;compute_score&lt;/code&gt; evaluates a soft similarity. Below 0.6 → drop. At or above → lock the pair as the same entity.&lt;/p&gt;
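&lt;p&gt;The real &lt;code&gt;compute_score&lt;/code&gt; is internal; as a stand-in, any similarity bounded to [0, 1] works the same way against the 0.6 threshold. A sketch using token-set Jaccard:&lt;/p&gt;

```python
def compute_score(name_a, name_b):
    """Stand-in soft similarity: token-set Jaccard in [0, 1]."""
    ta, tb = set(name_a.lower().split()), set(name_b.lower().split())
    union = ta.union(tb)
    return len(ta.intersection(tb)) / len(union) if union else 0.0

THRESHOLD = 0.6  # from the article: below 0.6 drop, at or above lock

def soft_match(name_a, name_b):
    """Phase 2 decision for a pair that already cleared every hard gate."""
    return "locked" if compute_score(name_a, name_b) >= THRESHOLD else "dropped"

print(soft_match("iPhone 15 Pro", "iphone 15 pro"))
print(soft_match("iPhone 15 Pro", "Galaxy Tab S9"))
```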

&lt;h3&gt;
  
  
  7.3 Phase 3: Parallel Flag Checks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozelqo0xcqb643e4405.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozelqo0xcqb643e4405.png" alt="Phase 3: Parallel flag checks - six independent anomaly tests aggregated into tags" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For confirmed matches, six independent checks fire in parallel. Each surfaces a "this matched, but here's a discrepancy" signal. Tags are aggregated; there is no early-return contamination between checks.&lt;/p&gt;
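&lt;p&gt;Structurally, that is a list of check functions whose results are collected, never short-circuited. A two-check sketch (the real tool runs six; these two are illustrative):&lt;/p&gt;

```python
# Each check returns a tag or None; tags aggregate with no early return.
CHECKS = [
    lambda p: "value mismatch" if p["value_a"] != p["value_b"] else None,
    lambda p: "identifier format inconsistent" if p["fmt_a"] != p["fmt_b"] else None,
]

def flag_tags(pair):
    """Run every check and collect all tags, instead of stopping at the first hit."""
    return [tag for tag in (check(pair) for check in CHECKS) if tag]

print(flag_tags({"value_a": 1, "value_b": 2, "fmt_a": "x", "fmt_b": "x"}))
```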

&lt;h3&gt;
  
  
  7.4 Phase 4: Final Verdict and Drop Aggregation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6javxmbdxn53iioak5uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6javxmbdxn53iioak5uw.png" alt="Phase 4: Final verdict color decision and drop aggregation into Unmatched lane" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aggregate the tags into a color verdict. Drops from Phase 1 and Phase 2 converge into the "Unmatched" lane, surfaced standalone in the human-review output.&lt;/p&gt;
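&lt;p&gt;The verdict step then reduces the tag list to a color. A sketch (this particular tag-to-color mapping is illustrative, not the tool's actual rules):&lt;/p&gt;

```python
def color_verdict(tags):
    """Aggregate flag tags into a single review color (mapping is hypothetical)."""
    if not tags:
        return "green"   # matched, nothing flagged
    if "value mismatch" in tags:
        return "red"     # matched, but a value-level discrepancy needs attention
    return "yellow"      # matched, minor inconsistencies only

print(color_verdict([]))
print(color_verdict(["identifier format inconsistent"]))
```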

&lt;h3&gt;
  
  
  Things visible only after rendering as a diagram
&lt;/h3&gt;

&lt;p&gt;These were invisible while reading code, only obvious once drawn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1 hard gates are ordered by computational cost.&lt;/strong&gt; Region → numeric → auxiliary → sub-identifier. I placed them by intuition; the diagram showed they were already optimal — cheapest disqualifiers first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 3 parallel flag checks are genuinely independent.&lt;/strong&gt; Six checks fire in parallel with no early-return contamination. The diagram confirmed there was no silent dependency between them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All &lt;code&gt;Drop1&lt;/code&gt;–&lt;code&gt;Drop5&lt;/code&gt; paths converge to the same &lt;code&gt;Unmatched&lt;/code&gt; node.&lt;/strong&gt; I was throwing away the drop reason. Re-running "why was this pair rejected?" was impossible. Fix: log the drop reason in the row annotation.&lt;/li&gt;
&lt;/ol&gt;
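&lt;p&gt;The fix for point 3 is small in code terms: every drop path records why it fired. A sketch (field names are mine):&lt;/p&gt;

```python
def reject(row, reason):
    """Drop a row into the Unmatched lane, keeping the drop reason replayable."""
    row["verdict"] = "unmatched"
    row["drop_reason"] = reason  # e.g. "region gate", "score below 0.6"
    return row

row = reject({"id": 42}, "region gate")
print(row["drop_reason"])
```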

&lt;p&gt;Drawing the flowchart is roughly the same act as drawing an infrastructure topology before going live. The diagram is the rubber duck.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Wrap-up
&lt;/h2&gt;

&lt;p&gt;Three transferable lessons from this build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive load is the hidden cost&lt;/strong&gt; of "short" repetitive judgment tasks. Headcount-hour math undersells the burnout reality and skill-SPOF risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive science principles fall out of good design retroactively.&lt;/strong&gt; I didn't design with them in mind; the principles became visible only through structured review (with a second AI). If your design retrofits to known principles, that's confirmation. If it doesn't, that's a smell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLMs do NOT have to touch your data.&lt;/strong&gt; Most entity resolution work doesn't need them at all. Use them for code, design review, and documentation. Keep the business records local and deterministic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation itself is internal-use only and won't be open-sourced. The patterns generalize cleanly to any two-system entity reconciliation: EC, healthcare, HR, accounting, manufacturing, publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Coming in Part 2&lt;/strong&gt;: how this whole thing got built in the first place — the AI collaboration patterns, the anti-patterns I hit, and the cross-domain disciplines that transferred from operations to software development. (Link to A2 once published.)&lt;/p&gt;

&lt;p&gt;Comments on entity resolution, cognitive load in repetitive tasks, or cross-domain engineering experiences are welcome.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Field Notes from a Cross-Domain Engineer Working with AI</title>
      <dc:creator>J.S_Falcon</dc:creator>
      <pubDate>Wed, 29 Apr 2026 01:36:50 +0000</pubDate>
      <link>https://forem.com/_d3709cf9e80fc6babbff/field-notes-from-a-cross-domain-engineer-working-with-ai-13kh</link>
      <guid>https://forem.com/_d3709cf9e80fc6babbff/field-notes-from-a-cross-domain-engineer-working-with-ai-13kh</guid>
      <description>&lt;p&gt;I run two AI assistants from different vendors against each other on every non-trivial decision, with a human (me) sitting in the middle as the routing authority. What's surprised me most is not that they disagree — it's &lt;em&gt;how&lt;/em&gt; they disagree. They drift toward agreeing with me too quickly, then with each other too quickly, and I've had to design specific friction into the workflow to keep their disagreement productive.&lt;/p&gt;

&lt;p&gt;Writing this down because I think the friction patterns might be useful to other people running similar setups.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is
&lt;/h2&gt;

&lt;p&gt;I've been working as an operations / systems engineer for about twenty years. Networks first, then servers, then config management, then incident response. The kind of role where the job is mostly &lt;em&gt;making sure nothing breaks&lt;/em&gt;, and where every change request comes with a runbook, a verification step, and a rollback plan.&lt;/p&gt;

&lt;p&gt;In the last year and a half, AI coding assistants pulled me into the software side of the house. Not because I wanted a career change — because the boundary between "writes code" and "runs systems" got thin enough that ignoring it stopped being an option.&lt;/p&gt;

&lt;p&gt;This series is &lt;strong&gt;a running log&lt;/strong&gt; of what I've been finding, written as I go. The disciplines I pulled in from operations turned out to map onto AI collaboration unexpectedly well. The places where they didn't, I've had to invent something for myself. Observation-heavy and prescription-light. I don't think I've discovered anything new; I've stumbled into the same territory a lot of practitioners and researchers are mapping right now, and I'm writing down my coordinates while they're fresh.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is &lt;em&gt;not&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;A few things I want to flag up front.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is not a survey.&lt;/strong&gt; I haven't done an exhaustive review of the literature, the toolchain ecosystem, or the practitioner community. I'm certain there are people who've been running similar disciplines longer than me and writing them up better. If you find one, please send me the link — I'll learn from it, and I'd rather correct the record than defend a stale draft.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is dated material.&lt;/strong&gt; Anything you read here is a snapshot from early 2026, written from one specific vantage point: a Japanese operations engineer transitioning into AI-assisted software work, with paid Claude / ChatGPT / Gemini subscriptions and no team-scale deployment. If your context differs significantly, the disciplines may transfer poorly or not at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five disciplines this series circles around
&lt;/h2&gt;

&lt;p&gt;A set of operating principles I keep coming back to in my own daily work. Each earns its own article (or several) in the series. The short version, in a "simple to compose" → "complex to compose" order:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. TTL Satisficing
&lt;/h3&gt;

&lt;p&gt;I cap the number of back-and-forth rounds in any AI-assisted decision. Three is a good default. The point is not to find the perfect answer — it's to converge on a &lt;em&gt;good-enough&lt;/em&gt; answer before the next round costs more than the marginal improvement is worth. If three rounds don't converge, I treat that as a signal that the &lt;em&gt;problem&lt;/em&gt; is the problem, not the round count.&lt;/p&gt;

&lt;p&gt;This is essentially &lt;strong&gt;timeout / retry-budget design&lt;/strong&gt; from network operations, applied to a conversation with an AI.&lt;/p&gt;
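&lt;p&gt;The retry-budget shape, as a sketch (the function names are mine; the default of three rounds is from the text):&lt;/p&gt;

```python
def converge(attempt, good_enough, max_rounds=3):
    """Run up to max_rounds attempts; stop at the first good-enough answer.

    Returning None is the signal that the problem, not the round count,
    is the problem.
    """
    for round_no in range(1, max_rounds + 1):
        answer = attempt(round_no)
        if good_enough(answer):
            return answer
    return None

print(converge(lambda r: r * 10, lambda a: a >= 20))  # converges on round 2
print(converge(lambda r: 0, lambda a: a > 0))         # budget exhausted
```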

&lt;h3&gt;
  
  
  2. The Karpathy Rule (Simplicity First, Surgical Changes)
&lt;/h3&gt;

&lt;p&gt;Andrej Karpathy's framing, adapted to my workflow: I don't let the AI add what wasn't asked for, and I don't let it edit what wasn't in scope. Speculative additions are how technical debt grows at AI speed. Keeping the diff small and the proposal narrow is a discipline I have to actively enforce.&lt;/p&gt;

&lt;p&gt;This maps cleanly onto &lt;strong&gt;YAGNI ("You Aren't Gonna Need It")&lt;/strong&gt; and &lt;strong&gt;minimal-diff code review culture&lt;/strong&gt; — pre-AI principles, applied to a faster-moving collaborator.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Fail-Fast (Tolerance for Contradiction)
&lt;/h3&gt;

&lt;p&gt;When a problem doesn't converge across multiple attempts and multiple AIs, I want the AI to declare it unfeasible and hand me a graceful-degradation path. The AI's job is not to grind itself into the ground trying to solve everything. Its job is to &lt;strong&gt;return judgment to the human&lt;/strong&gt; when judgment is what's actually needed. "I can't solve this, here's how to amputate it cleanly" is a &lt;em&gt;better&lt;/em&gt; answer than "I'm trying again, please wait."&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;circuit-breaker behavior&lt;/strong&gt; from distributed systems, applied to AI reasoning loops.&lt;/p&gt;
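&lt;p&gt;A minimal circuit-breaker sketch of that behavior (the class and the message are mine; the point is that an open circuit hands judgment back instead of retrying):&lt;/p&gt;

```python
class CircuitBreaker:
    """Fail-fast: after N consecutive failures, stop retrying and return control."""

    def __init__(self, max_failures=3):
        self.failures = 0
        self.max_failures = max_failures

    def call(self, fn):
        if self.failures >= self.max_failures:
            # Open circuit: declare the approach unfeasible, degrade gracefully.
            return ("open", "unfeasible: hand the decision back to the human")
        try:
            result = fn()
            self.failures = 0  # success resets the failure count
            return ("closed", result)
        except Exception as exc:
            self.failures += 1
            return ("closed", exc)

breaker = CircuitBreaker(max_failures=2)

def never_converges():
    raise RuntimeError("no convergence")

breaker.call(never_converges)
breaker.call(never_converges)
print(breaker.call(never_converges)[0])  # circuit is now open
```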

&lt;h3&gt;
  
  
  4. Adversarial Review (the Devil's Advocate, made standing)
&lt;/h3&gt;

&lt;p&gt;For any non-trivial proposal, I force the AI to argue against itself. Standing instructions — in CLAUDE.md, in custom instructions, in the system prompt — so I don't have to remember to ask. An assistant that only agrees with me is a single point of failure, not a colleague.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;red-team review&lt;/strong&gt; and &lt;strong&gt;RFC-style structured criticism&lt;/strong&gt;, made automatic instead of episodic.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Human Router (Multi-AI Adversarial Review)
&lt;/h3&gt;

&lt;p&gt;For decisions with real trade-offs, I run two AIs from different vendors against each other and stay in the loop as the routing authority. Not as supervisor, not as referee — as the &lt;strong&gt;operator&lt;/strong&gt; who decides which output goes where, and which conflicts get escalated. The cross-vendor part matters in my experience: same-vendor "multi-agent" setups still share a worldview. Cross-vendor (e.g., Claude vs. Gemini) actually surfaces disagreement.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;human-in-the-loop control plane&lt;/strong&gt; thinking, where the data plane is "AIs argue" and the control plane is "human decides which arguments matter."&lt;/p&gt;

&lt;p&gt;I want to be clear: none of these are inventions of mine. The literature has Multi-Agent Debate, Self-Refine, Reflexion, Constitutional AI, and a growing body of human-in-the-loop research. I'm describing how I personally compose these ideas into a daily workflow, and writing down the specifics in case the composition is useful to someone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Cross-Domain"
&lt;/h2&gt;

&lt;p&gt;I keep using this word and I should explain it.&lt;/p&gt;

&lt;p&gt;The AI engineering conversation I see is heavily centered on the &lt;em&gt;programmer's&lt;/em&gt; lens — code generation, dev tools, IDE integrations, prompt-to-PR workflows. That lens is real and valuable, and I'm not arguing against it.&lt;/p&gt;

&lt;p&gt;But I came in from operations, and operations has its own logic. Change management. Verification gates. Runbooks. Rollback procedures. Skepticism toward dashboards that are too green. The instinct to ask &lt;em&gt;"how would I know if this is silently broken?"&lt;/em&gt; before celebrating a green build.&lt;/p&gt;

&lt;p&gt;When you bring those instincts to AI collaboration, you get a different shape of practice than you'd get if you came in from the dev side. Not better. &lt;em&gt;Different&lt;/em&gt;. And I think the differences are worth writing down, because operations engineers, SREs, data engineers, business systems people, and infrastructure folks are all about to find themselves doing AI-assisted work — and the dev-centric playbook isn't going to fit them perfectly.&lt;/p&gt;

&lt;p&gt;That's what "cross-domain" is pointing at. Not crossing one specific bridge, but recognizing that there are many bridges, and that each engineering discipline brings its own toolkit to the AI collaboration table.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's coming in this series
&lt;/h2&gt;

&lt;p&gt;(Links will appear as articles are published. &lt;strong&gt;Each piece is designed to stand alone&lt;/strong&gt; — you don't need to have read this anchor page to follow any of them. This index is just here for the curious.)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entity Resolution case study&lt;/strong&gt; — A practical build: deterministic two-system reconciliation with AI assistance, no business data leaving the local environment, ~99.2% recall on historical replay. Where I learned that operations discipline transfers cleanly to software work in unfamiliar domains.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Collaboration Patterns&lt;/strong&gt; — Five practices and five anti-patterns I picked up running AI-assisted operations work day to day. The kind of thing you only learn by getting burned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alchemy Essay&lt;/strong&gt; — A late-night thought experiment mapping AI coding onto the principles of &lt;em&gt;Fullmetal Alchemist&lt;/em&gt;. The most polemical piece in the series; also the most personal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-Native Governance&lt;/strong&gt; — The architectural argument: why dev-centric AI governance feels insufficient from where I sit, and what a domain-native alternative might look like. The most theoretical piece in the series.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Logic First&lt;/strong&gt; — A position piece: each engineering discipline has its own logic, and there might be value in adapting AI to it rather than the other way around.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Field-Note Tips (ongoing)&lt;/strong&gt; — Short pieces on individual operating habits: the CLAUDE.md preamble template, three-tier tool routing, telling fast evaluation from real evaluation, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A note on tone
&lt;/h2&gt;

&lt;p&gt;The Japanese versions of these articles tend to land in a softer register than the English ones. That's a translation choice; the underlying observations are the same. If you read both, the English will feel more direct and the Japanese more reflective. Neither is the "real" version — both are. I'm a different writer in each language, and I've stopped trying to flatten that.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on the time-shift
&lt;/h2&gt;

&lt;p&gt;If you're reading this in 2027 or 2028 and any of these disciplines feel obvious by now, good. That means the field moved. The reason these notes exist is that &lt;em&gt;I&lt;/em&gt; was figuring this out in early 2026 and wanted a record. I'd rather have written them down too early and looked dated than written them down too late and lost the messy details.&lt;/p&gt;

&lt;p&gt;If you're reading this in 2026 and any of these disciplines feel new: I'm right there with you. We're figuring this out together.&lt;/p&gt;




&lt;p&gt;Comments welcome. Particularly from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineers from non-dev domains who've started AI-assisted work&lt;/li&gt;
&lt;li&gt;Researchers working on multi-agent debate / adversarial review / human-in-the-loop systems&lt;/li&gt;
&lt;li&gt;Anyone running multi-AI cross-vendor protocols and willing to share what's worked or broken&lt;/li&gt;
&lt;li&gt;Anyone who thinks one of these five disciplines is wrong and wants to argue&lt;/li&gt;
&lt;li&gt;Anyone who has prior art (papers, blog posts, internal write-ups) for any of this — I'd genuinely like to read them&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>architecture</category>
      <category>productivity</category>
    </item>
    <item>
      <title>"AI Coding Is Alchemy: A Late-Night Reflection from Fullmetal Alchemist"</title>
      <dc:creator>J.S_Falcon</dc:creator>
      <pubDate>Mon, 27 Apr 2026 17:47:25 +0000</pubDate>
      <link>https://forem.com/_d3709cf9e80fc6babbff/ai-coding-is-alchemy-a-late-night-reflection-from-fullmetal-alchemist-2epd</link>
      <guid>https://forem.com/_d3709cf9e80fc6babbff/ai-coding-is-alchemy-a-late-night-reflection-from-fullmetal-alchemist-2epd</guid>
      <description>&lt;p&gt;Late at night, writing code, I had a sudden realization: the experience of building with AI (LLMs) maps almost perfectly onto the foundational principle of alchemy from &lt;em&gt;Fullmetal Alchemist&lt;/em&gt; — &lt;strong&gt;Comprehension, Deconstruction, Reconstruction&lt;/strong&gt;. The current paradigm shift in AI development can be told as the evolution of these three steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Traditional Programming = Drawing Transmutation Circles by Hand
&lt;/h2&gt;

&lt;p&gt;Software engineering, until recently, was traditional alchemy.&lt;/p&gt;

&lt;p&gt;We read requirements and business logic (Comprehension), broke them down into algorithms and function designs (Deconstruction), and then drew the transmutation circle — the actual code, with exact syntax — by hand to bring the system to life (Reconstruction).&lt;/p&gt;

&lt;p&gt;If even one chalk line wavered (a syntax error, a typo), the transmutation failed and the error blew back at us. We had to draw the circle on the ground by hand, every single time.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. AI Coding = Hand-Clap Transmutation Through the Gate of Truth
&lt;/h2&gt;

&lt;p&gt;Then ChatGPT, Claude, o1, and the rest arrived. It feels — at least to me — like we have collectively opened the &lt;strong&gt;Gate of Truth&lt;/strong&gt; for infrastructure and coding.&lt;/p&gt;

&lt;p&gt;Once we do the Comprehension and Deconstruction in our heads (requirements gathering through prompts, architecture design), we can outsource the most tedious step — Reconstruction (writing the code, drawing the transmutation circle) — to the Gate (the AI).&lt;/p&gt;

&lt;p&gt;We clap our hands — well, hit Enter — and thousands of lines of boilerplate or a complex regex assemble in an instant, without ever drawing a circle.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Equivalent Exchange Trap (An Operations View)
&lt;/h2&gt;

&lt;p&gt;Hand-clap transmutation looks indistinguishable from magic. But here is the trap an operations engineer notices.&lt;/p&gt;

&lt;p&gt;Outsourcing Reconstruction to the AI is &lt;strong&gt;conditional on the human side getting Comprehension and Deconstruction completely right.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What happens when domain knowledge (Comprehension) is shallow, or the logical structure of the prompt (Deconstruction) is broken, and you still hand the transmutation off to the AI? You ship a Chimera into production — undebuggable spaghetti code, or a quietly exploitable security hole.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Equivalent Exchange — the foundational law of alchemy — does not get repealed by AI.&lt;/strong&gt; Whatever amount of human thought you skip, the system charges you back later, in the form of an incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. "All Is One, One Is All" — Living Next to a Collective Intelligence
&lt;/h2&gt;

&lt;p&gt;There is one more truth from the same series that has to be named: &lt;strong&gt;"All is One, One is All."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An LLM is the &lt;strong&gt;All&lt;/strong&gt; — every line of code engineers have ever written, mixed with the cumulative knowledge of humanity, distilled into one model. Each of us, the engineer at the keyboard, is only &lt;strong&gt;one&lt;/strong&gt; — a single point inside that enormous stream.&lt;/p&gt;

&lt;p&gt;But if we hand everything over to the All and merely ship whatever it emits, we collapse into the All. We become a part of it — a downstream API endpoint of the AI's output, indistinguishable from the noise.&lt;/p&gt;

&lt;p&gt;It is exactly because the &lt;strong&gt;One&lt;/strong&gt; — the individual engineer — Comprehends the whole system and Deconstructs it with intent, that the &lt;strong&gt;All&lt;/strong&gt; — the AI's collective intelligence — Reconstructs it as something with actual value.&lt;/p&gt;

&lt;p&gt;Quality, security, the integrity of the larger system: all of these come back, finally, to the thought and judgment of the One.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Don't Become a Homunculus — Keep a Heart of Steel
&lt;/h2&gt;

&lt;p&gt;As AI advances, the value of an engineer who is only a &lt;em&gt;biological pair of hands&lt;/em&gt; — Reconstruction-only labor — is collapsing.&lt;/p&gt;

&lt;p&gt;But the roles of Comprehending the system, Deconstructing its boundaries, and taking responsibility for what the AI emits — these will not be automated.&lt;/p&gt;

&lt;p&gt;The survival strategy is not to outrun the AI. It is to &lt;strong&gt;resist being swept away by its overwhelming speed&lt;/strong&gt;, to deliberately introduce the friction of thought — the small pain of slowing down — and to stay in that resistance.&lt;/p&gt;

&lt;p&gt;That, I think, is the &lt;strong&gt;Heart of Steel&lt;/strong&gt; that keeps us from sliding into Homunculi: dolls who have surrendered the act of thinking.&lt;/p&gt;

&lt;p&gt;We are alchemists who have seen Truth. There is no going back to the world where every circle was drawn by hand. But the one thing we cannot afford to let go of is the act of thinking for ourselves and owning what we ship.&lt;/p&gt;

&lt;p&gt;A quiet promise made to myself in front of a monitor, late at night.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The Japanese title of Fullmetal Alchemist is 鋼の錬金術師 (Hagane no Renkinjutsushi — "Steel Alchemist"). The "Heart of Steel" line above is a small wordplay on that title. It survives translation only partially.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>watercooler</category>
      <category>discuss</category>
      <category>programming</category>
    </item>
    <item>
      <title>"Beating 250,000 Mental Comparisons: A Cross-Domain Engineer's Entity Resolution Case Study"</title>
      <dc:creator>J.S_Falcon</dc:creator>
      <pubDate>Sun, 26 Apr 2026 08:41:34 +0000</pubDate>
      <link>https://forem.com/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-3j1b</link>
      <guid>https://forem.com/_d3709cf9e80fc6babbff/beating-250000-mental-comparisons-a-cross-domain-engineers-entity-resolution-case-study-3j1b</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Operations/Systems engineer recently moved to the software side via AI collaboration.&lt;/li&gt;
&lt;li&gt;Built a domain-specific entity resolution tool in a handful of evening sessions with Claude Code.&lt;/li&gt;
&lt;li&gt;Caught about 99.2% of human-detected reconciliation errors when replayed against 8 weeks of historical data.&lt;/li&gt;
&lt;li&gt;Turned a "skilled-veterans-only" weekly task into something anyone on the team can run.&lt;/li&gt;
&lt;li&gt;Design retrofitted unexpectedly well to dual process theory, Gestalt psychology, and anchoring-bias defense.&lt;/li&gt;
&lt;li&gt;Source business records never reached an LLM. Deterministic pipeline + human review only.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. The Hidden Problem: When 500 × 500 Becomes a Cognitive Wall
&lt;/h2&gt;

&lt;p&gt;Many companies maintain the same business entities across multiple systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A retailer tracks SKUs in an internal master AND on Amazon / Rakuten / Shopify exports.&lt;/li&gt;
&lt;li&gt;A clinic carries patient records in both an EMR and an insurance billing system.&lt;/li&gt;
&lt;li&gt;A manufacturer holds internal inventory but also receives partner inventory feeds.&lt;/li&gt;
&lt;li&gt;An accounting team reconciles general ledger entries against bank statements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These pairs need periodic reconciliation. In the technical literature this is &lt;strong&gt;Entity Resolution&lt;/strong&gt; or &lt;strong&gt;Data Reconciliation&lt;/strong&gt; — a universal problem that nearly every mid-to-large business hits eventually.&lt;/p&gt;

&lt;p&gt;The case study here uses the &lt;strong&gt;retail SKU vs marketplace listing&lt;/strong&gt; framing. (The actual industry I work in is intentionally abstracted, but the structure transfers cleanly.) Two systems, ~500 rows each, weekly reconciliation. Skilled humans needed about 3 hours per week. Newcomers, half a day to a full day. Hidden detail: the small row count masks the real difficulty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is 500 × 500 hard?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The 250,000 problem
&lt;/h4&gt;

&lt;p&gt;Manually reconciling 500 × 500 pairs forces a person to evaluate up to &lt;strong&gt;250,000 combinations&lt;/strong&gt; in their head. Not 1,000 — 250,000. Plus typo tolerance, format variation (full-width vs half-width, mixed scripts, abbreviations, punctuation), and partial matches. Each pairwise judgment is not O(1).&lt;/p&gt;

&lt;p&gt;Brute-forcing this is computationally similar to running a full-mesh ping check across 1,000 nodes instead of a flat 1,000-node liveness check: O(n²) versus O(n), orders of magnitude more load.&lt;/p&gt;
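&lt;p&gt;The same arithmetic can be shown in a toy sketch (this is an illustration, not the production script): naive pairwise comparison is O(n × m), while a cheap blocking key such as region shrinks the candidate space before any expensive similarity runs.&lt;/p&gt;

```python
# Toy illustration of naive pairwise comparison vs blocking on a cheap key.
from collections import defaultdict
from itertools import product

def naive_pairs(rows_a, rows_b):
    """Every row in A against every row in B: O(n * m) comparisons."""
    return list(product(rows_a, rows_b))

def blocked_pairs(rows_a, rows_b, key):
    """Only compare rows that share the blocking key (e.g. region)."""
    buckets = defaultdict(list)
    for b in rows_b:
        buckets[key(b)].append(b)
    return [(a, b) for a in rows_a for b in buckets[key(a)]]

# 6 rows split evenly across 3 regions on each side
rows_a = [{"id": i, "region": i % 3} for i in range(6)]
rows_b = [{"id": i, "region": i % 3} for i in range(6)]

print(len(naive_pairs(rows_a, rows_b)))                           # 36
print(len(blocked_pairs(rows_a, rows_b, lambda r: r["region"])))  # 12
```

&lt;p&gt;At 500 × 500 the naive count is 250,000; any disqualifier that fires before the similarity computation cuts that directly.&lt;/p&gt;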

&lt;h4&gt;
  
  
  Working memory overflow
&lt;/h4&gt;

&lt;p&gt;Miller's "magical number" puts our short-term memory at 7 ± 2 chunks (Miller, 1956). Hunting matches across 1,000 candidates with format drift continuously overflows working memory and pegs System 2 (slow thinking) for the entire session. The 3-hour exhaustion experienced by veterans isn't a complaint — it's a neurological inevitability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Short to do" doesn't equal "easy to do"&lt;/strong&gt; for cognitive labor.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reproducibility decay
&lt;/h4&gt;

&lt;p&gt;A one-off reconciliation can be brute-forced. But when the task repeats weekly across 10+ weeks, judgment drift becomes unavoidable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Last week I matched 'A Co.' and 'A. Company' as the same entity. This week I treated them as different."&lt;/li&gt;
&lt;li&gt;"Last week I tolerated typo X. This week I rejected it."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This drift is what really breaks data quality long-term. It's the same structural failure mode as "config review standards differ by reviewer" in infrastructure operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The actual target
&lt;/h3&gt;

&lt;p&gt;So the real problem the tool solved was not "shorten 3 hours per week" but:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;250,000 judgments × 10 weeks of consistent reproducibility — a quality bar humans can't physically sustain — backed by a deterministic machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Plus removing the skill dependency. "Only one veteran can do this in 3 hours" is a single point of failure. After the tool: anyone could run it with consistent quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Background: Who I Am and What I Was Solving
&lt;/h2&gt;

&lt;p&gt;I'm an Operations/Systems engineer. Configuration, validation, runbook authoring, monitoring, troubleshooting — that side of the house. Software development was not my primary craft, though scripting was always part of the job.&lt;/p&gt;

&lt;p&gt;I'd recently moved into a new business domain (about 2 months in) and the tooling target system was something I'd only been touching for ~1 month. From the user side I'd seen the workflow longer, but not as a developer.&lt;/p&gt;

&lt;p&gt;Translation: design / validation / runbook discipline solid. Python and application development essentially unfamiliar.&lt;/p&gt;

&lt;p&gt;This article is &lt;strong&gt;not a "look what I shipped" piece&lt;/strong&gt;. It's a record of how operations-side disciplines transferred unchanged into AI-assisted software work in an unfamiliar domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who this article is for
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reader&lt;/th&gt;
&lt;th&gt;Useful sections&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operations / SRE engineers exploring AI assistance&lt;/td&gt;
&lt;td&gt;Everything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-career engineers moving across technical domains&lt;/td&gt;
&lt;td&gt;Background, Architecture, Cognitive Design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineers new to AI-assisted development&lt;/td&gt;
&lt;td&gt;Architecture, Cognitive Design, PII&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managers thinking about AI for their teams&lt;/td&gt;
&lt;td&gt;Results and the cognitive-load argument&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  3. PII / Compliance Considerations
&lt;/h2&gt;

&lt;p&gt;A question that always comes up in comments on entity-resolution articles: &lt;strong&gt;where does the data go?&lt;/strong&gt; Worth answering up front.&lt;/p&gt;

&lt;p&gt;In this implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source business records never reach any LLM.&lt;/strong&gt; Both input files (internal master + external system export) are read locally by a Python script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Matching is fully deterministic.&lt;/strong&gt; Pandas, openpyxl, and &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; for similarity. No embedding API. No remote inference at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The LLM's role is code-side, not data-side.&lt;/strong&gt; Claude Code helped write the matching logic, the validation scripts, the design review, and the documentation. None of the actual records were ever sent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For testing only&lt;/strong&gt;, masked synthetic data was used in prompts. Real names, amounts, and addresses were replaced with synthetic equivalents before any prompt left the local environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases stay with humans.&lt;/strong&gt; When the deterministic pipeline can't decide, it surfaces a flagged row for human review — not for LLM second opinion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation is intentional. The matching task is well-suited to deterministic logic. LLMs would only add cost, latency, and compliance exposure for no quality gain.&lt;/p&gt;

&lt;p&gt;If your team has even a soft "no business data into external AI" policy, this pattern is fully compatible.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Architecture: Two-Stage Matching + Cognitive Gates
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;pandas + openpyxl (Excel I/O, color-coded output)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; for fuzzy similarity&lt;/li&gt;
&lt;li&gt;Rule-based throughout. No machine learning.&lt;/li&gt;
&lt;li&gt;~1,100 lines, single script.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phases
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Match by exact stakeholder name (or alias group)
Phase 2: Cross-match by name similarity ≥ 0.6 (rescue typos)
Phase 3: Last-name-only + structural match (single-typo tolerance)
Phase 4: Duplicate-registration detection (same stakeholder + similarity ≥ 0.8)
Phase 5: Rescue rows with no stakeholder name (attribute match)
Phase 5.5: Attribute-mismatch pair rescue (identifier similarity ≥ 0.7, stage 2)
Phase 6: Row generation + color decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
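&lt;p&gt;The overall shape of that phase list can be sketched as a pipeline where each later phase only sees the rows the earlier phases failed to match, so the strict, cheap rules get first pick and later phases act as rescue passes. Function and phase names below are hypothetical, not the real script's:&lt;/p&gt;

```python
# Hypothetical sketch of the phase-pipeline shape: each phase consumes only
# the rows the previous phases failed to match.
def run_pipeline(rows, phases):
    """phases: list of (name, match_fn); match_fn returns the rows it matched."""
    matched, remaining = {}, list(rows)
    for name, match_fn in phases:
        hits = match_fn(remaining)
        matched[name] = hits
        remaining = [r for r in remaining if r not in hits]
    return matched, remaining  # remaining = the unmatched lane for human review

# Toy phases: exact match first, then a looser rescue pass
phases = [
    ("exact", lambda rs: [r for r in rs if r == "A Co."]),
    ("fuzzy", lambda rs: [r for r in rs if r.startswith("A")]),
]
matched, unmatched = run_pipeline(["A Co.", "A. Company", "B Ltd."], phases)
print(matched)    # {'exact': ['A Co.'], 'fuzzy': ['A. Company']}
print(unmatched)  # ['B Ltd.']
```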



&lt;h3&gt;
  
  
  The score function (key gates)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard gate: region must match — kills cross-region false positives
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;region_a&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;region_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard gate: numeric attribute must be close enough
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value_a&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;value_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Identifier gate: row_b's identifier must be embeddable in row_a's identifier
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;is_identifier_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Sub-identifier gate: anchoring-bias defense
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sub_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="c1"&gt;# Soft scoring (only after every hard gate passed)
&lt;/span&gt;    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;identifier_match_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value_fallback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this shape?
&lt;/h3&gt;

&lt;p&gt;The retail SKU framing helps here. The same product on a marketplace might appear as &lt;code&gt;iPhone15&lt;/code&gt; in your master and &lt;code&gt;iPhone 15 Pro Max&lt;/code&gt; on the marketplace. Same item family, different surface form. Two key insights:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hard gates first.&lt;/strong&gt; "Different region" or "value difference &amp;gt; N" are absolute disqualifiers. Run them before any expensive similarity computation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Soft scoring last.&lt;/strong&gt; Once hard gates pass, compute similarity — but cap below 0.6 as "uncertain, surface to human."&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why not ML / Vector DB / embeddings?
&lt;/h3&gt;

&lt;p&gt;Deterministic rule-based was chosen on purpose. Auditability was the requirement. When a flagged row is wrong, the operations team has to be able to trace exactly which gate fired and why. A black-box similarity score of 0.81 with no explanation cannot be reviewed, cannot be unit-tested, and cannot be defended in a compliance audit.&lt;/p&gt;

&lt;p&gt;ML is a fine choice when you have labeled training data, training infrastructure, and a continuous evaluation pipeline. None of these applied here. The operating constraint was: "anyone on the team should be able to read the code and know why it decided what it decided." That constraint forces deterministic logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Abstracted structure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain-specific term&lt;/th&gt;
&lt;th&gt;Abstract concept&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Item / SKU&lt;/td&gt;
&lt;td&gt;Entity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stakeholder (vendor / agent)&lt;/td&gt;
&lt;td&gt;Stakeholder attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price / Amount&lt;/td&gt;
&lt;td&gt;Primary numeric attribute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Address / Location&lt;/td&gt;
&lt;td&gt;Identifier (multi-attribute)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building / SKU name&lt;/td&gt;
&lt;td&gt;Auxiliary identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detail number / barcode&lt;/td&gt;
&lt;td&gt;Sub-identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format variation (kana/latin/case)&lt;/td&gt;
&lt;td&gt;Data quality issue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain judgment&lt;/td&gt;
&lt;td&gt;Tacit knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is a universal "match entities across two systems with format drift" problem. The pattern reappears in EC, healthcare, HR, accounting, manufacturing, publishing — anywhere two systems represent the same business object differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Cognitive-Science Design Principles (the Twist)
&lt;/h2&gt;

&lt;p&gt;I didn't design this thinking about cognitive science. I built it, it worked, and only afterwards in a structured Gemini conversation did the underlying principles surface. The retrofit fits unsettlingly well.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Dual process theory (Daniel Kahneman)
&lt;/h3&gt;

&lt;p&gt;The two phases map onto two thinking modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System 1 (fast) = Phases 1–5.&lt;/strong&gt; Fuzzy "is this roughly the same thing?" — similarity scores, identifier matching, attribute closeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System 2 (slow) = &lt;code&gt;determine_color()&lt;/code&gt;.&lt;/strong&gt; Strict checks for value mismatch, format inconsistency, identifier mixing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Color-coded human review gets the System 1 fuzzy pass plus the System 2 strictness annotation, which is exactly the input shape humans need to make a final call.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Gestalt psychology
&lt;/h3&gt;

&lt;p&gt;Humans recognize "wholes," not character sequences. &lt;code&gt;iPhone15&lt;/code&gt; and &lt;code&gt;iPhone 15 Pro Max&lt;/code&gt; feel like the same product family even though strict string equality fails. So:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_identifier_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Recognize chunked identity even with mixed scripts and separators.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[A-Za-z0-9\s\-_]+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;addr_a&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Matching by chunks survives whitespace, separator, and script variation.&lt;/p&gt;
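&lt;p&gt;The soft similarity layer underneath this is &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; (per the stack above). A quick illustration of how surface-form drift scores against the article's 0.6 bar; the example strings are mine, not production data:&lt;/p&gt;

```python
# difflib.SequenceMatcher.ratio() against the 0.6 soft-match threshold.
from difflib import SequenceMatcher

def similarity(a, b):
    """Fuzzy similarity in [0.0, 1.0]."""
    return SequenceMatcher(None, a, b).ratio()

pairs = [
    ("iPhone15", "iPhone 15 Pro Max"),  # same family, longer surface form
    ("iPhone15", "Galaxy S24"),         # different entity
]
for a, b in pairs:
    print(a, "|", b, "|", round(similarity(a, b), 2))
```

&lt;p&gt;The first pair clears 0.6 despite the missing space and extra suffix; the second scores near zero, so the gate drops it before any human sees it.&lt;/p&gt;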

&lt;h3&gt;
  
  
  5.3 Anchoring &amp;amp; confirmation bias defenses
&lt;/h3&gt;

&lt;p&gt;Hard gates exist to deny human-style intuitive shortcuts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Same price, must be the same item" — rejected by sub-identifier gate.&lt;/li&gt;
&lt;li&gt;"Same name, must be the same person" — rejected by region gate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The machine's job is to be coldly skeptical exactly where humans get over-confident.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 Reducing human cognitive load (Human-in-the-Loop)
&lt;/h3&gt;

&lt;p&gt;When a human is asked to confirm a flagged row, they don't get an opaque "match score 0.62". They get a one-line annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same entity matched | [Value mismatch] diff ¥2,000,000 (5.4%)
(A: ¥34,900,000 / B: ¥36,900,000) · identifier format inconsistent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The human doesn't waste cycles re-deriving why the row was flagged. Cognitive load drops sharply.&lt;/p&gt;
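&lt;p&gt;A minimal sketch of how such an annotation line could be assembled (the function and its signature are illustrative, not the production code): aggregate every discrepancy tag into one string so the reviewer never has to re-derive the flag.&lt;/p&gt;

```python
# Hypothetical sketch of the one-line reviewer annotation format shown above.
def build_annotation(value_a, value_b, tags):
    """Join a value-mismatch summary plus any extra discrepancy tags."""
    diff = abs(value_a - value_b)
    pct = diff / max(value_a, value_b) * 100
    parts = [f"[Value mismatch] diff ¥{diff:,} ({pct:.1f}%)"] if diff else []
    parts += tags
    return "Same entity matched | " + " · ".join(parts)

print(build_annotation(34_900_000, 36_900_000, ["identifier format inconsistent"]))
# Same entity matched | [Value mismatch] diff ¥2,000,000 (5.4%) · identifier format inconsistent
```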

&lt;h3&gt;
  
  
  5.5 Don't automate the ghost
&lt;/h3&gt;

&lt;p&gt;This part borrows from &lt;em&gt;Ghost in the Shell&lt;/em&gt;. Some judgments depend on tacit business knowledge that can't be reduced to rules. Don't build heuristics that pretend to encode them. Surface the row as a &lt;strong&gt;caution signal&lt;/strong&gt; and let a human apply the tacit layer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tightening the logic isn't a path to recreating the ghost.&lt;br&gt;
It's a path to revealing where the ghost is needed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Mapping summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cognitive concept&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System 1 (fast)&lt;/td&gt;
&lt;td&gt;Phases 1–5 (fuzzy matching)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System 2 (slow)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;determine_color()&lt;/code&gt; strict checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-stage / dual-pass&lt;/td&gt;
&lt;td&gt;Stage 1 + Stage 2 (Phase 5.5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gestalt grouping&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;similarity&lt;/code&gt; / &lt;code&gt;is_identifier_match&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anchoring defense&lt;/td&gt;
&lt;td&gt;Sub-identifier gate, identifier gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cognitive load reduction&lt;/td&gt;
&lt;td&gt;Aggregated &lt;code&gt;[reason] diff X&lt;/code&gt; annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-Loop&lt;/td&gt;
&lt;td&gt;Caution signals for tacit-knowledge zones&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  6. Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Recall on 8 weeks of historical data
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Errors flagged by humans (excluding outlier weeks)&lt;/td&gt;
&lt;td&gt;~130&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors caught by the tool&lt;/td&gt;
&lt;td&gt;~129&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~99.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single missed case was annotated by the human reviewer as "even a human couldn't decide here." Effectively the tool catches every case where a human commits a confident verdict.&lt;/p&gt;

&lt;p&gt;(Caveat: this is recall against 8 weeks of one team's data, not a benchmark claim. Different domains will need their own measurement.)&lt;/p&gt;
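&lt;p&gt;Taking the table's approximate counts at face value, the recall figure is the plain ratio of tool-caught errors to human-flagged errors over the replayed weeks:&lt;/p&gt;

```python
# Recall = tool-caught / human-flagged, using the table's approximate counts.
human_flagged = 130
tool_caught = 129
recall = tool_caught / human_flagged
print(f"{recall:.1%}")  # 99.2%
```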

&lt;h3&gt;
  
  
  Time and skill load
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skilled veteran throughput&lt;/td&gt;
&lt;td&gt;~3 hrs/week&lt;/td&gt;
&lt;td&gt;~30 min/week (review only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Newcomer throughput&lt;/td&gt;
&lt;td&gt;half a day to full day&lt;/td&gt;
&lt;td&gt;~30 min/week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill dependency&lt;/td&gt;
&lt;td&gt;Yes (single point of failure)&lt;/td&gt;
&lt;td&gt;No (anyone can run it)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The time number understates the value. The real shift is &lt;strong&gt;breaking the skill SPOF&lt;/strong&gt;. If the veteran is out sick, leaves, or is buried in another priority, the work continues at the same quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on false positives
&lt;/h3&gt;

&lt;p&gt;Recall is 99.2%, but the tool is intentionally tuned for higher recall over higher precision. False positives — pairs flagged for human review that turn out to be fine — are accepted as the trade-off. The ~30 min/week of human review handles them without strain.&lt;/p&gt;

&lt;p&gt;In a no-human-in-the-loop deployment this trade-off would be very different. Here, false positives are cheap (a glance from a human reviewer) and false negatives (missed reconciliation errors) are expensive (data drift propagates into business reports).&lt;/p&gt;

&lt;h2&gt;
  
  
  7. The Flowchart
&lt;/h2&gt;

&lt;p&gt;Drawing the judgment flow as diagrams surfaced things the code review didn't. Below are the four phases as separate figures, in execution order.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 Phase 1: Hard Gates (sequential disqualifiers)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dyd67f8r7e2kwvxzcf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dyd67f8r7e2kwvxzcf.png" alt=" " width="800" height="1143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Region → numeric value → auxiliary identifier → sub-identifier. Each gate is an absolute disqualifier: any "No" drops the pair. The order matters — cheapest disqualifiers run first.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.2 Phase 2: Soft Match
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlft07vl32heaowckcsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlft07vl32heaowckcsg.png" alt=" " width="542" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once a pair clears all hard gates, &lt;code&gt;compute_score&lt;/code&gt; evaluates a soft similarity. Below 0.6 → drop. At or above → lock the pair as the same entity.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.3 Phase 3: Parallel Flag Checks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozelqo0xcqb643e4405.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ozelqo0xcqb643e4405.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For confirmed matches, six independent checks fire in parallel. Each surfaces a "this matched, but here's a discrepancy" signal. Tags are aggregated; there is no early-return contamination between checks.&lt;/p&gt;
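&lt;p&gt;The "no early-return contamination" property can be sketched as follows; the two check functions here are illustrative stand-ins, not the real six:&lt;/p&gt;

```python
# Aggregation pattern: every check runs, every tag survives; no check can
# short-circuit the others.
def run_flag_checks(pair, checks):
    """Run all checks and collect every non-None tag."""
    return [tag for check in checks if (tag := check(pair)) is not None]

checks = [
    lambda p: "[Value mismatch]" if p["value_a"] != p["value_b"] else None,
    lambda p: "[Format drift]" if p["name_a"] != p["name_b"] else None,
]
pair = {"value_a": 100, "value_b": 120, "name_a": "A Co.", "name_b": "A. Company"}
print(run_flag_checks(pair, checks))  # ['[Value mismatch]', '[Format drift]']
```

&lt;p&gt;Contrast this with a chain of early returns, where the first firing check would silently mask the rest.&lt;/p&gt;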

&lt;h3&gt;
  
  
  7.4 Phase 4: Final Verdict and Drop Aggregation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6javxmbdxn53iioak5uw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6javxmbdxn53iioak5uw.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aggregate the tags into a color verdict. Drops from Phase 1 and Phase 2 converge into the "Unmatched" lane, surfaced standalone in the human-review output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Things visible only after rendering as a diagram
&lt;/h3&gt;

&lt;p&gt;These were invisible while reading code, only obvious once drawn:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1 hard gates are ordered by computational cost.&lt;/strong&gt; Region → numeric → auxiliary → sub-identifier. I placed them by intuition; the diagram showed they were already optimal — cheapest disqualifiers first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 3 parallel flag checks are genuinely independent.&lt;/strong&gt; Six checks fire in parallel with no early-return contamination. The diagram confirmed there was no silent dependency between them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All &lt;code&gt;Drop1&lt;/code&gt;–&lt;code&gt;Drop5&lt;/code&gt; paths converge to the same &lt;code&gt;Unmatched&lt;/code&gt; node.&lt;/strong&gt; I was throwing away the drop reason. Re-running "why was this pair rejected?" was impossible. Fix: log the drop reason in the row annotation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Drawing the flowchart is roughly the same act as drawing an infrastructure topology before going live. The diagram is the rubber duck.&lt;/p&gt;
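&lt;p&gt;The fix from point 3 above, keeping the drop reason instead of collapsing every rejection into a bare "Unmatched", can be sketched like this (gate names hypothetical):&lt;/p&gt;

```python
# Hypothetical sketch: return the gate that dropped the pair along with the
# score, so "why was this pair rejected?" is answerable after the run.
def score_with_reason(region_match, value_close, identifier_match):
    if not region_match:
        return 0.0, "drop: region gate"
    if not value_close:
        return 0.0, "drop: value gate"
    if not identifier_match:
        return 0.0, "drop: identifier gate"
    return 1.0, "matched"

score, reason = score_with_reason(True, False, True)
print(score, reason)  # 0.0 drop: value gate
```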

&lt;h2&gt;
  
  
  8. Wrap-up
&lt;/h2&gt;

&lt;p&gt;Three transferable lessons from this build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive load is the hidden cost&lt;/strong&gt; of "short" repetitive judgment tasks. Headcount-hour math undersells the burnout reality and skill-SPOF risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cognitive science principles fall out of good design retroactively.&lt;/strong&gt; I didn't design with them in mind; the principles became visible only through structured review (with a second AI). If your design retrofits to known principles, that's confirmation. If it doesn't, that's a smell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLMs do NOT have to touch your data.&lt;/strong&gt; Most entity resolution work doesn't need them at all. Use them for code, design review, and documentation. Keep the business records local and deterministic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation itself is internal-use only and won't be open-sourced. The patterns generalize cleanly to any two-system entity reconciliation: EC, healthcare, HR, accounting, manufacturing, publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Coming in Part 2&lt;/strong&gt;: how this whole thing got built in the first place — the AI collaboration patterns, the anti-patterns I hit, and the cross-domain disciplines that transferred from operations to software development. (Link to A2 once published.)&lt;/p&gt;

&lt;p&gt;Comments on entity resolution, cognitive load in repetitive tasks, or cross-domain engineering experiences are welcome.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
