<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Steve Harlow</title>
    <description>The latest articles on Forem by Steve Harlow (@steve_harlow_0dbc0e910b6d).</description>
    <link>https://forem.com/steve_harlow_0dbc0e910b6d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3648455%2F134510f9-fcbe-4f4a-aa40-6d640fafb234.jpg</url>
      <title>Forem: Steve Harlow</title>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/steve_harlow_0dbc0e910b6d"/>
    <language>en</language>
    <item>
      <title>Building a Transparent AI Pipeline: 59 Weeks of Automated Political Scoring with Claude API</title>
      <dc:creator>Steve Harlow</dc:creator>
      <pubDate>Thu, 19 Mar 2026 03:25:45 +0000</pubDate>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d/building-a-transparent-ai-pipeline-59-weeks-of-automated-political-scoring-with-claude-api-33of</link>
      <guid>https://forem.com/steve_harlow_0dbc0e910b6d/building-a-transparent-ai-pipeline-59-weeks-of-automated-political-scoring-with-claude-api-33of</guid>
      <description>&lt;p&gt;I've been running an automated AI pipeline for over a year that ingests news articles, clusters them into political events, and scores each event on two independent axes. Here's how it works, what I learned, and why I made everything transparent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Political events have two dimensions that are rarely measured together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;How much institutional damage&lt;/strong&gt; does this cause? (democratic health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How much media attention&lt;/strong&gt; does it get? (distraction economics)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When these are wildly mismatched — high damage, low attention — something important is being missed. I built &lt;a href="https://distractionindex.org" rel="noopener noreferrer"&gt;The Distraction Index&lt;/a&gt; to detect these gaps automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;News Sources (GDELT + GNews + Google News RSS)
    ↓ every 4 hours
Ingestion Pipeline (/api/ingest)
    ↓ dedup + store
Clustering (Claude Haiku) → group articles into events
    ↓
Dual-Axis Scoring (Claude Sonnet) → Score A + Score B
    ↓
Weekly Freeze → immutable snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tech stack:&lt;/strong&gt; Next.js 16 (App Router), Supabase (PostgreSQL), Claude API, Vercel&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Two Models?
&lt;/h3&gt;

&lt;p&gt;Cost optimization was critical. Running everything through Sonnet would cost ~$300/month. Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku&lt;/strong&gt; handles article clustering (~$0.25/1M tokens) — it groups articles by topic similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet&lt;/strong&gt; handles scoring (~$3/1M tokens) — it evaluates institutional impact using structured prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: &lt;strong&gt;~$30/month&lt;/strong&gt; for a production pipeline processing articles every 4 hours.&lt;/p&gt;
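
&lt;p&gt;The arithmetic behind that figure is simple enough to sketch. The per-million-token prices are the approximate figures above; the token volumes are hypothetical placeholders, not measured numbers:&lt;/p&gt;

```python
# Back-of-envelope cost model for the Haiku/Sonnet split. Prices are the
# approximate figures quoted above; token volumes are hypothetical.
HAIKU_PRICE_PER_M = 0.25   # ~$/1M tokens, clustering
SONNET_PRICE_PER_M = 3.00  # ~$/1M tokens, scoring

def monthly_cost(haiku_tokens_m, sonnet_tokens_m):
    """Estimate monthly spend given millions of tokens per model."""
    return haiku_tokens_m * HAIKU_PRICE_PER_M + sonnet_tokens_m * SONNET_PRICE_PER_M

estimate = monthly_cost(40, 6)  # e.g. 40M Haiku tokens, 6M Sonnet tokens
```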

&lt;h2&gt;
  
  
  The Dual Scoring System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Score A: Constitutional Damage (0-100)
&lt;/h3&gt;

&lt;p&gt;Seven weighted governance drivers, each scored 0-5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Driver&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Judicial Independence&lt;/td&gt;
&lt;td&gt;0.18&lt;/td&gt;
&lt;td&gt;Court stacking, ruling defiance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Press Freedom&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Journalist targeting, access restrictions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voting Rights&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Disenfranchisement, election interference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environmental Policy&lt;/td&gt;
&lt;td&gt;0.12&lt;/td&gt;
&lt;td&gt;Regulatory rollbacks, enforcement gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Civil Liberties&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Due process, privacy, free assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;International Norms&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;td&gt;Treaty violations, alliance damage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fiscal Governance&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Budget manipulation, oversight bypass&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The weighted sum is then multiplied by severity modifiers (durability × reversibility × precedent) and by mechanism/scope modifiers.&lt;/p&gt;
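
&lt;p&gt;A minimal sketch of how the weighted drivers and severity modifiers could combine. The function shape and modifier arithmetic here are assumptions for illustration; the exact published formula is at &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;/methodology&lt;/a&gt;:&lt;/p&gt;

```python
# Hedged sketch of Score A: weighted 0-5 driver ratings scaled to 0-100, then
# multiplied by severity modifiers. The function shape and modifier math are
# assumptions; the published formula lives at distractionindex.org/methodology.
WEIGHTS = {
    "judicial_independence": 0.18,
    "press_freedom": 0.15,
    "voting_rights": 0.15,
    "environmental_policy": 0.12,
    "civil_liberties": 0.15,
    "international_norms": 0.10,
    "fiscal_governance": 0.15,
}

def score_a(drivers, durability=1.0, reversibility=1.0, precedent=1.0):
    """drivers maps each driver name to a 0-5 rating."""
    base = sum(WEIGHTS[name] * rating for name, rating in drivers.items())
    severity = durability * reversibility * precedent
    return min(100.0, (base / 5.0) * 100.0 * severity)

uniform = score_a({name: 4 for name in WEIGHTS})  # uniform 4/5, neutral modifiers
```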

&lt;h3&gt;
  
  
  Score B: Distraction/Hype (0-100)
&lt;/h3&gt;

&lt;p&gt;Two-layer model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1 (55%):&lt;/strong&gt; Raw media hype — volume, social amplification, cross-platform spread, emotional framing, celebrity involvement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2 (45%):&lt;/strong&gt; Strategic manipulation indicators — timing relative to damage events, coordinated messaging, deflection patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Layer 2 is modulated by an intentionality score (0-15). Low intentionality → Layer 2 weight drops to 10%.&lt;/p&gt;
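
&lt;p&gt;A hedged sketch of the blend: the 55/45 split and the 10% fallback come from the description above, but the intentionality threshold used below, and the idea that Layer 1 absorbs the freed-up weight, are assumed placeholders:&lt;/p&gt;

```python
# Hedged sketch of Score B's two-layer blend. The 55/45 split and 10% fallback
# are from the article; the threshold of 8 and the assumption that Layer 1
# absorbs the remaining weight are placeholders, not published values.
def score_b(raw_hype, strategic, intentionality):
    """raw_hype, strategic: 0-100 layer scores; intentionality: 0-15."""
    layer2_weight = 0.45 if intentionality >= 8 else 0.10
    return (1.0 - layer2_weight) * raw_hype + layer2_weight * strategic
```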

&lt;h3&gt;
  
  
  Classification
&lt;/h3&gt;

&lt;p&gt;Events are classified by dominance margin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Damage&lt;/strong&gt; (List A): Score A exceeds Score B by ≥10 points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distraction&lt;/strong&gt; (List B): Score B exceeds Score A by ≥10 points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noise&lt;/strong&gt; (List C): Neither dominates&lt;/li&gt;
&lt;/ul&gt;
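
&lt;p&gt;The classification rule above is only a few lines of code:&lt;/p&gt;

```python
# Classification by dominance margin, as described above.
def classify(score_a, score_b, margin=10):
    if score_a - score_b >= margin:
        return "damage"       # List A
    if score_b - score_a >= margin:
        return "distraction"  # List B
    return "noise"            # List C

classify(72, 35)  # damage: A exceeds B by 37
classify(40, 85)  # distraction: B exceeds A by 45
classify(50, 55)  # noise: neither dominates
```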

&lt;h2&gt;
  
  
  The Smokescreen Index
&lt;/h2&gt;

&lt;p&gt;The most interesting feature: automatic pairing of high-distraction events with concurrent high-damage events.&lt;/p&gt;

&lt;p&gt;When a B-dominant event (media spectacle) co-occurs with an A-dominant event (institutional harm) that received less coverage, the system flags it as a potential smokescreen.&lt;/p&gt;
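
&lt;p&gt;A minimal sketch of that pairing logic. The published criteria are more involved; comparing raw coverage counts, as below, is an assumption for illustration:&lt;/p&gt;

```python
# Hedged sketch of smokescreen pairing: match each B-dominant spectacle with
# concurrent A-dominant harms that drew less coverage. The real pairing
# criteria are in the published methodology; this comparison is an assumption.
def smokescreen_pairs(events):
    """events: dicts with 'id', 'list' ('A', 'B', or 'C'), and 'coverage'."""
    spectacles = [e for e in events if e["list"] == "B"]
    harms = [e for e in events if e["list"] == "A"]
    return [
        (b["id"], a["id"])
        for b in spectacles
        for a in harms
        if b["coverage"] > a["coverage"]  # the harm got less attention
    ]
```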

&lt;p&gt;&lt;strong&gt;210+ pairs identified&lt;/strong&gt; across 59 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Radical Transparency
&lt;/h2&gt;

&lt;p&gt;Every scoring formula, weight, and AI prompt is published at &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;/methodology&lt;/a&gt;. This was a deliberate design choice — if you're scoring political events, your methodology must be auditable.&lt;/p&gt;

&lt;p&gt;Key transparency features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Immutable weekly snapshots&lt;/strong&gt; — once a weekly snapshot freezes, scores cannot be silently changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only corrections&lt;/strong&gt; — post-freeze corrections are timestamped and linked to the original&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Published prompts&lt;/strong&gt; — the exact Claude prompts used for scoring are documented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source&lt;/strong&gt; — &lt;a href="https://github.com/sgharlow/distraction" rel="noopener noreferrer"&gt;full codebase on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Publishing your prompts is terrifying
&lt;/h3&gt;

&lt;p&gt;When your prompt templates are public, anyone can argue with your framing. That's the point — but it requires thick skin and a willingness to iterate.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Immutability prevents model drift
&lt;/h3&gt;

&lt;p&gt;Without frozen snapshots, you can't tell if score changes come from real-world changes or model updates. Immutability is essential for longitudinal analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The two-axis approach reveals patterns
&lt;/h3&gt;

&lt;p&gt;Single-dimension scoring (left/right, reliable/unreliable) misses the key insight: damage and distraction are independent variables. Some events are both. Some are neither.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cost optimization matters for indie projects
&lt;/h3&gt;

&lt;p&gt;The Haiku-for-clustering, Sonnet-for-scoring split keeps costs at ~$30/month. Without this, the project wouldn't be sustainable as a solo effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;After 59 weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1,500+&lt;/strong&gt; scored events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;11,800+&lt;/strong&gt; ingested articles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;210+&lt;/strong&gt; smokescreen pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;288&lt;/strong&gt; tests passing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,071&lt;/strong&gt; pages indexed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live site:&lt;/strong&gt; &lt;a href="https://distractionindex.org" rel="noopener noreferrer"&gt;distractionindex.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Methodology:&lt;/strong&gt; &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;distractionindex.org/methodology&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source code:&lt;/strong&gt; &lt;a href="https://github.com/sgharlow/distraction" rel="noopener noreferrer"&gt;github.com/sgharlow/distraction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love feedback on the scoring methodology. What would you weight differently? What blind spots do you see?&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>civictech</category>
      <category>ai</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>Building AccessBrowse: Voice-Driven Web Browsing with Gemini AI</title>
      <dc:creator>Steve Harlow</dc:creator>
      <pubDate>Thu, 12 Mar 2026 05:13:20 +0000</pubDate>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d/building-accessbrowse-voice-driven-web-browsing-with-gemini-ai-5cj4</link>
      <guid>https://forem.com/steve_harlow_0dbc0e910b6d/building-accessbrowse-voice-driven-web-browsing-with-gemini-ai-5cj4</guid>
      <description>&lt;h1&gt;
  
  
  Building AccessBrowse: Voice-Driven Web Browsing with Gemini AI
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This blog post was created for the Gemini Live Agent Challenge hackathon.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#GeminiLiveAgentChallenge&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: The Web Is Built for Eyes
&lt;/h2&gt;

&lt;p&gt;Imagine trying to find an apartment on Zillow without seeing the screen. A sighted user glances at a map, scans listings, clicks filters — all in seconds. A visually impaired user faces a different reality: tab through dozens of invisible elements, hope the screen reader can parse Zillow's React-rendered layout, and guess which button says "Apply Filters" versus "Clear All." Over 2.2 billion people worldwide live with visual impairment, and for many of them, this is what browsing the web feels like every day.&lt;/p&gt;

&lt;p&gt;Despite decades of work on web accessibility standards, screen readers still fundamentally struggle with the modern web. Dynamic JavaScript layouts, single-page applications that never trigger a page reload, and interactive elements that rely on visual context rather than semantic markup create constant barriers. The root issue is architectural: most assistive tools try to understand websites through their source code — parsing the DOM, following ARIA roles, reading element text. This works for well-structured, semantic HTML. It fails on the messy, JavaScript-heavy reality of how websites are actually built.&lt;/p&gt;

&lt;p&gt;We wanted to try a different approach: what if we looked at websites the same way a sighted person does — visually — and combined that with natural voice conversation?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Approach: Coordinate-Based Browsing
&lt;/h2&gt;

&lt;p&gt;The core insight behind AccessBrowse is simple: instead of parsing the DOM, take a screenshot and ask an AI model what it sees. This is exactly what Gemini Computer Use is designed for.&lt;/p&gt;

&lt;p&gt;Here is the action loop at the heart of AccessBrowse:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user speaks a request ("Find me apartments in Seattle under $1000 on Zillow")&lt;/li&gt;
&lt;li&gt;Gemini Live API receives the voice input and decides to call the &lt;code&gt;browse_web&lt;/code&gt; tool&lt;/li&gt;
&lt;li&gt;The backend requests a screenshot from the Chrome extension&lt;/li&gt;
&lt;li&gt;The screenshot is sent to Gemini Computer Use (&lt;code&gt;gemini-2.5-computer-use&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The model analyzes the visual content and returns the next action with precise coordinates&lt;/li&gt;
&lt;li&gt;The content script translates coordinates to viewport pixels and executes the action&lt;/li&gt;
&lt;li&gt;Steps 3-6 repeat until the task is complete&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The coordinates come back on a &lt;strong&gt;normalized 1000×1000 grid&lt;/strong&gt; — (0, 0) is the top-left corner, (1000, 1000) is the bottom-right. This abstraction is powerful because it works regardless of the actual viewport size. Whether the browser window is 1280 pixels wide or 1920 pixels wide, coordinate (500, 300) always refers to the same relative position on the page.&lt;/p&gt;

&lt;p&gt;On the extension side, translating these coordinates to actual DOM interactions is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;coordinate&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerWidth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;coordinate&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHeight&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;elementFromPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;code&gt;document.elementFromPoint()&lt;/code&gt; approach eliminates the fragility of CSS selector matching. The model does not need to guess at class names or XPath expressions. It looks at the page, identifies the button or form field visually, and returns where to click. In practice, this works on sites ranging from Zillow's complex map-and-list layout to Amazon's product grid to CNN's news feed.&lt;/p&gt;
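
&lt;p&gt;The backend side of the screenshot-analyze-act loop can be sketched as follows; the helper names are placeholders standing in for the extension bridge and the Gemini Computer Use call, not the project's actual API:&lt;/p&gt;

```python
# Hedged sketch of the screenshot-analyze-act loop. take_screenshot,
# next_action, and execute are placeholders for the extension bridge, the
# Gemini Computer Use call, and the content-script action, respectively.
def browse(goal, take_screenshot, next_action, execute, max_steps=10):
    for _ in range(max_steps):
        shot = take_screenshot()          # capture the current page
        action = next_action(goal, shot)  # model returns the next action
        if action["type"] == "done":
            return action.get("result")
        execute(action)                   # click/type at model coordinates
    raise TimeoutError("task did not complete within max_steps")
```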

&lt;h2&gt;
  
  
  The Voice Pipeline: Gemini Live API
&lt;/h2&gt;

&lt;p&gt;AccessBrowse uses Gemini Live API (&lt;code&gt;gemini-2.5-flash-native-audio&lt;/code&gt;) for real-time bidirectional voice streaming. The connection is established using the Google GenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LOCATION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LiveConnectConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TEXT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;speech_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SpeechConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;prebuilt_voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;voice_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Aoede&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool_obj&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LIVE_API_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;live&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__aenter__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The session is a full bidi-stream: audio frames flow in (16kHz PCM from the microphone), and audio responses flow out (24kHz PCM from Gemini). The 24kHz output was a deliberate choice — for a product where the user experience is entirely audio-driven, voice quality matters enormously. The difference between 16kHz and 24kHz is clearly audible in consonant clarity and natural intonation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool Calling Within Live Sessions
&lt;/h3&gt;

&lt;p&gt;One of the most interesting engineering challenges was tool calling within the streaming session. When Gemini decides the user wants to browse a website, it emits a &lt;code&gt;tool_call&lt;/code&gt; in the response stream. The backend must:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive the tool call (with function name and arguments)&lt;/li&gt;
&lt;li&gt;Execute the tool (which may involve 10+ steps of screenshot-analyze-act)&lt;/li&gt;
&lt;li&gt;Return the tool result as a &lt;code&gt;LiveClientToolResponse&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Resume processing the audio stream as Gemini speaks the result&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This happens within a single async event loop. The &lt;code&gt;browse_web&lt;/code&gt; tool execution can take 30+ seconds (multiple screenshots, model calls, and DOM actions), and during this time the Live API connection must stay alive. We solve this with a keepalive task that sends 100ms of silence at 200ms intervals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_audio_keepalive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_live&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LiveClientRealtimeInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;media_chunks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;silence_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/pcm;rate=16000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;)]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Audio Pipeline: Web Audio API in a Chrome Extension
&lt;/h2&gt;

&lt;p&gt;Chrome MV3 extensions have a constraint that made the audio pipeline interesting: service workers cannot access the DOM, which means no &lt;code&gt;AudioContext&lt;/code&gt; or &lt;code&gt;getUserMedia()&lt;/code&gt; in the service worker. The solution is an &lt;strong&gt;offscreen document&lt;/strong&gt; — a hidden page created by the extension specifically for audio processing.&lt;/p&gt;

&lt;p&gt;The offscreen document handles both microphone capture and audio playback:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capture:&lt;/strong&gt; &lt;code&gt;getUserMedia()&lt;/code&gt; with 16kHz sample rate, mono, echo cancellation enabled. Audio frames are captured via a &lt;code&gt;ScriptProcessorNode&lt;/code&gt;, converted from Float32 to Int16 PCM, base64-encoded, and sent to the service worker via &lt;code&gt;chrome.runtime.sendMessage()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playback:&lt;/strong&gt; Incoming 24kHz PCM audio from Gemini is decoded from base64, converted to Float32, loaded into an &lt;code&gt;AudioBuffer&lt;/code&gt; at 24000Hz sample rate, and played through a queued source node system for smooth sequential playback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key detail is the &lt;strong&gt;sample rate asymmetry&lt;/strong&gt;: input is 16kHz (what Gemini Live API expects for speech input) while output is 24kHz (what Gemini outputs for higher quality). The offscreen document uses two separate &lt;code&gt;AudioContext&lt;/code&gt; instances at different sample rates to handle this cleanly.&lt;/p&gt;
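
&lt;p&gt;The Float32-to-Int16 conversion is worth seeing concretely. The extension does this in JavaScript; this Python version is illustrative only:&lt;/p&gt;

```python
# Hedged illustration of the Float32-to-Int16 PCM step before base64 encoding.
# The extension performs this in JavaScript; this version just shows the math.
import array
import base64

def float32_to_pcm16_b64(samples):
    """samples: floats in [-1.0, 1.0], one per audio frame."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return base64.b64encode(array.array("h", ints).tobytes()).decode("ascii")
```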

&lt;h2&gt;
  
  
  Deployment: Cloud Run
&lt;/h2&gt;

&lt;p&gt;The FastAPI backend is deployed to Google Cloud Run using a single deploy script (&lt;code&gt;deploy.sh&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy accessbrowse &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt; ./backend &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt; &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--memory&lt;/span&gt; 1Gi &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--timeout&lt;/span&gt; 300
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Run is well-suited for this workload because the backend is stateful (WebSocket sessions) but not long-lived — sessions typically last 5-10 minutes. The 300-second timeout accommodates long browsing sessions, and 1Gi of memory is sufficient for the async Python server handling up to 3 concurrent sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vision-based interaction is more robust than DOM parsing.&lt;/strong&gt; Gemini Computer Use consistently identifies form fields, buttons, and links from screenshots — even on pages with complex layouts, overlapping elements, or minimal semantic markup. The coordinate-based approach eliminates an entire category of bugs related to CSS selector matching. We tested on Zillow (complex map + list layout), Amazon (product grids with dynamic loading), and CNN (news feeds with overlapping images and text). In all cases, the model correctly identified interactive elements from visual inspection alone. Traditional selector-based automation would have required custom adapters for each site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool calling within a live bidi-stream requires careful async orchestration.&lt;/strong&gt; The GenAI SDK's &lt;code&gt;client.aio.live.connect()&lt;/code&gt; context manager is elegant for basic voice conversations, but executing tools that take 30+ seconds (like a multi-step browse action) within the streaming session required solving several subtle problems. The connection would time out without keepalive audio. The event loop had to simultaneously handle incoming audio, outgoing tool results, and background screenshot processing. We ended up building a pattern where &lt;code&gt;asyncio.Event&lt;/code&gt; objects coordinate state between the WebSocket handler, the Live API session, and the tool executor — three concurrent async loops that must stay synchronized. Documentation around this flow was minimal, so we relied heavily on experimentation and reading the SDK source.&lt;/p&gt;
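
&lt;p&gt;The keepalive-around-a-slow-tool pattern can be sketched as follows; the names are illustrative, and the actual project wires this into the WebSocket handler:&lt;/p&gt;

```python
# Hedged sketch of coordinating a slow tool call with an asyncio.Event, as
# described above. Names are illustrative, not the project's actual code.
import asyncio

async def run_tool_with_keepalive(tool, send_silence, interval=0.2):
    done = asyncio.Event()

    async def keepalive():
        while not done.is_set():
            await send_silence()          # keep the Live API stream alive
            await asyncio.sleep(interval)

    task = asyncio.create_task(keepalive())
    try:
        return await tool()               # slow multi-step browse action
    finally:
        done.set()
        await task
```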

&lt;p&gt;&lt;strong&gt;Audio quality is a feature, not a nice-to-have.&lt;/strong&gt; For users who rely on voice as their primary interface, the difference between 16kHz and 24kHz output is immediately noticeable. Investing in the higher sample rate — and building the dual-&lt;code&gt;AudioContext&lt;/code&gt; pipeline in the offscreen document to handle the sample rate asymmetry — was one of the best decisions we made. The Aoede voice at 24kHz delivers natural intonation and clear consonants that make long browsing sessions comfortable rather than fatiguing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chrome MV3 constraints push you toward better architecture.&lt;/strong&gt; The service worker's lack of DOM access forced us to use an offscreen document for audio, which actually created a cleaner separation of concerns. Message passing between extension components is strictly JSON-serializable, which meant base64 encoding for audio and image data — not elegant, but it forced us to think carefully about data flow boundaries. The result is a four-module architecture (service worker, content script, offscreen document, sidepanel) where each component has a single responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Watch the demo:&lt;/strong&gt; &lt;a href="https://youtu.be/1BBzOFUTdKw" rel="noopener noreferrer"&gt;https://youtu.be/1BBzOFUTdKw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AccessBrowse is open source: &lt;a href="https://github.com/sgharlow/accessbrowse" rel="noopener noreferrer"&gt;github.com/sgharlow/accessbrowse&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The README includes step-by-step setup instructions. You need a Google Cloud account with Vertex AI enabled, Python 3.12+, Node.js 20+, and Google Chrome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Google Gemini Live API, Gemini Computer Use, Cloud Run, and Vertex AI for the &lt;a href="https://googleliveagentchallenge.devpost.com/" rel="noopener noreferrer"&gt;Gemini Live Agent Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;#GeminiLiveAgentChallenge&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>geminiliveagentchallenge</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building a Transparent AI Pipeline: 59 Weeks of Automated Political Scoring with Claude API</title>
      <dc:creator>Steve Harlow</dc:creator>
      <pubDate>Sun, 22 Feb 2026 22:26:36 +0000</pubDate>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d/building-a-transparent-ai-pipeline-59-weeks-of-automated-political-scoring-with-claude-api-4100</link>
      <guid>https://forem.com/steve_harlow_0dbc0e910b6d/building-a-transparent-ai-pipeline-59-weeks-of-automated-political-scoring-with-claude-api-4100</guid>
      <description>&lt;p&gt;I've been running an automated AI pipeline for over a year that ingests news articles, clusters them into political events, and scores each event on two independent axes. Here's how it works, what I learned, and why I made everything transparent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Political events have two dimensions that are rarely measured together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;How much institutional damage&lt;/strong&gt; does this cause? (democratic health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How much media attention&lt;/strong&gt; does it get? (distraction economics)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When these are wildly mismatched — high damage, low attention — something important is being missed. I built &lt;a href="https://distractionindex.org" rel="noopener noreferrer"&gt;The Distraction Index&lt;/a&gt; to detect these gaps automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;News Sources (GDELT + GNews + Google News RSS)
    ↓ every 4 hours
Ingestion Pipeline (/api/ingest)
    ↓ dedup + store
Clustering (Claude Haiku) → group articles into events
    ↓
Dual-Axis Scoring (Claude Sonnet) → Score A + Score B
    ↓
Weekly Freeze → immutable snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tech stack:&lt;/strong&gt; Next.js 16 (App Router), Supabase (PostgreSQL), Claude API, Vercel&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Two Models?
&lt;/h3&gt;

&lt;p&gt;Cost optimization was critical. Running everything through Sonnet would cost ~$300/month. Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku&lt;/strong&gt; handles article clustering (~$0.25/1M tokens) — it groups articles by topic similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet&lt;/strong&gt; handles scoring (~$3/1M tokens) — it evaluates institutional impact using structured prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: &lt;strong&gt;~$30/month&lt;/strong&gt; for a production pipeline processing articles every 4 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dual Scoring System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Score A: Constitutional Damage (0-100)
&lt;/h3&gt;

&lt;p&gt;Seven weighted governance drivers, each scored 0-5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Driver&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Judicial Independence&lt;/td&gt;
&lt;td&gt;0.18&lt;/td&gt;
&lt;td&gt;Court stacking, ruling defiance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Press Freedom&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Journalist targeting, access restrictions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voting Rights&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Disenfranchisement, election interference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environmental Policy&lt;/td&gt;
&lt;td&gt;0.12&lt;/td&gt;
&lt;td&gt;Regulatory rollbacks, enforcement gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Civil Liberties&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Due process, privacy, free assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;International Norms&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;td&gt;Treaty violations, alliance damage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fiscal Governance&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Budget manipulation, oversight bypass&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The weighted driver sum is then multiplied by severity modifiers (durability × reversibility × precedent) and by mechanism/scope modifiers.&lt;/p&gt;
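&lt;p&gt;In code, the driver roll-up looks roughly like this. The weights come from the table above; collapsing the severity and mechanism/scope modifiers into single multipliers is a simplification for illustration, not the site's exact formula:&lt;/p&gt;

```python
# Sketch of Score A, assuming the weighted 0-5 driver sum is rescaled to
# 0-100 and then multiplied by the modifiers. Modifier handling here is
# illustrative; the published methodology has the exact constants.

DRIVER_WEIGHTS = {
    "judicial_independence": 0.18,
    "press_freedom": 0.15,
    "voting_rights": 0.15,
    "environmental_policy": 0.12,
    "civil_liberties": 0.15,
    "international_norms": 0.10,
    "fiscal_governance": 0.15,
}

def score_a(driver_scores: dict, severity: float = 1.0, scope: float = 1.0) -> float:
    """driver_scores maps driver name to a 0-5 rating.

    severity stands in for durability * reversibility * precedent;
    scope stands in for the mechanism/scope modifiers.
    """
    base = sum(w * driver_scores.get(d, 0) for d, w in DRIVER_WEIGHTS.items())
    return min(100.0, (base / 5.0) * 100 * severity * scope)
```

&lt;p&gt;For example, an event rated 5 on judicial independence alone contributes 18 points before modifiers.&lt;/p&gt;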

&lt;h3&gt;
  
  
  Score B: Distraction/Hype (0-100)
&lt;/h3&gt;

&lt;p&gt;Two-layer model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1 (55%):&lt;/strong&gt; Raw media hype — volume, social amplification, cross-platform spread, emotional framing, celebrity involvement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2 (45%):&lt;/strong&gt; Strategic manipulation indicators — timing relative to damage events, coordinated messaging, deflection patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Layer 2 is modulated by an intentionality score (0-15). Low intentionality → Layer 2 weight drops to 10%.&lt;/p&gt;
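&lt;p&gt;The blend can be sketched as a weighted average whose Layer 2 weight collapses when intentionality is low. The 55/45 split, the 0-15 intentionality scale, and the 10% fallback are from the description above; the exact low-intentionality cutoff (here, below 5) is an assumption:&lt;/p&gt;

```python
# Sketch of the two-layer Score B blend. The threshold below which
# intentionality counts as "low" is an assumed value.

def score_b(hype: float, manipulation: float, intentionality: int) -> float:
    """hype and manipulation are 0-100 sub-scores; intentionality is 0-15."""
    layer2 = 0.45 if intentionality >= 5 else 0.10  # low intent: drop to 10%
    return (1.0 - layer2) * hype + layer2 * manipulation
```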

&lt;h3&gt;
  
  
  Classification
&lt;/h3&gt;

&lt;p&gt;Events are classified by dominance margin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Damage&lt;/strong&gt; (List A): Score A exceeds Score B by ≥10 points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distraction&lt;/strong&gt; (List B): Score B exceeds Score A by ≥10 points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noise&lt;/strong&gt; (List C): Neither dominates&lt;/li&gt;
&lt;/ul&gt;
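&lt;p&gt;The classification rule is a few lines of code (a minimal sketch; the label strings are illustrative):&lt;/p&gt;

```python
# Dominance-margin classification: a 10-point margin decides the list.

def classify(score_a: float, score_b: float, margin: float = 10.0) -> str:
    if score_a - score_b >= margin:
        return "damage"       # List A
    if score_b - score_a >= margin:
        return "distraction"  # List B
    return "noise"            # List C
```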

&lt;h2&gt;
  
  
  The Smokescreen Index
&lt;/h2&gt;

&lt;p&gt;The most interesting feature: automatic pairing of high-distraction events with concurrent high-damage events.&lt;/p&gt;

&lt;p&gt;When a B-dominant event (media spectacle) co-occurs with an A-dominant event (institutional harm) that received less coverage, the system flags it as a potential smokescreen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;210+ pairs identified&lt;/strong&gt; across 59 weeks.&lt;/p&gt;
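&lt;p&gt;A naive version of the pairing pass, assuming each event carries a week, a classification, and a coverage measure (the field names here are hypothetical):&lt;/p&gt;

```python
# Hypothetical smokescreen pairing: for each distraction-dominant event,
# find damage-dominant events in the same week that drew less coverage.

def smokescreen_pairs(events: list) -> list:
    damages = [e for e in events if e["class"] == "damage"]
    pairs = []
    for b in (e for e in events if e["class"] == "distraction"):
        for a in damages:
            if a["week"] == b["week"] and b["coverage"] > a["coverage"]:
                pairs.append((b["id"], a["id"]))
    return pairs
```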

&lt;h2&gt;
  
  
  Radical Transparency
&lt;/h2&gt;

&lt;p&gt;Every scoring formula, weight, and AI prompt is published at &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;/methodology&lt;/a&gt;. This was a deliberate design choice — if you're scoring political events, your methodology must be auditable.&lt;/p&gt;

&lt;p&gt;Key transparency features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Immutable weekly snapshots&lt;/strong&gt; — once a week's snapshot is frozen, its scores cannot be silently changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only corrections&lt;/strong&gt; — post-freeze corrections are timestamped and linked to the original&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Published prompts&lt;/strong&gt; — the exact Claude prompts used for scoring are documented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source&lt;/strong&gt; — &lt;a href="https://github.com/sgharlow/distraction" rel="noopener noreferrer"&gt;full codebase on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Publishing your prompts is terrifying
&lt;/h3&gt;

&lt;p&gt;When your prompt templates are public, anyone can argue with your framing. That's the point — but it requires thick skin and a willingness to iterate.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Immutability prevents model drift
&lt;/h3&gt;

&lt;p&gt;Without frozen snapshots, you can't tell if score changes come from real-world changes or model updates. Immutability is essential for longitudinal analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The two-axis approach reveals patterns
&lt;/h3&gt;

&lt;p&gt;Single-dimension scoring (left/right, reliable/unreliable) misses the key insight: damage and distraction are independent variables. Some events are both. Some are neither.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cost optimization matters for indie projects
&lt;/h3&gt;

&lt;p&gt;The Haiku-for-clustering, Sonnet-for-scoring split keeps costs at ~$30/month. Without this, the project wouldn't be sustainable as a solo effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;After 59 weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1,500+&lt;/strong&gt; scored events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;11,800+&lt;/strong&gt; ingested articles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;210+&lt;/strong&gt; smokescreen pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;288&lt;/strong&gt; tests passing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,071&lt;/strong&gt; pages indexed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live site:&lt;/strong&gt; &lt;a href="https://distractionindex.org" rel="noopener noreferrer"&gt;distractionindex.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Methodology:&lt;/strong&gt; &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;distractionindex.org/methodology&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source code:&lt;/strong&gt; &lt;a href="https://github.com/sgharlow/distraction" rel="noopener noreferrer"&gt;github.com/sgharlow/distraction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love feedback on the scoring methodology. What would you weight differently? What blind spots do you see?&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>nextjs</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building a Transparent AI Pipeline: 59 Weeks of Automated Political Scoring with Claude API</title>
      <dc:creator>Steve Harlow</dc:creator>
      <pubDate>Sun, 22 Feb 2026 07:25:44 +0000</pubDate>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d/building-a-transparent-ai-pipeline-59-weeks-of-automated-political-scoring-with-claude-api-4a38</link>
      <guid>https://forem.com/steve_harlow_0dbc0e910b6d/building-a-transparent-ai-pipeline-59-weeks-of-automated-political-scoring-with-claude-api-4a38</guid>
      <description>&lt;p&gt;I've been running an automated AI pipeline for over a year that ingests news articles, clusters them into political events, and scores each event on two independent axes. Here's how it works, what I learned, and why I made everything transparent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Political events have two dimensions that are rarely measured together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;How much institutional damage&lt;/strong&gt; does this cause? (democratic health)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How much media attention&lt;/strong&gt; does it get? (distraction economics)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When these are wildly mismatched -- high damage, low attention -- something important is being missed. I built &lt;a href="https://distractionindex.org" rel="noopener noreferrer"&gt;The Distraction Index&lt;/a&gt; to detect these gaps automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The pipeline runs every 4 hours:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion&lt;/strong&gt; - Fetch articles from 3 news sources (GDELT, GNews, Google News RSS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedup + Store&lt;/strong&gt; - Remove duplicates, persist to Supabase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clustering&lt;/strong&gt; - Group articles into events using Claude Haiku&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual-Axis Scoring&lt;/strong&gt; - Score each event on two axes using Claude Sonnet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly Freeze&lt;/strong&gt; - Create immutable snapshot every Sunday&lt;/li&gt;
&lt;/ol&gt;
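&lt;p&gt;The dedup step can be as simple as normalizing each article URL and keeping the first occurrence. The normalization rules below (case-fold the host; drop scheme, query, fragment, and trailing slash) are an assumption, not the project's actual logic:&lt;/p&gt;

```python
# Assumed URL-normalization dedup: articles pointing at the same page
# under different schemes or tracking parameters collapse to one entry.
from urllib.parse import urlsplit

def normalize(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"

def dedup(articles: list) -> list:
    seen, unique = set(), []
    for article in articles:
        key = normalize(article["url"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```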

&lt;p&gt;&lt;strong&gt;Tech stack:&lt;/strong&gt; Next.js 16 (App Router), Supabase (PostgreSQL), Claude API, Vercel&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Two Models?
&lt;/h3&gt;

&lt;p&gt;Cost optimization was critical. Running everything through Sonnet would cost ~$300/month. Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku&lt;/strong&gt; handles article clustering (~$0.25/1M tokens) -- it groups articles by topic similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet&lt;/strong&gt; handles scoring (~$3/1M tokens) -- it evaluates institutional impact using structured prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: &lt;strong&gt;~$30/month&lt;/strong&gt; for a production pipeline processing articles every 4 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dual Scoring System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Score A: Constitutional Damage (0-100)
&lt;/h3&gt;

&lt;p&gt;Seven weighted governance drivers, each scored 0-5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Driver&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Judicial Independence&lt;/td&gt;
&lt;td&gt;0.18&lt;/td&gt;
&lt;td&gt;Court stacking, ruling defiance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Press Freedom&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Journalist targeting, access restrictions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voting Rights&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Disenfranchisement, election interference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Environmental Policy&lt;/td&gt;
&lt;td&gt;0.12&lt;/td&gt;
&lt;td&gt;Regulatory rollbacks, enforcement gaps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Civil Liberties&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Due process, privacy, free assembly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;International Norms&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;td&gt;Treaty violations, alliance damage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fiscal Governance&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;Budget manipulation, oversight bypass&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The weighted driver sum is then multiplied by severity modifiers (durability x reversibility x precedent) and by mechanism/scope modifiers.&lt;/p&gt;
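&lt;p&gt;In code, the driver roll-up looks roughly like this. The weights come from the table above; collapsing the severity and mechanism/scope modifiers into single multipliers is a simplification for illustration, not the site's exact formula:&lt;/p&gt;

```python
# Sketch of Score A, assuming the weighted 0-5 driver sum is rescaled to
# 0-100 and then multiplied by the modifiers. Modifier handling here is
# illustrative; the published methodology has the exact constants.

DRIVER_WEIGHTS = {
    "judicial_independence": 0.18,
    "press_freedom": 0.15,
    "voting_rights": 0.15,
    "environmental_policy": 0.12,
    "civil_liberties": 0.15,
    "international_norms": 0.10,
    "fiscal_governance": 0.15,
}

def score_a(driver_scores: dict, severity: float = 1.0, scope: float = 1.0) -> float:
    """driver_scores maps driver name to a 0-5 rating.

    severity stands in for durability * reversibility * precedent;
    scope stands in for the mechanism/scope modifiers.
    """
    base = sum(w * driver_scores.get(d, 0) for d, w in DRIVER_WEIGHTS.items())
    return min(100.0, (base / 5.0) * 100 * severity * scope)
```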

&lt;h3&gt;
  
  
  Score B: Distraction/Hype (0-100)
&lt;/h3&gt;

&lt;p&gt;Two-layer model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1 (55%):&lt;/strong&gt; Raw media hype -- volume, social amplification, cross-platform spread, emotional framing, celebrity involvement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2 (45%):&lt;/strong&gt; Strategic manipulation indicators -- timing relative to damage events, coordinated messaging, deflection patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Layer 2 is modulated by an intentionality score (0-15). Low intentionality means Layer 2 weight drops to 10%.&lt;/p&gt;

&lt;h3&gt;
  
  
  Classification
&lt;/h3&gt;

&lt;p&gt;Events are classified by dominance margin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Damage&lt;/strong&gt; (List A): Score A exceeds Score B by 10+ points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distraction&lt;/strong&gt; (List B): Score B exceeds Score A by 10+ points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noise&lt;/strong&gt; (List C): Neither dominates&lt;/li&gt;
&lt;/ul&gt;
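&lt;p&gt;The classification rule is a few lines of code (a minimal sketch; the label strings are illustrative):&lt;/p&gt;

```python
# Dominance-margin classification: a 10-point margin decides the list.

def classify(score_a: float, score_b: float, margin: float = 10.0) -> str:
    if score_a - score_b >= margin:
        return "damage"       # List A
    if score_b - score_a >= margin:
        return "distraction"  # List B
    return "noise"            # List C
```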

&lt;h2&gt;
  
  
  The Smokescreen Index
&lt;/h2&gt;

&lt;p&gt;The most interesting feature: automatic pairing of high-distraction events with concurrent high-damage events.&lt;/p&gt;

&lt;p&gt;When a B-dominant event (media spectacle) co-occurs with an A-dominant event (institutional harm) that received less coverage, the system flags it as a potential smokescreen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;210+ pairs identified&lt;/strong&gt; across 59 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Radical Transparency
&lt;/h2&gt;

&lt;p&gt;Every scoring formula, weight, and AI prompt is published at &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;/methodology&lt;/a&gt;. This was a deliberate design choice -- if you're scoring political events, your methodology must be auditable.&lt;/p&gt;

&lt;p&gt;Key transparency features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Immutable weekly snapshots&lt;/strong&gt; -- once a week's snapshot is frozen, its scores cannot be silently changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only corrections&lt;/strong&gt; -- post-freeze corrections are timestamped and linked to the original&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Published prompts&lt;/strong&gt; -- the exact Claude prompts used for scoring are documented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source&lt;/strong&gt; -- &lt;a href="https://github.com/sgharlow/distraction" rel="noopener noreferrer"&gt;full codebase on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Publishing your prompts is terrifying
&lt;/h3&gt;

&lt;p&gt;When your prompt templates are public, anyone can argue with your framing. That's the point -- but it requires thick skin and a willingness to iterate.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Immutability prevents model drift
&lt;/h3&gt;

&lt;p&gt;Without frozen snapshots, you can't tell if score changes come from real-world changes or model updates. Immutability is essential for longitudinal analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The two-axis approach reveals patterns
&lt;/h3&gt;

&lt;p&gt;Single-dimension scoring (left/right, reliable/unreliable) misses the key insight: damage and distraction are independent variables. Some events are both. Some are neither.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cost optimization matters for indie projects
&lt;/h3&gt;

&lt;p&gt;The Haiku-for-clustering, Sonnet-for-scoring split keeps costs at ~$30/month. Without this, the project wouldn't be sustainable as a solo effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;After 59 weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1,500+&lt;/strong&gt; scored events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;11,800+&lt;/strong&gt; ingested articles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;210+&lt;/strong&gt; smokescreen pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;288&lt;/strong&gt; tests passing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,071&lt;/strong&gt; pages indexed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live site:&lt;/strong&gt; &lt;a href="https://distractionindex.org" rel="noopener noreferrer"&gt;distractionindex.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Methodology:&lt;/strong&gt; &lt;a href="https://distractionindex.org/methodology" rel="noopener noreferrer"&gt;distractionindex.org/methodology&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source code:&lt;/strong&gt; &lt;a href="https://github.com/sgharlow/distraction" rel="noopener noreferrer"&gt;github.com/sgharlow/distraction&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love feedback on the scoring methodology. What would you weight differently? What blind spots do you see?&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>nextjs</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Why Most AI Coding Sessions Fail (And How to Fix It)</title>
      <dc:creator>Steve Harlow</dc:creator>
      <pubDate>Thu, 08 Jan 2026 01:22:10 +0000</pubDate>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d/why-most-ai-coding-sessions-fail-and-how-to-fix-it-4dn1</link>
      <guid>https://forem.com/steve_harlow_0dbc0e910b6d/why-most-ai-coding-sessions-fail-and-how-to-fix-it-4dn1</guid>
      <description>&lt;h3&gt;
  
  
  The data behind AI-generated code quality—and a framework that enforces discipline
&lt;/h3&gt;




&lt;h2&gt;
  
  
  The Promise vs. Reality
&lt;/h2&gt;

&lt;p&gt;AI coding assistants are everywhere. GitHub reports &lt;a href="https://medium.com/@aminsiddique95/ai-is-writing-46-of-all-code-github-copilots-real-impact-on-15-million-developers-787d789fcfdc" rel="noopener noreferrer"&gt;15 million developers now use Copilot&lt;/a&gt;—a 400% increase in one year. Stack Overflow's 2024 survey found &lt;a href="https://www.secondtalent.com/resources/github-copilot-statistics/" rel="noopener noreferrer"&gt;63% of professional developers use AI in their workflow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The productivity gains are real. Microsoft's research shows developers using Copilot achieve &lt;a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/" rel="noopener noreferrer"&gt;26% higher productivity&lt;/a&gt; and code &lt;a href="https://medium.com/@aminsiddique95/ai-is-writing-46-of-all-code-github-copilots-real-impact-on-15-million-developers-787d789fcfdc" rel="noopener noreferrer"&gt;55% faster&lt;/a&gt; in controlled tests.&lt;/p&gt;

&lt;p&gt;But here's what the headlines don't tell you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-generated code creates 1.7x more issues than human-written code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's from &lt;a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report" rel="noopener noreferrer"&gt;CodeRabbit's analysis of 470 GitHub pull requests&lt;/a&gt;. The breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1.75x more logic and correctness errors&lt;/li&gt;
&lt;li&gt;1.64x more code quality and maintainability issues&lt;/li&gt;
&lt;li&gt;1.57x more security findings&lt;/li&gt;
&lt;li&gt;1.42x more performance problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.qodo.ai/reports/state-of-ai-code-quality/" rel="noopener noreferrer"&gt;Google's 2024 DORA report&lt;/a&gt; found that increased AI use correlates with a &lt;strong&gt;7.2% decrease in delivery stability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And perhaps most damning: &lt;a href="https://www.qodo.ai/reports/state-of-ai-code-quality/" rel="noopener noreferrer"&gt;only 3.8% of developers&lt;/a&gt; report both low hallucination rates AND high confidence in shipping AI code without human review.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Specific Failure Patterns
&lt;/h2&gt;

&lt;p&gt;After tracking my own AI coding sessions for 6 months, I identified 13 specific ways they fail. Here are the top 5:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Mock Data That Never Dies
&lt;/h3&gt;

&lt;p&gt;AI assistants love mock data. It makes demos look great and code compile cleanly.&lt;/p&gt;

&lt;p&gt;The problem? &lt;strong&gt;Mocks survive to production.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In my logs, sessions where mock data existed past 30 minutes had an 84% chance of shipping with fake data still in place.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Interface Drift
&lt;/h3&gt;

&lt;p&gt;You start with a clean API contract. Midway through the session, the AI suggests "just a small change" to the interface.&lt;/p&gt;

&lt;p&gt;Three changes later, your frontend is broken, your tests fail, and you've lost 2 hours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.gitclear.com/ai_assistant_code_quality_2025_research" rel="noopener noreferrer"&gt;GitClear's 2025 research&lt;/a&gt; shows code churn—changes to recently written code—has increased dramatically since AI adoption, suggesting this pattern is widespread.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Scope Creep
&lt;/h3&gt;

&lt;p&gt;"While I'm in here, let me also refactor this..."&lt;/p&gt;

&lt;p&gt;What starts as a 50-line change becomes 500 lines across 15 files. Now nothing works, and you can't isolate what broke.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The "Almost Done" Trap
&lt;/h3&gt;

&lt;p&gt;The AI reports the feature is "complete." Tests pass locally. You feel good.&lt;/p&gt;

&lt;p&gt;Then you deploy, and it breaks immediately because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment variables weren't configured&lt;/li&gt;
&lt;li&gt;Error handling was added but never tested&lt;/li&gt;
&lt;li&gt;A dependency was mocked that doesn't exist in production&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Security Blind Spots
&lt;/h3&gt;

&lt;p&gt;Studies show &lt;a href="https://www.netcorpsoftwaredevelopment.com/blog/ai-generated-code-statistics" rel="noopener noreferrer"&gt;48% of AI-generated code contains security vulnerabilities&lt;/a&gt;. Earlier GitHub Copilot research found &lt;a href="https://www.techradar.com/pro/security/ai-generated-code-contains-more-bugs-and-errors-than-human-output" rel="noopener noreferrer"&gt;40% of generated programs had insecure code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The AI writes syntactically correct code. It doesn't understand your threat model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;The core issue isn't that AI is "bad at coding." It's that &lt;strong&gt;AI lacks accountability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you ask Claude or Copilot to write code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn't know if your tests actually run&lt;/li&gt;
&lt;li&gt;It can't verify its changes didn't break the build&lt;/li&gt;
&lt;li&gt;It assumes you'll catch the mocks, the drift, the scope creep&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompt engineering helps, but prompts are suggestions. The AI can claim "I removed all mocks" while mocks still exist in your codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need enforcement, not suggestions.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework Solution
&lt;/h2&gt;

&lt;p&gt;I built the &lt;a href="https://github.com/sgharlow/ai-control-framework" rel="noopener noreferrer"&gt;AI Control Framework&lt;/a&gt; to enforce discipline through external scripts—validators that check the actual state of your project, not what the AI claims.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contract Freezing
&lt;/h3&gt;

&lt;p&gt;At session start, interfaces (API specs, database schemas, type definitions) get SHA256-hashed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./freeze-contracts.sh
✓ api/openapi.yaml: sha256:a1b2c3...
✓ db/schema.sql: sha256:d4e5f6...
Contracts frozen.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any change during the session triggers an immediate alert:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./check-contracts.sh
✗ CONTRACT VIOLATION: api/openapi.yaml changed
Hash expected: a1b2c3...
Hash found: x7y8z9...
STOP: Submit Contract Change Request or revert.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This catches interface drift &lt;em&gt;before&lt;/em&gt; it breaks your frontend.&lt;/p&gt;
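&lt;p&gt;The freeze/check cycle is essentially a hash manifest. A minimal sketch in Python (file names and manifest layout are illustrative; the framework's actual scripts are bash):&lt;/p&gt;

```python
# Hash each contract file at session start, then compare later. Any file
# whose current hash differs from the frozen one is a violation.
import hashlib
import json
import pathlib

def freeze(paths: list, manifest: str = "contracts.json") -> None:
    hashes = {p: hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
              for p in paths}
    pathlib.Path(manifest).write_text(json.dumps(hashes))

def check(manifest: str = "contracts.json") -> list:
    """Return the paths whose contents changed since the freeze."""
    frozen = json.loads(pathlib.Path(manifest).read_text())
    return [p for p, h in frozen.items()
            if hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest() != h]
```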

&lt;h3&gt;
  
  
  30-Minute Mock Timeout
&lt;/h3&gt;

&lt;p&gt;Mocks are allowed for the first 30 minutes—enough time to explore an approach.&lt;/p&gt;

&lt;p&gt;After 30 minutes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./detect-mocks.sh
⚠ MOCK TIMEOUT: 2 mocks detected after 30-minute limit
- src/api/users.ts:42 → mockUserData
- src/services/auth.ts:18 → fakeTok...
ACTION REQUIRED: Replace with real service calls.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This forces the "connect to real services" conversation early, when it's still cheap to pivot.&lt;/p&gt;
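&lt;p&gt;Detection itself can be a simple identifier scan. This sketch flags mock/fake/stub-prefixed names in source files; the pattern list is an assumption, and the real script's heuristics and 30-minute clock are not shown:&lt;/p&gt;

```python
# Assumed mock detector: scan source files for identifiers that look like
# mocks. Extensions and the regex are illustrative choices.
import pathlib
import re

MOCK_PATTERN = re.compile(r"\b(mock|fake|stub)[A-Za-z0-9_]*", re.IGNORECASE)

def find_mocks(root: str, exts: tuple = (".ts", ".js", ".py")) -> list:
    """Return (path, line number, identifier) for each suspected mock."""
    hits = []
    for path in pathlib.Path(root).rglob("*"):
        if path.suffix not in exts:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            match = MOCK_PATTERN.search(line)
            if match:
                hits.append((str(path), lineno, match.group(0)))
    return hits
```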

&lt;h3&gt;
  
  
  Scope Limits
&lt;/h3&gt;

&lt;p&gt;Hard stops at 5 files changed and 200 lines added per session.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./check-scope.sh
Files changed: 6/5 ✗
Lines added: 240/200 ✗
SCOPE EXCEEDED: Ship current work (if DRS ≥ 85) or revert.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This forces incremental, deployable chunks instead of massive, risky changesets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployability Rating Score (DRS)
&lt;/h3&gt;

&lt;p&gt;A 0-100 score calculated from 13 components:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./drs-calculate.sh
═══════════════════════════════════════
DEPLOYABILITY SCORE: 87/100
═══════════════════════════════════════
✓ Contract Integrity     (8/8)
✓ No Mocks               (8/8)
✓ Tests Passing          (7/7)
✓ Security Validation   (16/18)
✓ Error Handling         (4/4)
⚠ Prod Readiness        (12/15)

✅ READY TO DEPLOY (DRS ≥ 85)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When DRS hits 85+, you &lt;strong&gt;know&lt;/strong&gt; the code is production-ready. No guessing.&lt;/p&gt;
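&lt;p&gt;The roll-up behind that report is just points earned over points possible, gated at 85. A sketch (the sample output above shows six components; the full framework tracks 13, and the exact weighting may differ):&lt;/p&gt;

```python
# Illustrative DRS roll-up: sum earned points against component maxima,
# rescale to 0-100, and gate deployability at 85.

def drs(components: dict) -> tuple:
    """components maps name to (earned, maximum); returns (score, ready)."""
    earned = sum(e for e, _ in components.values())
    maximum = sum(m for _, m in components.values())
    score = round(100 * earned / maximum)
    return score, score >= 85  # 85+ means ready to deploy
```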




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;After implementing this framework across 6 projects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to deploy&lt;/td&gt;
&lt;td&gt;3-5 days&lt;/td&gt;
&lt;td&gt;4-6 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rework rate&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Breaking changes per feature&lt;/td&gt;
&lt;td&gt;4.2&lt;/td&gt;
&lt;td&gt;0.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Works on my machine" incidents&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The framework doesn't slow you down. It prevents the 3-5 day rework cycles that happen when you deploy code that isn't ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  Industry Context
&lt;/h2&gt;

&lt;p&gt;The research supports this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.qodo.ai/reports/state-of-ai-code-quality/" rel="noopener noreferrer"&gt;44% of developers who say AI degrades code quality blame context issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://linearb.io/blog/is-github-copilot-worth-it" rel="noopener noreferrer"&gt;Microsoft reports it takes ~11 weeks&lt;/a&gt; for developers to fully realize AI productivity gains&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gitclear.com/ai_assistant_code_quality_2025_research" rel="noopener noreferrer"&gt;GitClear found code duplication increased 8x in 2024&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem isn't AI capability. It's &lt;strong&gt;discipline&lt;/strong&gt;—and discipline requires enforcement.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clone and install
git clone https://github.com/sgharlow/ai-control-framework.git
./ai-control-framework/install.sh /path/to/your/project

# Run your first DRS check
cd /path/to/your/project
./ai-framework/reference/bash/drs-calculate.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The framework works with any AI assistant that can read files: Claude Code, Cursor, Copilot, Aider.&lt;/p&gt;

&lt;p&gt;It's &lt;a href="https://github.com/sgharlow/ai-control-framework/blob/main/LICENSE" rel="noopener noreferrer"&gt;MIT licensed&lt;/a&gt; and has &lt;a href="https://github.com/sgharlow/ai-control-framework" rel="noopener noreferrer"&gt;100% test coverage&lt;/a&gt; (136/136 tests passing).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI coding assistants are powerful. But power without discipline leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Beautiful code that breaks on deploy&lt;/li&gt;
&lt;li&gt;"Almost done" sessions that need 3 more days&lt;/li&gt;
&lt;li&gt;Mock data that survives to production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stop hoping AI code will work. Start knowing it will deploy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sgharlow/ai-control-framework" rel="noopener noreferrer"&gt;Try the AI Control Framework →&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;All statistics in this article are sourced from:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/" rel="noopener noreferrer"&gt;GitHub Blog - Research on Copilot Productivity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report" rel="noopener noreferrer"&gt;CodeRabbit - State of AI vs Human Code Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gitclear.com/ai_assistant_code_quality_2025_research" rel="noopener noreferrer"&gt;GitClear - AI Code Quality 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.qodo.ai/reports/state-of-ai-code-quality/" rel="noopener noreferrer"&gt;Qodo - State of AI Code Quality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@aminsiddique95/ai-is-writing-46-of-all-code-github-copilots-real-impact-on-15-million-developers-787d789fcfdc" rel="noopener noreferrer"&gt;Medium - Copilot's Impact on 15M Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linearb.io/blog/is-github-copilot-worth-it" rel="noopener noreferrer"&gt;LinearB - Is GitHub Copilot Worth It?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.techradar.com/pro/security/ai-generated-code-contains-more-bugs-and-errors-than-human-output" rel="noopener noreferrer"&gt;TechRadar - AI Code Security Issues&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Have you struggled with AI coding assistant reliability? Let me know in the comments what patterns you've seen.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>Kiroween Hackathon Submitted!</title>
      <dc:creator>Steve Harlow</dc:creator>
      <pubDate>Fri, 05 Dec 2025 19:12:43 +0000</pubDate>
      <link>https://forem.com/steve_harlow_0dbc0e910b6d/kiroween-hackathon-submitted-nhi</link>
      <guid>https://forem.com/steve_harlow_0dbc0e910b6d/kiroween-hackathon-submitted-nhi</guid>
      <description>&lt;p&gt;Just making a quick post about my latest hackathon entry.&lt;/p&gt;

&lt;p&gt;This repo is a remake of a legacy scrum planning webapp. I used Kiro to migrate the connection infrastructure from unreliable WebRTC to AWS serverless tech: &lt;a href="https://github.com/sgharlow/scrum-web-reborn" rel="noopener noreferrer"&gt;https://github.com/sgharlow/scrum-web-reborn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Transformation Story&lt;br&gt;
From 50% to 99.5% Connectivity: 100x Fewer Failed Connections&lt;/p&gt;

&lt;p&gt;The original Scrum Facilitator app relied on peer-to-peer (P2P) WebRTC connections, where users connected directly to each other through their browsers. While this approach worked in ideal conditions, it failed catastrophically in real-world scenarios, achieving only a 50% connection success rate. The core problem was NAT traversal: corporate firewalls blocked UDP traffic, symmetric NAT prevented direct connections, mobile carrier-grade NAT made connections impossible, and free TURN relay servers were unreliable and rate-limited. Even when connections did work, they were fragile: if the facilitator's browser closed or their network hiccupped, the entire session's data was lost, since there was no server to persist state.&lt;/p&gt;

&lt;p&gt;Scrum Reborn solves this by replacing the entire P2P architecture with AWS AppSync (managed GraphQL with WebSocket subscriptions) and DynamoDB. All communication now flows over HTTPS (port 443), which works everywhere: no NAT traversal needed, no firewall issues, no TURN servers. AppSync provides guaranteed message delivery with automatic reconnection, while DynamoDB serves as the single source of truth with ACID transactions.&lt;/p&gt;

&lt;p&gt;The result: 99.5%+ connectivity, sub-250ms latency, and zero data loss. The success rate roughly doubled (99.5% ÷ 50% = 1.99x); put the other way around, the failure rate dropped from 50% to 0.5%, which is 100x fewer failed connections. Either way, an unreliable prototype became a production-ready collaboration platform that distributed teams can actually depend on.&lt;/p&gt;
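&lt;p&gt;As a quick sanity check on those reliability numbers, here is the arithmetic in TypeScript. The percentages come from the post; the variable names are mine:&lt;/p&gt;

```typescript
// Percentages from the post; variable names are mine.
const successBefore = 0.50;  // WebRTC P2P: 50% of connections succeeded
const successAfter = 0.995;  // AppSync over HTTPS: 99.5% succeed

// Two different ways to describe the same change:
const successGain = successAfter / successBefore;             // how much more often connections succeed
const failureDrop = (1 - successBefore) / (1 - successAfter); // how much rarer failures became

console.log(successGain.toFixed(2)); // "1.99" -> roughly 2x more successes
console.log(failureDrop.toFixed(0)); // "100"  -> 100x fewer failures
```

&lt;p&gt;Dividing success rates gives 1.99x; dividing failure rates gives 100x. Both are fair summaries of the same migration, as long as the framing is stated.&lt;/p&gt;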

&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Planning Poker: Real-time story estimation with Fibonacci voting&lt;/li&gt;
&lt;li&gt;Retrospectives: Collaborative retro boards with voting&lt;/li&gt;
&lt;li&gt;Presence Tracking: Always know who's in the room&lt;/li&gt;
&lt;li&gt;99.5%+ Connectivity: No more NAT traversal issues&lt;/li&gt;
&lt;li&gt;Sub-250ms Latency: Updates feel instant&lt;/li&gt;
&lt;li&gt;Serverless Architecture: Scales automatically with AWS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Frontend: React 19 + TypeScript + Vite&lt;/li&gt;
&lt;li&gt;Backend: AWS AppSync (GraphQL) + Lambda + DynamoDB&lt;/li&gt;
&lt;li&gt;Auth: AWS Cognito User Pools&lt;/li&gt;
&lt;li&gt;Infrastructure: AWS CDK (TypeScript)&lt;/li&gt;
&lt;li&gt;Real-time: GraphQL Subscriptions over WebSocket&lt;/li&gt;
&lt;/ul&gt;
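&lt;p&gt;To make the Planning Poker feature concrete, here is a minimal TypeScript sketch of how a round of Fibonacci votes might be tallied. This is illustrative only, not code from the scrum-web-reborn repo; the function name and the "ties size up" rule are my assumptions:&lt;/p&gt;

```typescript
// Illustrative sketch (not from the repo): tally one planning-poker round.
const FIB_SCALE = [1, 2, 3, 5, 8, 13, 21];

// Returns the most-voted estimate; ties resolve to the larger estimate,
// following the common "when in doubt, size up" convention.
function tallyVotes(votes: number[]): number {
  const counts: { [estimate: number]: number } = {};
  for (const v of votes) {
    if (FIB_SCALE.indexOf(v) === -1) {
      throw new Error("not on the Fibonacci scale: " + v);
    }
    counts[v] = (counts[v] || 0) + 1;
  }
  let winner = 0;
  for (const v of votes) {
    const better =
      counts[v] > (counts[winner] || 0) ||
      (counts[v] === (counts[winner] || 0) && v > winner);
    if (better) winner = v;
  }
  return winner;
}

console.log(tallyVotes([3, 5, 5, 8])); // 5 (simple majority)
console.log(tallyVotes([5, 8]));       // 8 (tie sizes up)
```

&lt;p&gt;In the real app the votes would arrive over AppSync GraphQL subscriptions and the tally would be persisted in DynamoDB, but the scoring logic itself stays this small.&lt;/p&gt;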

</description>
      <category>hookedonkiro</category>
      <category>kirodotdev</category>
      <category>devpost</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
