<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Koustubh</title>
    <description>The latest articles on Forem by Koustubh (@koustubh).</description>
    <link>https://forem.com/koustubh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3594821%2F51594cd8-8b43-4605-9cf9-99e4bf0e4676.png</url>
      <title>Forem: Koustubh</title>
      <link>https://forem.com/koustubh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/koustubh"/>
    <language>en</language>
    <item>
      <title>What OpenClaw Teaches Us About Personal AI Security</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Sun, 08 Feb 2026 03:58:13 +0000</pubDate>
      <link>https://forem.com/koustubh/why-your-ai-assistant-should-run-at-home-lessons-from-openclaw-2o1n</link>
      <guid>https://forem.com/koustubh/why-your-ai-assistant-should-run-at-home-lessons-from-openclaw-2o1n</guid>
      <description>&lt;p&gt;OpenClaw and gharasathi are both local-first AI assistants. Both run on your own hardware. Both handle personal data. In February 2026, OpenClaw had a very bad month — and the lessons aren't what you might expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Context: gharasathi's Setup
&lt;/h2&gt;

&lt;p&gt;gharasathi runs on a ByteNUC mini PC using Talos Linux — an immutable, minimal OS built for Kubernetes. No SSH. No shell. No package manager.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRCCiAgICBzdWJncmFwaCBCeXRlTlVDWyJCeXRlTlVDICgkMjAwIMK3IDZHQiBSQU0gwrcgMlRCKSJdCiAgICAgICAgc3ViZ3JhcGggVGFsb3NbIlRhbG9zIExpbnV4IChpbW11dGFibGUpIl0KICAgICAgICAgICAgc3ViZ3JhcGggSzhzWyJLdWJlcm5ldGVzIl0KICAgICAgICAgICAgICAgIE5TMVsiYWFwbGEtbWFoaXRpc2F0aGE8YnIvPk5lbzRqIl0KICAgICAgICAgICAgICAgIE5TMlsiYWFwbGEtZGhhbjxici8-RmluYW5jZSAoR28pIl0KICAgICAgICAgICAgICAgIE5TM1siYWFwbGEtaHVzaGFyPGJyLz5MTE0gKFB5dGhvbiArIE9sbGFtYSkiXQogICAgICAgICAgICAgICAgTlM0WyJnaGFyYXNhdGhpLXdlYjxici8-UmVhY3QiXQogICAgICAgICAgICAgICAgTlM1WyJzdG9yYWdlPGJyLz5NaW5JTyJdCiAgICAgICAgICAgIGVuZAogICAgICAgIGVuZAogICAgZW5kCiAgICBQaG9uZVsiaVBob25lIl0gLS0-fCJMQU4gb25seSJ8IEs4cwogICAgTGFwdG9wWyJCcm93c2VyIl0gLS0-fCJMQU4gb25seSJ8IEs4cw%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRCCiAgICBzdWJncmFwaCBCeXRlTlVDWyJCeXRlTlVDICgkMjAwIMK3IDZHQiBSQU0gwrcgMlRCKSJdCiAgICAgICAgc3ViZ3JhcGggVGFsb3NbIlRhbG9zIExpbnV4IChpbW11dGFibGUpIl0KICAgICAgICAgICAgc3ViZ3JhcGggSzhzWyJLdWJlcm5ldGVzIl0KICAgICAgICAgICAgICAgIE5TMVsiYWFwbGEtbWFoaXRpc2F0aGE8YnIvPk5lbzRqIl0KICAgICAgICAgICAgICAgIE5TMlsiYWFwbGEtZGhhbjxici8-RmluYW5jZSAoR28pIl0KICAgICAgICAgICAgICAgIE5TM1siYWFwbGEtaHVzaGFyPGJyLz5MTE0gKFB5dGhvbiArIE9sbGFtYSkiXQogICAgICAgICAgICAgICAgTlM0WyJnaGFyYXNhdGhpLXdlYjxici8-UmVhY3QiXQogICAgICAgICAgICAgICAgTlM1WyJzdG9yYWdlPGJyLz5NaW5JTyJdCiAgICAgICAgICAgIGVuZAogICAgICAgIGVuZAogICAgZW5kCiAgICBQaG9uZVsiaVBob25lIl0gLS0-fCJMQU4gb25seSJ8IEs4cwogICAgTGFwdG9wWyJCcm93c2VyIl0gLS0-fCJMQU4gb25seSJ8IEs4cw%3D%3D" alt="diagram" width="436" height="904"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LAN only. No port forwarding. No public IP. The only way to reach these services is to be on my home Wi-Fi.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened to OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw (formerly Clawdbot) is an open-source AI agent that went viral in early 2026 — 150K+ GitHub stars. It's &lt;strong&gt;also local-first&lt;/strong&gt;: you install it on your machine, it stores data locally, and it connects to LLMs for task automation.&lt;/p&gt;

&lt;p&gt;But OpenClaw is far more ambitious than gharasathi. It can execute shell commands, control browsers, send emails, and automate multi-step workflows. It integrates with WhatsApp, Telegram, and Slack. It has a community marketplace (ClawHub) for third-party skills. gharasathi just queries a database and explains the results in natural language.&lt;/p&gt;

&lt;p&gt;That difference in scope turned out to matter a lot. In February 2026, multiple security teams published findings within days of each other:&lt;/p&gt;

&lt;h3&gt;
  
  
  CVE-2026-25253: One-Click RCE (CVSS 8.8)
&lt;/h3&gt;

&lt;p&gt;A browser-based attack that let attackers hijack any OpenClaw instance — &lt;strong&gt;even ones running only on localhost&lt;/strong&gt;. A user visits a crafted webpage, JavaScript steals the gateway token via WebSocket, and the attacker gains full operator access: disable security features, escape Docker containers, execute arbitrary commands on the host. (&lt;a href="https://thehackernews.com/2026/02/openclaw-bug-enables-one-click-remote.html" rel="noopener noreferrer"&gt;The Hacker News&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This is worth pausing on. OpenClaw was running locally. It was listening on localhost. And it was still compromised — because the victim's own browser initiated the connection. &lt;strong&gt;"Local" alone doesn't mean "secure."&lt;/strong&gt;&lt;/p&gt;
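&lt;p&gt;Why does the victim's own browser matter so much? Browsers attach an &lt;code&gt;Origin&lt;/code&gt; header to every WebSocket handshake but do not block cross-origin WebSocket connections, so a localhost service has to validate that header itself. Here's a minimal sketch of such a check (illustrative only, not code from OpenClaw or gharasathi; the allowlist is hypothetical):&lt;/p&gt;

```python
# Illustrative sketch, not code from OpenClaw or gharasathi.
# Browsers send an Origin header with every WebSocket handshake but do
# not block cross-origin WebSockets, so the server must reject origins
# it doesn't recognize before completing the upgrade.

from urllib.parse import urlparse

ALLOWED_HOSTS = {"localhost", "127.0.0.1"}  # hypothetical allowlist

def is_allowed_origin(origin_header):
    """True only if the handshake came from a page this host serves."""
    if not origin_header:      # non-browser clients omit Origin;
        return False           # safest default is an explicit match
    return urlparse(origin_header).hostname in ALLOWED_HOSTS

# A crafted page at attacker.example can open ws://localhost:PORT,
# but its Origin header gives it away:
assert is_allowed_origin("http://localhost:3000")
assert not is_allowed_origin("https://attacker.example")
```

&lt;p&gt;Origin checking is one mitigation among several; binding tokens to sessions and requiring authentication on the gateway close the rest of the gap.&lt;/p&gt;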

&lt;h3&gt;
  
  
  Misconfigured Instances Leaking Data
&lt;/h3&gt;

&lt;p&gt;Users who exposed their OpenClaw instances to the internet — against best practices — were found leaking API keys, chat histories, and credentials. Some had zero authentication. (&lt;a href="https://www.trendmicro.com/en_us/research/26/b/what-openclaw-reveals-about-agentic-assistants.html" rel="noopener noreferrer"&gt;Trend Micro&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  341 Malicious Skills in ClawHub
&lt;/h3&gt;

&lt;p&gt;Security researchers found &lt;strong&gt;341 malicious packages&lt;/strong&gt; on ClawHub, OpenClaw's community marketplace. These impersonated legitimate tools but contained Atomic Stealer malware and ClawHavoc C2 implants — harvesting SSH keys, browser passwords, crypto wallet keys, and API tokens. (&lt;a href="https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html" rel="noopener noreferrer"&gt;The Hacker News&lt;/a&gt;, &lt;a href="https://www.theregister.com/2026/02/05/openclaw_skills_marketplace_leaky_security/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;p&gt;It would be easy to say "gharasathi is local, therefore safe" — but OpenClaw is local too. The RCE vulnerability worked on localhost. "Runs on your machine" is not a security strategy.&lt;/p&gt;

&lt;p&gt;The actual differences that matter are about &lt;strong&gt;scope&lt;/strong&gt; and &lt;strong&gt;attack surface&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;gharasathi&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What can the AI do?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Execute code, send emails, control browsers, shell access&lt;/td&gt;
&lt;td&gt;Query Neo4j (read-only), explain results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;External integrations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;WhatsApp, Telegram, Slack, browser automation&lt;/td&gt;
&lt;td&gt;None — LAN-only chat interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plugin/skill marketplace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ClawHub (2,800+ community skills)&lt;/td&gt;
&lt;td&gt;None — all code in monorepo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud API keys&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPT-4, Claude, Gemini keys in local config&lt;/td&gt;
&lt;td&gt;None — local Ollama only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blast radius if compromised&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full host access, all connected services&lt;/td&gt;
&lt;td&gt;Read access to household data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RCE vulnerability (CVE-2026-25253) is a browser-based WebSocket attack. Could something similar affect gharasathi? Honestly — yes, in principle. Any service with a web interface is potentially vulnerable to browser-based attacks. I won't pretend otherwise.&lt;/p&gt;

&lt;p&gt;But the &lt;strong&gt;blast radius&lt;/strong&gt; is fundamentally different. If someone compromised OpenClaw, they got shell access, email sending, browser control, and every API key stored locally. If someone compromised gharasathi's chat interface, they'd get... read-only access to my grocery spending and photo metadata. The LLM can't execute code. It can't send messages. It can't modify data. It queries Neo4j and formats the response.&lt;/p&gt;

&lt;h3&gt;
  
  
  This Is a Trade-Off, Not a Win
&lt;/h3&gt;

&lt;p&gt;I want to be clear: OpenClaw's broader scope is exactly what makes it useful. People use it to automate real workflows — managing emails, booking flights, controlling browsers, orchestrating multi-step tasks across services. gharasathi can't do any of that. It answers questions about household data. That's it.&lt;/p&gt;

&lt;p&gt;OpenClaw with 150K+ GitHub stars solves problems that gharasathi doesn't even attempt. The shell access, the messaging integrations, the plugin marketplace — those features exist because users need them. Removing them isn't a security strategy anyone building a general-purpose AI agent can adopt.&lt;/p&gt;

&lt;p&gt;gharasathi gets away with a narrow scope because it was designed for a narrow purpose: surface household data in natural language. That's a fundamentally different ambition from what OpenClaw is trying to do, and comparing them on security alone misses the point. The security comparison is only meaningful if you understand that these tools serve very different needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;Three takeaways from watching OpenClaw's February:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. "Local-first" is necessary but not sufficient.&lt;/strong&gt; Running on your own hardware avoids cloud data exposure, but it doesn't protect against browser-based attacks or supply chain compromises. Don't confuse deployment model with security model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Scope is your best defense.&lt;/strong&gt; The most effective security decision in gharasathi isn't the network topology — it's that the LLM can only read data and explain it. No code execution. No external integrations. No plugins. Every capability you add is attack surface you have to defend.&lt;/p&gt;
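&lt;p&gt;One way to make "the LLM can only read" a hard guarantee rather than a convention is to gate every query before it reaches the database. A sketch of the idea (illustrative, not gharasathi's actual code):&lt;/p&gt;

```python
# Illustrative sketch, not gharasathi's actual code: reject any Cypher
# containing a write clause before it ever reaches Neo4j, so the query
# path is read-only by construction rather than by convention.

import re

WRITE_CLAUSES = {"CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP"}

def assert_read_only(cypher):
    """Return the query unchanged, or raise if it could write."""
    tokens = set(re.findall(r"[A-Za-z]+", cypher.upper()))
    forbidden = tokens.intersection(WRITE_CLAUSES)
    if forbidden:
        raise ValueError(f"write clause not allowed: {sorted(forbidden)}")
    return cypher

assert_read_only("MATCH (t:Transaction) RETURN sum(t.amount)")  # passes
try:
    assert_read_only("MATCH (t:Transaction) SET t.amount = 0")
except ValueError:
    pass  # rejected before reaching the database
```

&lt;p&gt;String filtering is a blunt instrument; a read-only database account, where your Neo4j edition supports one, enforces the same guarantee at the database itself and is the stronger layer.&lt;/p&gt;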

&lt;p&gt;&lt;strong&gt;3. Marketplaces need guardrails.&lt;/strong&gt; Plugin ecosystems are how tools like OpenClaw scale — 2,800 community skills is an incredible achievement. But 341 of them were malicious, and they ran with the same permissions as the core agent. The lesson isn't "don't have a marketplace" — it's that plugins touching personal data need sandboxing, permission scoping, and review processes that match the sensitivity of what they can access.&lt;/p&gt;
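&lt;p&gt;What might that permission scoping look like? A deliberately tiny, hypothetical sketch (neither ClawHub nor gharasathi works this way today; the skill names and capability strings are invented):&lt;/p&gt;

```python
# Hypothetical sketch; neither ClawHub nor gharasathi works this way
# today, and the skill names and capability strings are invented.
# Each skill declares up front what it may touch; the host denies the rest.

DECLARED_CAPABILITIES = {
    "weather-lookup": {"net.fetch"},
    "photo-tagger":   {"fs.read"},
}

def check_permission(skill, capability):
    """Grant a capability only if the skill declared it at install time."""
    return capability in DECLARED_CAPABILITIES.get(skill, set())

assert check_permission("weather-lookup", "net.fetch")
assert not check_permission("weather-lookup", "fs.read")   # undeclared
assert not check_permission("wallet-stealer", "fs.read")   # unknown skill
```

&lt;p&gt;The point is the shape, not the implementation: a skill that only declared "fetch weather" should never be able to read SSH keys, no matter what its code tries.&lt;/p&gt;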

&lt;h2&gt;
  
  
  Where This Stands
&lt;/h2&gt;

&lt;p&gt;gharasathi is still an MVP. The architecture works: graph database for structured household data, local LLM for natural language, Kubernetes for orchestration, all on a $200 mini PC.&lt;/p&gt;

&lt;p&gt;What's proven:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Neo4j handles household data relationships elegantly&lt;/li&gt;
&lt;li&gt;A small LLM (phi3:mini) works for structured query-and-explain&lt;/li&gt;
&lt;li&gt;K8s resource limits keep 5 services running in 6GB RAM&lt;/li&gt;
&lt;li&gt;Minimal scope limits blast radius even if something goes wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The LLM can't do real analysis (needs hardware upgrade for larger models)&lt;/li&gt;
&lt;li&gt;Photo processing isn't implemented yet&lt;/li&gt;
&lt;li&gt;It's a one-household system, not a product&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Idea Scales
&lt;/h3&gt;

&lt;p&gt;The current MVP handles finances and photos, but the core idea — put your household data in a graph, give it a natural language interface — extends to almost anything: health records, home energy usage, kids' school schedules, vehicle maintenance, meal planning. Each new data domain is just another node type in Neo4j with relationships to what's already there. The query patterns stay the same: start at a node, walk relationships, explain the result.&lt;/p&gt;
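&lt;p&gt;Concretely, "another node type" is a couple of Cypher statements. The labels, properties, and relationships below are invented for illustration, not gharasathi's actual schema:&lt;/p&gt;

```python
# Hypothetical sketch: the labels, properties, and relationships below
# are invented for illustration, not gharasathi's actual schema. A new
# domain is one new node label plus relationships into the graph.

ADD_SERVICE = """
MERGE (v:Vehicle {vin: $vin})
MERGE (s:Service {date: $date, kind: $kind, cost: $cost})
MERGE (v)-[:RECEIVED]->(s)
"""

# The read side is the same start-at-a-node, walk-relationships
# pattern the finance queries already use:
LAST_SERVICE = """
MATCH (v:Vehicle {vin: $vin})-[:RECEIVED]->(s:Service)
RETURN s.date, s.kind, s.cost
ORDER BY s.date DESC LIMIT 1
"""
```

&lt;p&gt;Because the read side follows the existing walk-the-graph pattern, the LLM-facing pipeline doesn't change when a domain is added.&lt;/p&gt;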

&lt;p&gt;The interface scales too. Right now it's chat. But the same LangGraph pipeline that answers a typed question could answer a spoken one — swap the text input for speech-to-text and the response for text-to-speech. The architecture doesn't change. The hardware constraints are the real bottleneck for now, not the design.&lt;/p&gt;

&lt;p&gt;I'm not open-sourcing gharasathi. The codebase contains patterns specific to my household's financial accounts and data structure. Releasing it would require sanitizing all of that, and "privacy-first project accidentally leaks developer's banking patterns" is exactly the kind of irony I'd rather avoid.&lt;/p&gt;

&lt;p&gt;Your most personal data — your finances, your photos, your family's memories — deserves to stay personal. Not discoverable on Shodan. Not accessible through a marketplace plugin you didn't audit. Just yours, on a box in your garage.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 3 of a 3-part series. Use the series navigation above to read Part 1 (Architecture &amp;amp; Neo4j) and Part 2 (LLM Model Selection).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.trendmicro.com/en_us/research/26/b/what-openclaw-reveals-about-agentic-assistants.html" rel="noopener noreferrer"&gt;Trend Micro: What OpenClaw Reveals About Agentic Assistants&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thehackernews.com/2026/02/openclaw-bug-enables-one-click-remote.html" rel="noopener noreferrer"&gt;The Hacker News: OpenClaw Bug Enables One-Click RCE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theregister.com/2026/02/02/openclaw_security_issues/" rel="noopener noreferrer"&gt;The Register: OpenClaw Ecosystem Security Issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theregister.com/2026/02/05/openclaw_skills_marketplace_leaky_security/" rel="noopener noreferrer"&gt;The Register: Easy to Backdoor OpenClaw Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html" rel="noopener noreferrer"&gt;The Hacker News: 341 Malicious ClawHub Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.crowdstrike.com/en-us/blog/what-security-teams-need-to-know-about-openclaw-ai-super-agent/" rel="noopener noreferrer"&gt;CrowdStrike: What Security Teams Need to Know About OpenClaw&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>security</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Running an LLM on 6GB RAM — Model Selection for Edge AI</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Sun, 08 Feb 2026 03:58:12 +0000</pubDate>
      <link>https://forem.com/koustubh/running-an-llm-on-6gb-ram-model-selection-for-edge-ai-3chi</link>
      <guid>https://forem.com/koustubh/running-an-llm-on-6gb-ram-model-selection-for-edge-ai-3chi</guid>
      <description>&lt;p&gt;The hardest part of building a self-hosted AI isn't the architecture — it's choosing the right LLM for your actual use case &lt;em&gt;and&lt;/em&gt; your actual hardware. A multimodal model that understands images is pointless when your task is explaining bank transactions. A model that tops benchmarks is useless if it takes 20 seconds to respond on your CPU. Here's what happened when I learned both lessons.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hardware
&lt;/h2&gt;

&lt;p&gt;The entire gharasathi system runs on a ByteNUC mini PC:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intel (no discrete GPU)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6GB total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Disk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2TB SSD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Talos Linux (immutable, K8s-native)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$200&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;6GB RAM. That's it. And it needs to run the OS, Kubernetes, Neo4j, four microservices, &lt;em&gt;and&lt;/em&gt; an LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory Budget
&lt;/h2&gt;

&lt;p&gt;Every megabyte matters. Here's how the 6GB is carved up, based on the actual K8s resource limits in my deployment manifests:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FcGllIHRpdGxlIDZHQiBSQU0gQnVkZ2V0CiAgICAiT2xsYW1hIChMTE0pIiA6IDMKICAgICJOZW80aiIgOiAxCiAgICAiRmFzdEFQSSArIFB5dGhvbiIgOiAxCiAgICAiYWFwbGEtZGhhbiAoR28pIiA6IDAuNQogICAgIks4cyArIE9TICsgb3RoZXIiIDogMC41" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FcGllIHRpdGxlIDZHQiBSQU0gQnVkZ2V0CiAgICAiT2xsYW1hIChMTE0pIiA6IDMKICAgICJOZW80aiIgOiAxCiAgICAiRmFzdEFQSSArIFB5dGhvbiIgOiAxCiAgICAiYWFwbGEtZGhhbiAoR28pIiA6IDAuNQogICAgIks4cyArIE9TICsgb3RoZXIiIDogMC41" alt="diagram" width="645" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;K8s requests&lt;/th&gt;
&lt;th&gt;K8s limits&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama sidecar&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2Gi&lt;/td&gt;
&lt;td&gt;3Gi&lt;/td&gt;
&lt;td&gt;LLM inference engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FastAPI (aapla-hushar)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;512Mi&lt;/td&gt;
&lt;td&gt;1Gi&lt;/td&gt;
&lt;td&gt;Python service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Neo4j&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;512Mi&lt;/td&gt;
&lt;td&gt;1Gi&lt;/td&gt;
&lt;td&gt;Graph database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;aapla-dhan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256Mi&lt;/td&gt;
&lt;td&gt;512Mi&lt;/td&gt;
&lt;td&gt;Go finance service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The LLM service gets ~3GB total (Ollama + FastAPI together in one K8s pod via sidecar pattern). After the Python runtime, FastAPI, and LangChain dependencies eat ~500MB, the actual model has roughly &lt;strong&gt;2.5GB&lt;/strong&gt; to work with.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Selection Journey
&lt;/h2&gt;

&lt;p&gt;I evaluated 15+ open-source models across four tiers. The question: what's the best LLM you can run in ~2.5GB?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydFsiQnVkZ2V0OiB-Mi41R0IgZm9yIG1vZGVsIl0gLS0-IFExeyJGaXRzIGluIG1lbW9yeT8ifQogICAgUTEgLS0-fE5vfCBSZWplY3RbIlBoaS00IDE0QiDCtyBRd2VuMzo4Yjxici8-TWlzdHJhbCA3QiDCtyBMbGFtYSAzLjMiXQogICAgUTEgLS0-fFllc3wgUTJ7IkNvbnRleHQgd2luZG93PyJ9CiAgICBRMiAtLT58IjwgMzJLInwgU21hbGxbIlNtb2xMTTIgwrcgR2VtbWEgMi0yQiJdCiAgICBRMiAtLT58IjMySysifCBRM3siUmVhc29uaW5nIHF1YWxpdHk_In0KICAgIFEzIC0tPnxCYXNpY3wgTWlkWyJRd2VuMi41LTEuNUIiXQogICAgUTMgLS0-fFN0cm9uZ3wgV2lubmVyWyJUb3AgY2FuZGlkYXRlcyJd" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBTdGFydFsiQnVkZ2V0OiB-Mi41R0IgZm9yIG1vZGVsIl0gLS0-IFExeyJGaXRzIGluIG1lbW9yeT8ifQogICAgUTEgLS0-fE5vfCBSZWplY3RbIlBoaS00IDE0QiDCtyBRd2VuMzo4Yjxici8-TWlzdHJhbCA3QiDCtyBMbGFtYSAzLjMiXQogICAgUTEgLS0-fFllc3wgUTJ7IkNvbnRleHQgd2luZG93PyJ9CiAgICBRMiAtLT58IjwgMzJLInwgU21hbGxbIlNtb2xMTTIgwrcgR2VtbWEgMi0yQiJdCiAgICBRMiAtLT58IjMySysifCBRM3siUmVhc29uaW5nIHF1YWxpdHk_In0KICAgIFEzIC0tPnxCYXNpY3wgTWlkWyJRd2VuMi41LTEuNUIiXQogICAgUTMgLS0-fFN0cm9uZ3wgV2lubmVyWyJUb3AgY2FuZGlkYXRlcyJd" alt="diagram" width="693" height="930"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The top 5 candidates from that process (Gemma3-4B made the shortlist despite sitting over the ~2.5GB budget):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Key Strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3:4b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.5GB&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Rivals 72B performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Phi-3-mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.2GB&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Strong reasoning (Microsoft)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Llama 3.2-3B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.0GB&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Instruction following&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen2.5-3B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.9GB&lt;/td&gt;
&lt;td&gt;32K&lt;/td&gt;
&lt;td&gt;Good coding/math&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemma3-4B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.3GB&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Multimodal (images)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The research was clear: &lt;strong&gt;Qwen3:4b&lt;/strong&gt; was the winner. 2.5GB, 256K context window, benchmarks rivaling 72B models. A no-brainer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reality Disagreed
&lt;/h2&gt;

&lt;p&gt;I deployed Qwen3:4b to the ByteNUC. It fit in memory. It loaded fine. Then I asked it a question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Painfully slow.&lt;/strong&gt; On a CPU-only Intel NUC with 6GB RAM, Qwen3:4b took 15-20+ seconds for simple responses. For a household chat assistant, that's unusable. You ask "how much did I spend on groceries?" and wait long enough to check the bank app yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Hallucinations in Chinese.&lt;/strong&gt; Qwen3 is a multilingual model supporting 100+ languages. On constrained hardware with limited context, the model would occasionally bleed into Chinese mid-response. Great for a multilingual product. Confusing for a household assistant that only needs English.&lt;/p&gt;

&lt;p&gt;The benchmarks didn't lie — Qwen3:4b is a remarkable model. But benchmarks run on A100 GPUs with 80GB VRAM, not on a $200 mini PC with shared system RAM and no GPU.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Downgrade
&lt;/h2&gt;

&lt;p&gt;I switched to &lt;strong&gt;phi3:mini&lt;/strong&gt; (Microsoft's Phi-3 Mini, 3.8B parameters, 2.2GB):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Qwen3:4b&lt;/th&gt;
&lt;th&gt;phi3:mini&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.5GB&lt;/td&gt;
&lt;td&gt;2.2GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed on ByteNUC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15-20s+&lt;/td&gt;
&lt;td&gt;3-7s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language stability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Occasional Chinese bleed&lt;/td&gt;
&lt;td&gt;Stable English&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benchmark ranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Actually usable?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;phi3:mini is less capable on paper. Smaller context window. Lower benchmark scores. But it responds in seconds, stays in English, and gives coherent answers about household data. That's what matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: benchmarks ≠ real-world performance on constrained hardware.&lt;/strong&gt; Test on your actual target device, not on specs.&lt;/p&gt;
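&lt;p&gt;The test that would have caught this earlier is embarrassingly simple: time real prompts on the real box. A sketch (the &lt;code&gt;ask&lt;/code&gt; callable is a stand-in for whatever invokes the local model end-to-end):&lt;/p&gt;

```python
# Minimal sketch of testing on the target device instead of trusting
# benchmark tables. `ask` stands in for whatever calls the local model
# end-to-end; run this on the actual box with real prompts.

import time

def median_latency(ask, prompts):
    """Median wall-clock seconds per prompt."""
    samples = []
    for p in prompts:
        t0 = time.perf_counter()
        ask(p)
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2]

# Works against any callable; a stub stands in for the model here:
assert median_latency(len, ["how much", "did I", "spend"]) >= 0.0
```

&lt;p&gt;Five minutes of this on the ByteNUC would have disqualified Qwen3:4b before any benchmark comparison mattered.&lt;/p&gt;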

&lt;h3&gt;
  
  
  A Note on Choosing Models for Constrained Hardware
&lt;/h3&gt;

&lt;p&gt;A few things I wish I'd prioritized earlier in the evaluation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quantized models matter.&lt;/strong&gt; On hardware like this, quantized variants (Q4_0, Q4_K_M) are far more practical than full-precision weights. They reduce memory footprint and improve inference speed with minimal quality loss. All the Ollama models above are already quantized — that's how a 4B parameter model fits in 2.5GB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Match the model to the task, not the leaderboard.&lt;/strong&gt; A multimodal model that understands images is wasted when your use case is explaining bank transactions. A model with a 256K context window is overkill when your prompts are 200 tokens. Pick for what you actually need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU inference is slow — plan for it.&lt;/strong&gt; Even with phi3:mini, a full request can take on the order of &lt;strong&gt;30 seconds&lt;/strong&gt; end-to-end on the ByteNUC: the 3-7 second figure above is model inference alone, and intent classification, the Neo4j query, and response generation add the rest. That's livable for a household assistant you check a few times a day, but it rules out anything conversational or real-time. If you need snappy responses, you need a GPU or a smaller model.&lt;/li&gt;
&lt;/ul&gt;
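&lt;p&gt;The arithmetic behind the first bullet is worth seeing once. Weights dominate the footprint, so a rough estimate is parameters times bits per weight:&lt;/p&gt;

```python
# Back-of-envelope sketch: weights dominate an LLM's footprint, so
# size is roughly parameters times bits per weight. Real footprints
# add KV cache, quantization scales, and runtime overhead.

def weight_gb(params, bits_per_weight):
    return params * bits_per_weight / 8 / 1e9

q4   = weight_gb(3.8e9, 4)    # phi3:mini at 4-bit: ~1.9 GB
fp16 = weight_gb(3.8e9, 16)   # same weights at half precision: ~7.6 GB

assert round(q4, 1) == 1.9
assert round(fp16, 1) == 7.6
```

&lt;p&gt;The gap between that 1.9GB estimate and phi3:mini's 2.2GB footprint is roughly the overhead the comment describes; the fp16 figure shows why unquantized weights were never an option on this box.&lt;/p&gt;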

&lt;h2&gt;
  
  
  Why the LLM Barely Matters
&lt;/h2&gt;

&lt;p&gt;Here's the counterintuitive insight that makes a smaller model viable: &lt;strong&gt;the LLM is just the natural language interface&lt;/strong&gt;. Neo4j does the heavy lifting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBVc2VyWyInSG93IG11Y2ggZGlkIEk8YnIvPnNwZW5kIG9uIGdyb2Nlcmllcz8nIl0gLS0-IENsYXNzaWZ5WyJLZXl3b3JkPGJyLz5jbGFzc2lmaWVyIl0KICAgIENsYXNzaWZ5IC0tPnxmaW5hbmNlfCBUb29sWyJGaW5hbmNlIHRvb2xzPGJyLz4oZGV0ZXJtaW5pc3RpYykiXQogICAgVG9vbCAtLT58Q3lwaGVyIHF1ZXJ5fCBEQlsoIk5lbzRqIildCiAgICBEQiAtLT58c3RydWN0dXJlZCBkYXRhfCBMTE1bInBoaTM6bWluaTxici8-KGV4cGxhaW4gcmVzdWx0cykiXQogICAgTExNIC0tPiBVc2VyMlsiJ1lvdSBzcGVudCAkODQ3IG9uPGJyLz5ncm9jZXJpZXMgdGhpcyBtb250aCciXQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBVc2VyWyInSG93IG11Y2ggZGlkIEk8YnIvPnNwZW5kIG9uIGdyb2Nlcmllcz8nIl0gLS0-IENsYXNzaWZ5WyJLZXl3b3JkPGJyLz5jbGFzc2lmaWVyIl0KICAgIENsYXNzaWZ5IC0tPnxmaW5hbmNlfCBUb29sWyJGaW5hbmNlIHRvb2xzPGJyLz4oZGV0ZXJtaW5pc3RpYykiXQogICAgVG9vbCAtLT58Q3lwaGVyIHF1ZXJ5fCBEQlsoIk5lbzRqIildCiAgICBEQiAtLT58c3RydWN0dXJlZCBkYXRhfCBMTE1bInBoaTM6bWluaTxici8-KGV4cGxhaW4gcmVzdWx0cykiXQogICAgTExNIC0tPiBVc2VyMlsiJ1lvdSBzcGVudCAkODQ3IG9uPGJyLz5ncm9jZXJpZXMgdGhpcyBtb250aCciXQ%3D%3D" alt="diagram" width="1448" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture follows a &lt;strong&gt;"tools first, LLM explains"&lt;/strong&gt; pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intent classification&lt;/strong&gt; is deterministic keyword matching — no LLM needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data retrieval&lt;/strong&gt; uses specialized tools that run Cypher queries against Neo4j&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The LLM only translates&lt;/strong&gt;: it turns pre-fetched structured results into a natural-language response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The LLM never calculates, never queries, never decides what data to fetch. It receives pre-fetched results and writes a human-readable response. For that job, phi3:mini is more than sufficient.&lt;/p&gt;

&lt;p&gt;From the actual code (&lt;code&gt;aapla-hushar/src/agents/graph.py&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Deterministic routing — no LLM involved
&lt;/span&gt;&lt;span class="n"&gt;finance_keywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expense&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;finance_keywords&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Tools fetch data first, then LLM explains
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_spending_summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Cypher query to Neo4j
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a financial assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ONLY interpret this data:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Ollama sidecar runs alongside FastAPI in the same K8s pod — localhost communication, zero network latency between the Python service and the LLM.&lt;/p&gt;
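&lt;p&gt;As a sketch of what that localhost hop looks like from the Python side (the endpoint shape is Ollama's standard &lt;code&gt;/api/generate&lt;/code&gt;; the helper names and model default are my assumptions, not the project's code):&lt;/p&gt;

```python
# Minimal sketch: a Python service talking to an Ollama sidecar in the
# same pod. The call never leaves localhost. Helper names are hypothetical.
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # sidecar, same pod

def build_payload(prompt: str, model: str = "phi3:mini") -> bytes:
    """JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(prompt: str) -> str:
    """Send the prompt to the sidecar and return the generated text."""
    req = request.Request(OLLAMA_URL, data=build_payload(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```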

&lt;h2&gt;
  
  
  Future: Better LLM Requires Better Hardware
&lt;/h2&gt;

&lt;p&gt;The current setup works for structured queries against Neo4j. But for actual data &lt;em&gt;analysis&lt;/em&gt; — "what trends do you see in my spending?" or "suggest ways to save money" — the model needs more reasoning capability and a larger context window.&lt;/p&gt;

&lt;p&gt;That means upgrading the ByteNUC's RAM first: a 7B or 8B model needs 8-16GB of total system memory. Until then, phi3:mini handles the structured query-and-explain pattern well.&lt;/p&gt;
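&lt;p&gt;The sizing is rough arithmetic: quantized weights cost roughly half a gigabyte per billion parameters at 4-bit, plus runtime overhead. A back-of-envelope sketch (the overhead factor is an assumption, not a measurement):&lt;/p&gt;

```python
# Back-of-envelope RAM estimate for a quantized local model.
# weights_gb ~= params (billions) * bits_per_weight / 8; the 1.2x factor
# for KV cache and runtime overhead is an assumption, not a measurement.
def estimated_ram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb * overhead, 1)

print(estimated_ram_gb(3.8))  # phi3:mini-class model at 4-bit: ~2.3
print(estimated_ram_gb(8.0))  # an 8B model at 4-bit: ~4.8
```

&lt;p&gt;At 4-bit quantization an 8B model's working set approaches 5GB, leaving almost nothing on a 6GB box once the OS, K8s, and Neo4j take their share.&lt;/p&gt;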

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The model runs. The queries work. But is "runs locally" enough to keep your data safe?&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Part 3&lt;/strong&gt;, I look at what happened to OpenClaw — another local-first AI assistant — and why the real security lesson isn't about where your AI runs, but what it's allowed to do.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 2 of a 3-part series. Use the series navigation above to read Part 1 (Architecture &amp;amp; Neo4j) and Part 3 (What OpenClaw Teaches Us).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>llm</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>gharasathi (घरासाठी) — A Privacy-First Household AI Running on a $200 Mini PC</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Sun, 08 Feb 2026 03:58:11 +0000</pubDate>
      <link>https://forem.com/koustubh/gharasathi-ghraasaatthii-a-privacy-first-household-ai-running-on-a-200-mini-pc-4hnm</link>
      <guid>https://forem.com/koustubh/gharasathi-ghraasaatthii-a-privacy-first-household-ai-running-on-a-200-mini-pc-4hnm</guid>
      <description>&lt;p&gt;gharasathi ("for home" in Marathi) is a privacy-first household AI that connects finances, photos, and memories — running entirely on a $200 mini PC in my garage. No cloud. No subscriptions. No data leaving the house.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Household data is scattered everywhere. Bank transactions in three different apps. Photos split across iCloud and Google Photos. Bills in email inboxes or individual company portals. Memories in your head.&lt;/p&gt;

&lt;p&gt;Every "smart" assistant that promises to unify this — Alexa, Google Home, ChatGPT — requires shipping your most intimate data to someone else's servers. Your spending patterns. Your family photos. Your location history. All flowing through infrastructure you don't control, governed by privacy policies that change quarterly.&lt;/p&gt;

&lt;p&gt;I wanted something different: a private AI that ties all household data together and runs entirely on my home network. Something I could ask "How much did we spend during the Christmas trip?" and get an answer by traversing actual data, not hallucinating one. Something where the answer to "where is my data?" is always "in the living room."&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The system is a set of microservices on Kubernetes. All services are named in Marathi — a language spoken in western India.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBDbGllbnRzCiAgICAgICAgaU9TWyJnaGFyYXNhdGhpLWlvczxici8-KFN3aWZ0KSJdCiAgICAgICAgV2ViWyJnaGFyYXNhdGhpLXdlYjxici8-KFJlYWN0KSJdCiAgICBlbmQKICAgIHN1YmdyYXBoIEs4c1siVGFsb3MgSzhzIMK3IEJ5dGVOVUMiXQogICAgICAgIExMTVsiYWFwbGEtaHVzaGFyPGJyLz5QeXRob24gwrcgTGFuZ0dyYXBoIl0KICAgICAgICBGaW5hbmNlWyJhYXBsYS1kaGFuPGJyLz5HbyJdCiAgICAgICAgTWVtb3JpZXNbImFhcGx5YS1hdGh2YW5pPGJyLz5HbyJdCiAgICAgICAgREJbKCJOZW80aiIpXQogICAgICAgIE9CSlsoIk1pbklPIildCiAgICBlbmQKICAgIGlPUyAmIFdlYiA8LS0-IExMTQogICAgTExNIC0tPiBEQgogICAgRmluYW5jZSAtLT4gREIKICAgIE1lbW9yaWVzIC0tPiBEQgogICAgTWVtb3JpZXMgLS0-IE9CSgogICAgRmluYW5jZSAtLi0-fENEUiBBUEl8IEJhbmtbIkJhbmsgQVBJcyJd" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBDbGllbnRzCiAgICAgICAgaU9TWyJnaGFyYXNhdGhpLWlvczxici8-KFN3aWZ0KSJdCiAgICAgICAgV2ViWyJnaGFyYXNhdGhpLXdlYjxici8-KFJlYWN0KSJdCiAgICBlbmQKICAgIHN1YmdyYXBoIEs4c1siVGFsb3MgSzhzIMK3IEJ5dGVOVUMiXQogICAgICAgIExMTVsiYWFwbGEtaHVzaGFyPGJyLz5QeXRob24gwrcgTGFuZ0dyYXBoIl0KICAgICAgICBGaW5hbmNlWyJhYXBsYS1kaGFuPGJyLz5HbyJdCiAgICAgICAgTWVtb3JpZXNbImFhcGx5YS1hdGh2YW5pPGJyLz5HbyJdCiAgICAgICAgREJbKCJOZW80aiIpXQogICAgICAgIE9CSlsoIk1pbklPIildCiAgICBlbmQKICAgIGlPUyAmIFdlYiA8LS0-IExMTQogICAgTExNIC0tPiBEQgogICAgRmluYW5jZSAtLT4gREIKICAgIE1lbW9yaWVzIC0tPiBEQgogICAgTWVtb3JpZXMgLS0-IE9CSgogICAgRmluYW5jZSAtLi0-fENEUiBBUEl8IEJhbmtbIkJhbmsgQVBJcyJd" alt="diagram" width="786" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;aapla-dhan&lt;/strong&gt; (आपलं धन)&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Finance — syncs bank transactions, loans, investments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;aaplya-athvani&lt;/strong&gt; (आपल्या आठवणी)&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Memories — photo sync, tagging, storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;aapla-hushar&lt;/strong&gt; (आपला हुशार)&lt;/td&gt;
&lt;td&gt;Python + LangGraph&lt;/td&gt;
&lt;td&gt;AI chat — intent routing, agents, Ollama sidecar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;aapla-mahitisatha&lt;/strong&gt; (आपला माहिती साठा)&lt;/td&gt;
&lt;td&gt;Neo4j&lt;/td&gt;
&lt;td&gt;Graph database for all structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gharasathi-ios&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Swift/SwiftUI&lt;/td&gt;
&lt;td&gt;Native iOS app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gharasathi-web&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;React/TypeScript&lt;/td&gt;
&lt;td&gt;Browser interface&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everything runs on a single &lt;a href="https://www.amazon.com.au/Windows-Computer-Supporting-Ethernet-BYTENUC/dp/B09GK2QSY1" rel="noopener noreferrer"&gt;ByteNUC mini PC&lt;/a&gt; — 6GB RAM, 2TB disk, &lt;strong&gt;no GPU&lt;/strong&gt; — running Talos Linux. The entire stack — OS, K8s, database, LLM, and all services — fits in that 6GB with CPU-only inference.&lt;/p&gt;

&lt;p&gt;A core design principle: services &lt;em&gt;write&lt;/em&gt; data to Neo4j, and the LLM &lt;em&gt;reads&lt;/em&gt; from it. The AI layer never modifies your data — it only queries and explains. This separation means the LLM can be swapped, restarted, or upgraded without any risk to your actual records.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Neo4j?
&lt;/h2&gt;

&lt;p&gt;Household data is inherently a graph. People own accounts. Accounts generate transactions. Transactions happen at places. Photos feature people at events. Events are held at places. Places contain other places.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIFBbUGVyc29uXSAtLT58T1dOU3wgQVtBY2NvdW50XQogICAgQSAtLT58U09VUkNFX09GfCBUW1RyYW5zYWN0aW9uXQogICAgVCAtLT58T0NDVVJSRURfQVR8IFBMW1BsYWNlXQogICAgVCAtLT58UkVMQVRFRF9UT3wgRVtFdmVudF0KICAgIFBIW1Bob3RvXSAtLT58RkVBVFVSRVN8IFAKICAgIFBIIC0tPnxUQUtFTl9BVHwgUEwKICAgIFBIIC0tPnxQQVJUX09GfCBFCiAgICBQIC0tPnxBVFRFTkRFRHwgRQogICAgRSAtLT58SEVMRF9BVHwgUEwKICAgIFBMIC0tPnxMT0NBVEVEX0lOfCBQTDJbUmVnaW9uXQ%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZ3JhcGggTFIKICAgIFBbUGVyc29uXSAtLT58T1dOU3wgQVtBY2NvdW50XQogICAgQSAtLT58U09VUkNFX09GfCBUW1RyYW5zYWN0aW9uXQogICAgVCAtLT58T0NDVVJSRURfQVR8IFBMW1BsYWNlXQogICAgVCAtLT58UkVMQVRFRF9UT3wgRVtFdmVudF0KICAgIFBIW1Bob3RvXSAtLT58RkVBVFVSRVN8IFAKICAgIFBIIC0tPnxUQUtFTl9BVHwgUEwKICAgIFBIIC0tPnxQQVJUX09GfCBFCiAgICBQIC0tPnxBVFRFTkRFRHwgRQogICAgRSAtLT58SEVMRF9BVHwgUEwKICAgIFBMIC0tPnxMT0NBVEVEX0lOfCBQTDJbUmVnaW9uXQ%3D%3D" alt="diagram" width="1628" height="218"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The schema has &lt;strong&gt;6 node types&lt;/strong&gt; and &lt;strong&gt;17 relationship types&lt;/strong&gt; — from &lt;code&gt;OWNS&lt;/code&gt; and &lt;code&gt;SOURCE_OF&lt;/code&gt; to &lt;code&gt;FEATURES&lt;/code&gt;, &lt;code&gt;SIMILAR_TO&lt;/code&gt;, and &lt;code&gt;LOCATED_IN&lt;/code&gt;. This density of connections is exactly what graph databases excel at.&lt;/p&gt;

&lt;p&gt;Consider what "our Sydney trip" means in data terms. It's an Event node connected to: Transaction nodes (what we spent), Photo nodes (what we captured), Person nodes (who went), and Place nodes (where we went) — which themselves link upward via &lt;code&gt;LOCATED_IN&lt;/code&gt; to "Sydney" to "NSW" to "Australia." In a relational database, that's a normalized nightmare. In a graph, it's just… the shape of the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Query That Sells It
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"How much did we spend during the Christmas trip?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a relational database, this requires JOINing across 4+ tables: events, transactions, places, and a trip-transaction mapping table. In Neo4j, it's a single traversal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;e:&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e.title&lt;/span&gt; &lt;span class="ow"&gt;CONTAINS&lt;/span&gt; &lt;span class="s1"&gt;'Christmas'&lt;/span&gt; &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e.startDate.year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2025&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PART_OF&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;photo:&lt;/span&gt;&lt;span class="n"&gt;Photo&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:RELATED_TO&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;t:&lt;/span&gt;&lt;span class="n"&gt;Transaction&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:ATTENDED&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;person:&lt;/span&gt;&lt;span class="n"&gt;Person&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;e.title&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;person.name&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;attendees&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;photo&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;photos&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t.amount&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;totalSpent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One query. It starts at the Christmas event node, walks outward along relationships, and gathers everything connected: who attended, how many photos were taken, and the total cost. No JOINs. No subqueries. The relationships &lt;em&gt;are&lt;/em&gt; the schema.&lt;/p&gt;

&lt;p&gt;This pattern repeats across every use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"What are my top spending categories?"&lt;/strong&gt; — aggregate Transaction nodes by category&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Show photos from the reef trip"&lt;/strong&gt; — traverse &lt;code&gt;Photo → PART_OF → Event&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Find photos similar to this one"&lt;/strong&gt; — vector similarity search on embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"What did we do last year?"&lt;/strong&gt; — walk Events by date, gather connected Transactions and Photos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Show spending by location"&lt;/strong&gt; — traverse &lt;code&gt;Transaction → OCCURRED_AT → Place&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The query file has 25+ patterns covering financial analysis, photo search, people lookup, event timelines, and cross-domain insights. Every one of them follows the same shape: start at a node, walk relationships, aggregate what you find.&lt;/p&gt;
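&lt;p&gt;That common shape is easy to see even outside Cypher. A toy in-memory version of "start at a node, walk relationships, aggregate" (made-up data, purely illustrative):&lt;/p&gt;

```python
# Toy adjacency list standing in for the graph: each neighbour of an
# event is (relationship_type, (node_label, node_value)). Data is made up.
edges = {
    ("Event", "Christmas Trip"): [
        ("RELATED_TO", ("Transaction", 120.0)),
        ("RELATED_TO", ("Transaction", 80.5)),
        ("ATTENDED", ("Person", "Asha")),
        ("ATTENDED", ("Person", "Ravi")),
    ],
}

def summarize(event: str) -> dict:
    """Start at the event node, walk its relationships, aggregate."""
    neighbours = edges[("Event", event)]
    spent = sum(n[1] for _, n in neighbours if n[0] == "Transaction")
    people = [n[1] for _, n in neighbours if n[0] == "Person"]
    return {"totalSpent": spent, "attendees": people}

print(summarize("Christmas Trip"))
```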

&lt;h3&gt;
  
  
  Vector Search Built In
&lt;/h3&gt;

&lt;p&gt;Neo4j 5.11+ supports native vector indexes. Every Photo, Transaction, and Event node carries a 1536-dimension embedding vector. This enables semantic search without a separate vector database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;db.index.vector.queryNodes&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'photo_embedding'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p.embedding&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;node.filename&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node.aiDescription&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Find sunset photos" doesn't need exact keyword matching — it searches by meaning. No separate Pinecone or Weaviate instance needed — one database handles both structured queries and semantic search.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dual Storage: Graph + Object
&lt;/h3&gt;

&lt;p&gt;One thing Neo4j &lt;em&gt;shouldn't&lt;/em&gt; store is binary files. Photos and videos go to MinIO — a self-hosted, S3-compatible object store. Neo4j holds the metadata (who's in the photo, where it was taken, AI-generated description, embedding vector) while MinIO holds the actual JPEG. The Photo node's &lt;code&gt;storagePath&lt;/code&gt; property links the two.&lt;/p&gt;

&lt;p&gt;This keeps Neo4j lean — critical when you're running it in 1GB of RAM — while MinIO happily stores terabytes of photos on the 2TB disk.&lt;/p&gt;
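&lt;p&gt;A minimal sketch of that split: a content-addressed object key plus a metadata record whose &lt;code&gt;storagePath&lt;/code&gt; points at it. The bucket layout and key scheme here are illustrative, not the project's real ones:&lt;/p&gt;

```python
# Sketch of the graph + object split: the metadata record is what the
# graph database would hold; storagePath points into the object store.
# Key layout is hypothetical, not the project's actual scheme.
import hashlib

def photo_record(filename: str, data: bytes, people: list) -> dict:
    # Content-addressed key keeps the object store deduplicated.
    key = hashlib.sha256(data).hexdigest()[:16]
    return {
        "filename": filename,
        "features": people,                  # graph relationships
        "storagePath": f"photos/{key}.jpg",  # pointer into the object store
        "sizeBytes": len(data),              # the binary itself never enters the graph
    }

rec = photo_record("sunset.jpg", b"...jpeg bytes...", ["Asha"])
print(rec["storagePath"])
```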

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The architecture makes sense on paper. But how do you run Neo4j + an LLM + 4 microservices on a machine with &lt;strong&gt;only 6GB RAM&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt;, I cover the model selection journey — where research recommended one model, reality disagreed, and I had to learn the hard way that benchmarks don't mean much on constrained hardware.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 1 of a 3-part series on building a privacy-first household AI. Use the series navigation above to read Part 2 (LLM Model Selection) and Part 3 (Privacy &amp;amp; Lessons from OpenClaw).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>neo4j</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Building Apps with AI: Deep Dive into beads Workflow</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 20 Jan 2026 22:25:56 +0000</pubDate>
      <link>https://forem.com/koustubh/building-apps-with-ai-deep-dive-into-beads-workflow-28h1</link>
      <guid>https://forem.com/koustubh/building-apps-with-ai-deep-dive-into-beads-workflow-28h1</guid>
      <description>&lt;h1&gt;
  
  
  Building Apps with AI: Deep Dive into beads Workflow
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Part 2 of 2: JSONL Memory, Real Examples, and Honest Drawbacks&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Recap
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="//./part1-introduction-to-beads.md"&gt;Part 1&lt;/a&gt;, I introduced &lt;code&gt;beads&lt;/code&gt; — a git-native issue tracker designed for AI-assisted development. We looked at the Mission House app and the basic workflow. Now let’s go a bit deeper.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scope and Assumptions
&lt;/h2&gt;

&lt;p&gt;This post reflects a &lt;strong&gt;solo, AI-assisted development workflow&lt;/strong&gt; on a small but non-trivial codebase (dozens of tasks, explicit dependencies, multiple external APIs).&lt;/p&gt;

&lt;p&gt;Assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The AI agent has read access to the full issue graph&lt;/li&gt;
&lt;li&gt;Execution efficiency matters more than prolonged design deliberation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For ambiguous product discovery, multi-team coordination, or regulated environments, spec-driven approaches may be a better &lt;em&gt;first&lt;/em&gt; step.&lt;/p&gt;




&lt;h2&gt;
  
  
  The JSONL Advantage: Compact and Queryable
&lt;/h2&gt;

&lt;p&gt;Every beads issue is stored as a single line of JSON in &lt;code&gt;.beads/issues.jsonl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-ogp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implement myschool.edu.au scraper"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Create Puppeteer-based scraper..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"closed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"close_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NAPLAN scraper implemented in server.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"depends_on_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-5mv"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare this to a typical markdown task file that might span dozens of lines with headers, descriptions, and nested checklists for the same information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Compact Matters
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtlitj4buzjvu4u10ebq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtlitj4buzjvu4u10ebq.png" alt="Compact vs verbose comparison" width="328" height="1356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AI gets structured data it can query, not prose it must interpret:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bd ready&lt;/code&gt;&lt;/strong&gt; - What's unblocked and highest priority?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bd blocked&lt;/code&gt;&lt;/strong&gt; - What's waiting on other work?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bd show &amp;lt;id&amp;gt;&lt;/code&gt;&lt;/strong&gt; - Full details on one issue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bd stats&lt;/code&gt;&lt;/strong&gt; - Project health at a glance&lt;/li&gt;
&lt;/ul&gt;
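&lt;p&gt;To make "structured data it can query" concrete, here is an illustrative sketch of what a &lt;code&gt;bd ready&lt;/code&gt;-style query could compute from the JSONL file — open issues whose dependencies are all closed. This is not beads' actual implementation:&lt;/p&gt;

```python
# Illustrative only: compute "ready" issues (open, all blockers closed)
# from a JSONL issue file, one JSON object per line.
import json

def ready_issues(jsonl_lines):
    issues = [json.loads(line) for line in jsonl_lines if line.strip()]
    status = {i["id"]: i["status"] for i in issues}
    for i in issues:
        deps = [d["depends_on_id"] for d in i.get("dependencies", [])]
        if i["status"] == "open" and all(status.get(d) == "closed" for d in deps):
            yield i["id"]

lines = [
    '{"id": "a", "status": "closed"}',
    '{"id": "b", "status": "open", "dependencies": [{"depends_on_id": "a"}]}',
    '{"id": "c", "status": "open", "dependencies": [{"depends_on_id": "b"}]}',
]
print(list(ready_issues(lines)))  # b is unblocked; c still waits on b
```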

&lt;h3&gt;
  
  
  Close Reasons: Implementation Memory
&lt;/h3&gt;

&lt;p&gt;When you close an issue, you document what was actually built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bd close mission-house-ogp &lt;span class="nt"&gt;--reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"NAPLAN scraper implemented in server.js, handles terms acceptance and score extraction"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just status — it’s ground truth.&lt;/p&gt;

&lt;p&gt;Specs capture intent.&lt;br&gt;
Close reasons capture reality.&lt;/p&gt;
&lt;h3&gt;
  
  
  Real Example: Session Continuity
&lt;/h3&gt;

&lt;p&gt;Here's what happened when I resumed work on NAPLAN scoring after a break:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session 1 (ended with):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bd close mission-house-ogp &lt;span class="nt"&gt;--reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"NAPLAN scores integration complete: scraper implemented in server.js"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Session 2 (started with):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; bd ready

mission-house-6t1 &lt;span class="o"&gt;[&lt;/span&gt;P2] &lt;span class="o"&gt;[&lt;/span&gt;task] open - Display NAPLAN scores &lt;span class="k"&gt;in &lt;/span&gt;UI
  └─ Blocked by: mission-house-ogp &lt;span class="o"&gt;(&lt;/span&gt;closed&lt;span class="o"&gt;)&lt;/span&gt;, mission-house-0ch &lt;span class="o"&gt;(&lt;/span&gt;closed&lt;span class="o"&gt;)&lt;/span&gt;
  └─ All blockers resolved - ready to work!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude immediately knew:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The scraper was done (from ogp's close reason)&lt;/li&gt;
&lt;li&gt;The schema was updated (from 0ch's close reason)&lt;/li&gt;
&lt;li&gt;The next logical step was UI display&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;No manual context re-establishment was needed&lt;/strong&gt;, because dependencies and implementation details were already encoded.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hierarchy: Epics → Features → Tasks
&lt;/h2&gt;

&lt;p&gt;We organized Mission House using a three-level hierarchy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx8y1flixp9hmrho9yyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx8y1flixp9hmrho9yyt.png" alt="Epics Features Tasks hierarchy" width="784" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Structure Works
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Typical Count&lt;/th&gt;
&lt;th&gt;Lifetime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Epic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strategic goal, multiple sessions&lt;/td&gt;
&lt;td&gt;2-5 per project&lt;/td&gt;
&lt;td&gt;Weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User-facing capability&lt;/td&gt;
&lt;td&gt;5-15 per epic&lt;/td&gt;
&lt;td&gt;Days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single implementation unit&lt;/td&gt;
&lt;td&gt;3-10 per feature&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The AI works at the &lt;strong&gt;task level&lt;/strong&gt; but understands the &lt;strong&gt;feature and epic context&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Issues from Mission House
&lt;/h2&gt;

&lt;p&gt;Let me show you actual issues from our project to illustrate different patterns:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Task with Clear Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-73p"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Calculate Flinders Street Station travel time"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Calculate travel time to Flinders Street Station during peak hours on a working day from: (a) the nearest train station, (b) the property address directly."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"closed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"issue_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"close_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implemented MapsService.getTravelToFlinders() with peak hour scheduling. Calculates transit, driving, walking routes from property and via nearest station"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"depends_on_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-utk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"blocks"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What the AI learned from this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can't calculate commute until "Find nearest train station" (utk) is done&lt;/li&gt;
&lt;li&gt;Implementation went into &lt;code&gt;MapsService.getTravelToFlinders()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Peak hour scheduling was added&lt;/li&gt;
&lt;li&gt;Multiple route types were implemented&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pattern 2: Bug with Acceptance Criteria
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-v4e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fix naplan score web scraping logic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Naplan score web scraping not working as expected. Check the requirements document"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"acceptance_criteria"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Naplan score written in json file as in the requirements document"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"closed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"issue_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bug"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"close_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implemented naplan_quality metric: added benchmark constants, quality calculation function, and UI display in both hub-spoke view and compare page radar chart"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Priority 0 (P0)&lt;/strong&gt; is the highest priority level. The AI knew to work on this first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Tombstone (Deleted Issue)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-ck6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implement spider/radar chart visualization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tombstone"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deleted_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-17T23:15:21.909135+11:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deleted_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"batch delete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"delete_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"batch delete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"original_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tombstones preserve history while removing clutter. The AI knows this was deleted and won't try to work on it.&lt;/p&gt;
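&lt;p&gt;As a minimal sketch of how a consumer might honor tombstones when reading &lt;code&gt;issues.jsonl&lt;/code&gt; directly (the helper below is illustrative, not bd's own code):&lt;/p&gt;

```python
import json

# Illustrative helper (not part of bd): parse issues.jsonl lines and skip
# tombstones, so deleted issues stay in history but out of the working view.
def active_issues(jsonl_lines):
    issues = [json.loads(line) for line in jsonl_lines if line.strip()]
    return [i for i in issues if i.get("status") != "tombstone"]

lines = [
    '{"id": "mission-house-ogp", "status": "closed"}',
    '{"id": "mission-house-ck6", "status": "tombstone"}',
]
print([i["id"] for i in active_issues(lines)])  # prints ['mission-house-ogp']
```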




&lt;h2&gt;
  
  
  beads vs. agent-os (an SDD Framework)
&lt;/h2&gt;

&lt;p&gt;Spec-Driven Development (SDD) is a methodology - different tools implement it differently. Let's compare beads to &lt;a href="https://buildermethods.com/agent-os/workflow" rel="noopener noreferrer"&gt;agent-os&lt;/a&gt;, one popular SDD framework.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: This comparison is specific to agent-os. Other SDD implementations may work differently.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Different Philosophies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;agent-os&lt;/strong&gt; follows a six-phase workflow:&lt;/p&gt;

&lt;p&gt;Plan Product → Shape Spec → Write Spec → Create Tasks → Implement → Orchestrate&lt;/p&gt;

&lt;p&gt;It uses layered context (Standards/Product/Specs) in markdown files. Tasks are &lt;em&gt;derived from specs&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;beads&lt;/strong&gt; is task-first:&lt;/p&gt;

&lt;p&gt;Create issues → Add dependencies → Run &lt;code&gt;bd ready&lt;/code&gt; → Implement&lt;/p&gt;

&lt;p&gt;No ongoing spec phase was required. A lightweight requirements document seeded the task graph, after which dependencies were tracked as explicit graph edges rather than implied through prose.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0ry92oa6txf79e6hbxm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0ry92oa6txf79e6hbxm.png" alt="agent-os vs beads comparison" width="784" height="53"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison Table (agent-os vs beads)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;agent-os&lt;/th&gt;
&lt;th&gt;beads&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Philosophy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spec-first&lt;/td&gt;
&lt;td&gt;Task-first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ MD files in git&lt;/td&gt;
&lt;td&gt;✅ JSONL in git&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context layers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standards/Product/Specs&lt;/td&gt;
&lt;td&gt;Flat issue list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task creation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Derived from specs&lt;/td&gt;
&lt;td&gt;Created directly (or through a file input)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implicit in spec narrative&lt;/td&gt;
&lt;td&gt;Explicit graph edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;"What's next?"&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Derived from spec phase&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;bd ready&lt;/code&gt; computes it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upfront design&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Required (spec phases)&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex features needing design&lt;/td&gt;
&lt;td&gt;Iterative, fast-moving work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use Which
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use agent-os (or similar SDD frameworks) when:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Complex features&lt;/strong&gt; - You need to think through architecture before coding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team alignment&lt;/strong&gt; - Specs help communicate intent to other humans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stakeholder buy-in&lt;/strong&gt; - Non-technical people need to review plans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulated industries&lt;/strong&gt; - Formal specs may be required&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Use beads when:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fast iteration&lt;/strong&gt; - You want to jump straight to tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear requirements&lt;/strong&gt; - You already know what to build&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency-heavy work&lt;/strong&gt; - Many tasks blocking each other&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solo or AI-assisted&lt;/strong&gt; - Less need for human-readable specs&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Can You Use Both?
&lt;/h3&gt;

&lt;p&gt;Yes. You could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use SDD's planning phases to think through architecture&lt;/li&gt;
&lt;li&gt;Export tasks to beads for execution with graph-based tracking&lt;/li&gt;
&lt;li&gt;Keep high-level context in a README, detailed execution in beads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My approach:&lt;/strong&gt; For Mission House, I skipped formal specs and went straight to beads. The requirements doc was enough context - I didn't need a full SDD workflow for a personal project.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Drawbacks: What Didn't Work
&lt;/h2&gt;

&lt;p&gt;Let me be honest about the challenges:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Learning Curve
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zwlv5p1jtk781lg6ubg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zwlv5p1jtk781lg6ubg.png" alt="Learning curve" width="784" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CLI commands take time to internalize. &lt;code&gt;bd dep add A B&lt;/code&gt; means "A depends on B" (B blocks A) - I got this backwards several times.&lt;/p&gt;
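&lt;p&gt;A tiny sketch that pins down the argument order (the helper is hypothetical; the field names follow the JSONL shape shown in this post):&lt;/p&gt;

```python
# Hypothetical helper mirroring "bd dep add A B": A is the dependent
# issue, B is the blocker, so B must close before A becomes ready.
def dep_add(issues, a, b):
    issues[a].setdefault("dependencies", []).append({"depends_on_id": b})

issues = {"A": {"status": "open"}, "B": {"status": "open"}}
dep_add(issues, "A", "B")  # reads left to right: "A depends on B"
blockers = [d["depends_on_id"] for d in issues["A"]["dependencies"]]
print(blockers)  # prints ['B'] - B blocks A, not the other way around
```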

&lt;h3&gt;
  
  
  2. Sync Conflicts
&lt;/h3&gt;

&lt;p&gt;When working across multiple branches or machines, sync conflicts can occur:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; bd list
💡 Tip: Run &lt;span class="s1"&gt;'bd sync'&lt;/span&gt; to resolve &lt;span class="nb"&gt;sync &lt;/span&gt;conflict
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is usually simple (&lt;code&gt;bd sync --from-main&lt;/code&gt;), but it's an extra step that spec documents don't have.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Over-Granularity Temptation
&lt;/h3&gt;

&lt;p&gt;It's tempting to create a task for everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Too granular - don't do this&lt;/span&gt;
bd create &lt;span class="nt"&gt;--title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Add import statement for React"&lt;/span&gt;
bd create &lt;span class="nt"&gt;--title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Create empty component file"&lt;/span&gt;
bd create &lt;span class="nt"&gt;--title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Add basic JSX structure"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Better:&lt;/strong&gt; One task for "Create React component for X with basic structure"&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Daemon Startup Delays
&lt;/h3&gt;

&lt;p&gt;Occasionally the beads daemon takes time to start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; bd list
Warning: Daemon took too long to start &lt;span class="o"&gt;(&amp;gt;&lt;/span&gt;5s&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; Running &lt;span class="k"&gt;in &lt;/span&gt;direct mode.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not a blocker, but noticeable.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. No Visual Dashboard
&lt;/h3&gt;

&lt;p&gt;This turned out to be only partly true: several community-built dashboards are listed &lt;a href="https://github.com/steveyegge/beads/blob/main/docs/COMMUNITY_TOOLS.md" rel="noopener noreferrer"&gt;here&lt;/a&gt;, and they make visualizing the graph much easier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Advanced Features We Used
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bulk Operations
&lt;/h3&gt;

&lt;p&gt;When we had duplicate issues, we cleaned up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bd delete mission-house-ck6 mission-house-d1s mission-house-hqs &lt;span class="nt"&gt;--reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"batch delete"&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This created tombstones preserving the history.&lt;/p&gt;

&lt;h3&gt;
  
  
  Priority System
&lt;/h3&gt;

&lt;p&gt;beads uses P0-P4 priorities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Our Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P0&lt;/td&gt;
&lt;td&gt;Critical&lt;/td&gt;
&lt;td&gt;Blocking bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P1&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Core features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P2&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Most tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P3&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Nice-to-haves&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P4&lt;/td&gt;
&lt;td&gt;Backlog&lt;/td&gt;
&lt;td&gt;Future ideas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Close Reasons
&lt;/h3&gt;

&lt;p&gt;Always close with a reason:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bd close mission-house-qvs &lt;span class="nt"&gt;--reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Added Google Maps API with Places and Geometry libraries. MapsService provides geocoding, directions, nearest station search, and autocomplete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This becomes searchable context for future sessions.&lt;/p&gt;
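&lt;p&gt;Because close reasons live in plain JSONL, they can be queried with a few lines of code (a hedged sketch; the search helper is illustrative, not a bd command):&lt;/p&gt;

```python
import json

# Illustrative helper: find issues whose close_reason mentions a term.
def search_close_reasons(jsonl_lines, term):
    hits = []
    for line in jsonl_lines:
        if not line.strip():
            continue
        issue = json.loads(line)
        if term.lower() in issue.get("close_reason", "").lower():
            hits.append(issue["id"])
    return hits

lines = ['{"id": "mission-house-qvs", "close_reason": "Added Google Maps API"}']
print(search_close_reasons(lines, "google maps"))  # prints ['mission-house-qvs']
```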

&lt;h3&gt;
  
  
  Integrations
&lt;/h3&gt;

&lt;p&gt;beads also supports &lt;a href="https://github.com/steveyegge/beads/discussions/430" rel="noopener noreferrer"&gt;syncing with Jira&lt;/a&gt; if your team needs to keep stakeholders updated in traditional issue trackers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Timeline Visualization
&lt;/h2&gt;

&lt;p&gt;Here's how our project actually progressed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15u5xewy9t4809ekwq1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15u5xewy9t4809ekwq1a.png" alt="Project timeline" width="784" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total active development time:&lt;/strong&gt; ~3 hours across 2 days&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;beads and SDD frameworks like agent-os represent different philosophies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SDD frameworks&lt;/strong&gt; say: "Think first, spec it out, then derive tasks"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;beads&lt;/strong&gt; says: "Create tasks directly, let the graph handle prioritization"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither is universally better. SDD frameworks shine when you need upfront design and human-readable documentation. beads shines when you want to move fast with automatic dependency resolution.&lt;/p&gt;

&lt;p&gt;What makes beads unique:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Task-first workflow&lt;/strong&gt; - Skip straight to issue creation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph-based dependencies&lt;/strong&gt; - Explicit edges, not prose to interpret&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic prioritization&lt;/strong&gt; - &lt;code&gt;bd ready&lt;/code&gt; computes what's next via graph traversal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compact JSONL&lt;/strong&gt; - High signal-to-noise ratio as projects grow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Mission House project went from idea to working app in about 3 hours of active development, spread across multiple sessions. The graph kept track of what was blocked, what was ready, and what was done - no spec documents required.&lt;/p&gt;

&lt;p&gt;Choose the approach that fits your project. Or use both.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/steveyegge/beads" rel="noopener noreferrer"&gt;beads GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://koustubh25.github.io/mission-house/" rel="noopener noreferrer"&gt;Mission House Demo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/koustubh25/mission-house" rel="noopener noreferrer"&gt;Mission House Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.ai" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Thanks for reading! If you try beads on your next AI-assisted project, I'd love to hear how it goes.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>projectmanagement</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building Apps with AI: How beads Changed My Development Workflow</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 20 Jan 2026 22:25:54 +0000</pubDate>
      <link>https://forem.com/koustubh/building-apps-with-ai-how-beads-changed-my-development-workflow-2p7</link>
      <guid>https://forem.com/koustubh/building-apps-with-ai-how-beads-changed-my-development-workflow-2p7</guid>
      <description>&lt;h1&gt;
  
  
  Building Apps with AI: How &lt;code&gt;beads&lt;/code&gt; Changed My Development Workflow
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Part 1 of 2: From Spec Documents to Living Issue Trackers&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I built a real estate comparison app using Claude Code and a tool called &lt;a href="https://github.com/steveyegge/beads" rel="noopener noreferrer"&gt;beads&lt;/a&gt; - a git-native issue tracker designed for AI-assisted development. This post explores how beads transformed my workflow from writing lengthy spec documents to having a living, breathing project tracker that my AI assistant actually understands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; &lt;a href="https://github.com/koustubh25/mission-house" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://koustubh25.github.io/mission-house/" rel="noopener noreferrer"&gt;Mission House App&lt;/a&gt; - This is a SPA with no backend, so it requires a Google Maps Platform API key to be provided on the UI to render fully. Have a look at the snapshots in the source code repository above to get an idea of the app.&lt;/p&gt;


&lt;h2&gt;
  
  
  Spec-Driven Development (SDD)
&lt;/h2&gt;

&lt;p&gt;If you've used AI coding assistants for larger projects, you've likely encountered Spec-Driven Development (SDD) - a methodology where detailed specifications drive the implementation process. I've written about &lt;a href="https://dev.to/koustubh/part-1-spec-driven-development-building-predictable-ai-assisted-software-19ne"&gt;SDD in detail previously&lt;/a&gt; - it's a powerful approach that works well for many projects. Different frameworks implement SDD differently.&lt;/p&gt;

&lt;p&gt;One popular implementation is &lt;a href="https://buildermethods.com/agent-os/workflow" rel="noopener noreferrer"&gt;agent-os&lt;/a&gt;, which formalizes SDD into a six-phase workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7c88qdgj24h8hicnvs7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7c88qdgj24h8hicnvs7z.png" alt="agent-os SDD workflow" width="784" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;agent-os uses layered context (Standards → Product → Specs) stored in markdown files like &lt;code&gt;mission.md&lt;/code&gt;, &lt;code&gt;roadmap.md&lt;/code&gt;, and &lt;code&gt;tech-stack.md&lt;/code&gt;. Tasks are &lt;em&gt;derived from specs&lt;/em&gt;, not created directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;agent-os works well, but has trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Spec-first philosophy&lt;/strong&gt; - You write specs before creating tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks derived from prose&lt;/strong&gt; - The AI interprets specs to generate tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layered context&lt;/strong&gt; - Rich documentation, but more files to maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential phases&lt;/strong&gt; - Structured workflow from planning to orchestration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8t4ogx01ded88kpu149.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8t4ogx01ded88kpu149.png" alt="SDD interpretation flow" width="784" height="54"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SDD frameworks like agent-os are great for complex features that need upfront design. But what if you want to skip straight to task management?&lt;/p&gt;

&lt;p&gt;The key difference is where &lt;em&gt;control flow&lt;/em&gt; lives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In SDD, control flow is implicit in prose.&lt;/li&gt;
&lt;li&gt;In beads, control flow is explicit in a DAG.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To be clear: SDD shines when architectural intent must be stabilized early — beads optimizes for execution once intent is roughly understood.&lt;/p&gt;


&lt;h2&gt;
  
  
  Enter &lt;code&gt;beads&lt;/code&gt;: Task-First with Graph-Based Dependencies
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/steveyegge/beads" rel="noopener noreferrer"&gt;Beads&lt;/a&gt; is what its creator Steve Yegge calls "a drop-in cognitive upgrade for your coding agents." Instead of the spec-first approach, beads is &lt;strong&gt;task-first&lt;/strong&gt; - you create issues directly, with explicit dependencies stored as graph edges.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Compact JSONL, Not Verbose Markdown
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;.beads/
├── issues.jsonl    &lt;span class="c"&gt;# All issues in ONE compact file&lt;/span&gt;
├── config.yaml     &lt;span class="c"&gt;# Project configuration&lt;/span&gt;
└── db.sqlite       &lt;span class="c"&gt;# Local cache for fast queries&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;One line per issue. No walls of prose. The AI can quickly parse the entire project state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-ogp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implement scraper"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"closed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"depends_on_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mission-house-5mv"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Automatic "What's Next?" via Graph Traversal
&lt;/h3&gt;

&lt;p&gt;This is the killer feature. Instead of the AI parsing prose to figure out the order of execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bd ready  &lt;span class="c"&gt;# Shows only unblocked, high-priority tasks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The graph database computes eligibility automatically based on explicit status, dependencies, and user-defined priority.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftx3lrexsjz4cx7nz46wv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftx3lrexsjz4cx7nz46wv.png" alt="bd ready flow" width="784" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Explicit Dependencies = Enforced Execution Order
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bd dep add mission-house-abc mission-house-xyz
&lt;span class="c"&gt;# "abc depends on xyz" (xyz blocks abc)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't prose that might be ignored - it's a graph edge. The AI literally cannot see &lt;code&gt;abc&lt;/code&gt; in &lt;code&gt;bd ready&lt;/code&gt; until &lt;code&gt;xyz&lt;/code&gt; is closed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start: Installing beads
&lt;/h2&gt;

&lt;p&gt;For macOS users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap steveyegge/beads
brew &lt;span class="nb"&gt;install &lt;/span&gt;bd

&lt;span class="c"&gt;# Initialize in your project&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
bd init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other options: &lt;code&gt;npm install -g @beads/bd&lt;/code&gt; or &lt;code&gt;go install github.com/steveyegge/beads/cmd/bd@latest&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That's it. You now have a &lt;code&gt;.beads&lt;/code&gt; directory in your repo.&lt;/p&gt;

&lt;p&gt;Also install the beads Claude Code plugin by following &lt;a href="https://github.com/steveyegge/beads/blob/main/docs/PLUGIN.md#install-plugin" rel="noopener noreferrer"&gt;these instructions&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The App: Mission House
&lt;/h2&gt;

&lt;p&gt;Before diving deeper into beads, let me briefly introduce what we built. &lt;strong&gt;Mission House&lt;/strong&gt; is a property comparison tool for Melbourne house hunters. It helps answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which catchment schools serve this address?&lt;/li&gt;
&lt;li&gt;How do their NAPLAN scores compare to each other?&lt;/li&gt;
&lt;li&gt;How long is the commute to the CBD?&lt;/li&gt;
&lt;li&gt;How do these 4 properties compare on a radar chart?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsm31ddlwsc6ght1cs9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsm31ddlwsc6ght1cs9e.png" alt="Mission House architecture" width="784" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The interesting part isn't the app itself - it's &lt;strong&gt;how we built it using beads&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Used beads: A Real Example
&lt;/h2&gt;

&lt;p&gt;Although beads is task-first, I didn’t start from a blank slate.&lt;/p&gt;

&lt;p&gt;I began with a lightweight &lt;a href="https://github.com/koustubh25/mission-house/blob/main/docs/requirements.md" rel="noopener noreferrer"&gt;requirements.md&lt;/a&gt; that described:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;core user flows&lt;/li&gt;
&lt;li&gt;data sources (e.g. NAPLAN, Google Maps)&lt;/li&gt;
&lt;li&gt;output expectations (comparison metrics, charts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I then asked Claude Code (with beads installed) to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read requirements.md&lt;/li&gt;
&lt;li&gt;Propose epics, features, and tasks&lt;/li&gt;
&lt;li&gt;Encode them directly into beads issues with explicit dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, requirements existed, but they were treated as input, not as a continuously consulted execution artifact.&lt;/p&gt;

&lt;p&gt;Once the task graph existed, beads became the primary source of truth.&lt;/p&gt;

&lt;p&gt;Here's where the magic happens. When I started my next Claude Code session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; bd ready

mission-house-5mv [P1] [task] open - Set up frontend project structure
  └─ No blockers - ready to work!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude immediately knew what to work on. No spec re-reading, no context reconstruction - just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; bd update mission-house-5mv --status=in_progress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we're coding.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dependency Graph
&lt;/h2&gt;

&lt;p&gt;Here's what our project looked like after the initial planning:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytub7b1xz1rrw86zol9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytub7b1xz1rrw86zol9t.png" alt="Dependency graph" width="784" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every arrow represents a &lt;code&gt;bd dep add&lt;/code&gt; command. The AI knows it can't work on "Web Scraper" until "URL Input Form" is done.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming in Part 2
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll dive deep into:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The JSONL Advantage&lt;/strong&gt; - How storing issues in plain text gives AI assistants "long memory"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Patterns&lt;/strong&gt; - Epics → Features → Tasks hierarchy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real Issue Examples&lt;/strong&gt; - Actual JSON from our project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;beads vs. Spec-Driven Development&lt;/strong&gt; - A detailed comparison&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Drawbacks&lt;/strong&gt; - What didn't work so well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Features&lt;/strong&gt; - Tombstones, sync branches, and multi-session workflows&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Task-first vs. spec-first&lt;/strong&gt; - beads lets you create issues directly; SDD requires specs first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph-based dependencies&lt;/strong&gt; - Explicit edges, not prose to interpret&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bd ready&lt;/code&gt; is the killer feature&lt;/strong&gt; - Automatic prioritization via graph traversal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforced execution order&lt;/strong&gt; - Blocked tasks are invisible until unblocked&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Continue to &lt;a href="./part2-beads-deep-dive.md"&gt;Part 2: Deep Dive into beads Workflow →&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Part 5: Building Station Station - Should You Use Spec-Driven Development?</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 04 Nov 2025 05:46:51 +0000</pubDate>
      <link>https://forem.com/koustubh/part-5-building-station-station-should-you-use-spec-driven-development-2i7b</link>
      <guid>https://forem.com/koustubh/part-5-building-station-station-should-you-use-spec-driven-development-2i7b</guid>
      <description>&lt;p&gt;We've covered a lot in this series. In Part 1, we introduced Spec-Driven Development. In Part 2, we explored the Station Station project—8 features solving a real hybrid work compliance problem. In Part 3, we walked through the agent-os workflow. In Part 4, we got honest about the challenges and limitations.&lt;/p&gt;

&lt;p&gt;Now for the decision: &lt;strong&gt;Should you use SDD for your next project?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This part gives you a decision framework based on real experience, not theory. No marketing fluff—just practical guidance on when SDD makes sense and when it's overkill.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Actually Built
&lt;/h2&gt;

&lt;p&gt;Let's recap what the SDD approach delivered for Station Station:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Project:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personal web application for tracking office attendance via Melbourne Myki transit data&lt;/li&gt;
&lt;li&gt;8 features across 3 phases (Foundation, Data Layer, Integration &amp;amp; UI)&lt;/li&gt;
&lt;li&gt;Live and deployed at &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Fully autonomous daily execution via GitHub Actions&lt;/li&gt;
&lt;li&gt;Zero hosting costs (GitHub Pages + GitHub Actions free tier)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Tech:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~6,000 lines of code (3,500 Python + 2,500 React)&lt;/li&gt;
&lt;li&gt;Python backend with Playwright for browser automation and Cloudflare bypass&lt;/li&gt;
&lt;li&gt;React frontend with Tailwind CSS v4, responsive mobile-first design&lt;/li&gt;
&lt;li&gt;Lighthouse scores 95+ across the board&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Timeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~4-5 hours on specs&lt;/li&gt;
&lt;li&gt;~2 days on implementation&lt;/li&gt;
&lt;li&gt;~4-6 hours debugging hard problems (Cloudflare, timezone, multi-layer integration)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~3 days for 8 features shipped and deployed&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt;&lt;br&gt;
I use this app regularly to check my attendance compliance. It solves my actual problem. All planned features are complete. I can (and did) add new features weeks later by creating new specs and following the same workflow.&lt;/p&gt;
&lt;h2&gt;
  
  
  When to Use Spec-Driven Development
&lt;/h2&gt;

&lt;p&gt;Based on the Station Station experience, here's when SDD is worth the upfront investment:&lt;/p&gt;
&lt;h3&gt;
  
  
  ✅ Use SDD When:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. You're building a complete project, not just prototyping&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to actually ship something and maintain it, SDD helps you finish. The structure prevents the "60% complete and abandoned" problem that plagues side projects.&lt;/p&gt;

&lt;p&gt;Station Station could have easily become another abandoned project. Authentication alone took 2 days to solve. Without the roadmap keeping me focused on the goal, I might have given up after Cloudflare kept blocking me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. You'll be coming back to the code later&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If there's any gap between coding sessions (days, weeks, months), the documentation is invaluable. The spec tells you what you were building and why. The task list shows what's done and what's next.&lt;/p&gt;

&lt;p&gt;I added the &lt;code&gt;manualAttendanceDates&lt;/code&gt; feature a week after the initial deployment. The existing specs told me exactly how the system worked, where to add the new field, and what components would be affected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. You're working solo or on a small team&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SDD provides structure when you don't have teammates to keep you accountable. The roadmap prevents scope creep. The task breakdown prevents getting overwhelmed.&lt;/p&gt;

&lt;p&gt;For Station Station, I was the only developer. The workflow kept me organized and prevented me from jumping between random features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The project involves multiple components or layers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your project has backend + frontend, or data extraction + processing + visualization, specs help you think through the integration points upfront.&lt;/p&gt;

&lt;p&gt;Station Station has Python backend, GitHub Actions automation, and React frontend. The specs documented how data flows between these layers, which made debugging multi-layer issues much easier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. You want to learn a structured development process&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're tired of chaotic development and want to build better habits, SDD provides a framework. The first project has a learning curve, but future projects benefit from the workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. You're solving a non-trivial problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Simple CRUD apps or one-off scripts don't benefit much from SDD. But if your problem has complexity (Cloudflare bypass, API reverse engineering, browser automation), the structure helps you tackle it systematically.&lt;/p&gt;
&lt;h3&gt;
  
  
  ❌ Skip SDD When:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. You're doing quick experiments or throwaway code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're testing an idea and will likely discard the code, specs are overkill. Just write code and see if the idea works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. You have a crystal-clear mental model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've built this exact thing 10 times before and know every step, specs won't add much value. You already have the structure in your head.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The project is extremely simple&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single-file script, a basic static site, or a trivial automation doesn't need specs. Just write it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. You're under extreme time pressure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you need something working in the next 2 hours, don't spend 30 minutes on a spec. But recognize you're trading speed now for maintenance pain later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. You're learning a completely new technology&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're learning React for the first time, just following tutorials and experimenting might be better than trying to spec everything out. Learn first, then apply structure to real projects.&lt;/p&gt;
&lt;h2&gt;
  
  
  Decision Framework
&lt;/h2&gt;

&lt;p&gt;Here's a simple decision tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is this a real project you want to finish and maintain?
├─ No → Skip SDD, just code
└─ Yes ↓

Will you be working on this over multiple sessions?
├─ No → Skip SDD unless project is complex
└─ Yes ↓

Does the project involve multiple components/layers?
├─ No, single component → SDD optional
└─ Yes, multiple layers ↓

Are you working solo or small team?
├─ No, large team with existing processes → Evaluate SDD fit
└─ Yes ↓

→ USE SDD. The upfront investment will pay off.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Honest Trade-offs
&lt;/h2&gt;

&lt;p&gt;Let's be real about what you're signing up for:&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Give Up:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Time to first code:&lt;/strong&gt; Specs take time. You'll spend 30 minutes to several hours documenting before you write a single line of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility to "just try things":&lt;/strong&gt; SDD encourages thinking before coding. If you like to experiment your way to a solution, the structure might feel constraining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplicity:&lt;/strong&gt; You're adding agent-os to your workflow. There's a learning curve. The first project takes longer.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Get:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Actually finishing projects:&lt;/strong&gt; Structure prevents abandonment. The roadmap keeps you focused. The task breakdown prevents overwhelm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resumability:&lt;/strong&gt; Come back weeks later and know exactly where you left off. No re-learning your own codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Systematic debugging:&lt;/strong&gt; When things break, the spec tells you what should happen. The task breakdown shows you where to look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documented decisions:&lt;/strong&gt; Future you (or future contributors) can understand why things were built a certain way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Less decision fatigue:&lt;/strong&gt; The spec tells you what to build next. No "what should I work on today?" paralysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bottom Line:
&lt;/h3&gt;

&lt;p&gt;SDD trades &lt;strong&gt;upfront time&lt;/strong&gt; for &lt;strong&gt;higher completion rate&lt;/strong&gt; and &lt;strong&gt;better maintainability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you care more about finishing than starting, SDD is worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Started
&lt;/h2&gt;

&lt;p&gt;If you've decided SDD might work for your next project, here's how to begin:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Try It With a Real Project
&lt;/h3&gt;

&lt;p&gt;Don't practice with a tutorial. Pick an actual problem you want to solve. Station Station worked because I genuinely needed to track my attendance.&lt;/p&gt;

&lt;p&gt;Your first SDD project will be slower. That's normal. The second project will be much faster once you internalize the workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Start With Agent-OS
&lt;/h3&gt;

&lt;p&gt;Agent-os is the tool I used for Station Station. It's built for Claude and provides the complete workflow: product creation, spec shaping, spec writing, task breakdown, and implementation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent-OS Repository:&lt;/strong&gt; &lt;a href="https://github.com/cyanheads/agent-os" rel="noopener noreferrer"&gt;https://github.com/cyanheads/agent-os&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Getting Started Guide:&lt;/strong&gt; Check the repository README for setup instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Try Station Station Yourself
&lt;/h3&gt;

&lt;p&gt;Station Station is open source and free to use. If you're a Melbourne train commuter with hybrid work requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live Application:&lt;/strong&gt; &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/koustubh25/station-station" rel="noopener noreferrer"&gt;https://github.com/koustubh25/station-station&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Two options to get started:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Get onboarded to the existing app&lt;/strong&gt; - See your attendance on the same GUI. Check the &lt;a href="https://github.com/koustubh25/station-station#option-1-get-onboarded-to-the-existing-app" rel="noopener noreferrer"&gt;README for onboarding instructions&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fork and deploy your own&lt;/strong&gt; - Complete control and privacy. The &lt;a href="https://github.com/koustubh25/station-station#option-2-fork-and-deploy-your-own" rel="noopener noreferrer"&gt;README has full deployment instructions&lt;/a&gt; using GitHub Actions and GitHub Pages (all free).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The repository also includes complete specs for all 8 features and task breakdowns, so you can study the SDD approach in action.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Other SDD Tools
&lt;/h3&gt;

&lt;p&gt;Agent-os isn't the only way to do Spec-Driven Development. Other tools exist like &lt;a href="https://github.com/Fission-AI/OpenSpec" rel="noopener noreferrer"&gt;OpenSpec&lt;/a&gt;, &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;Spec Kit&lt;/a&gt;, and others. However, I found some too simple (lacking the structure I needed) and others too verbose (overwhelming with process overhead). Agent-os struck a good balance for my workflow—structured enough to keep me organized, but not so heavy that it gets in the way of actually building.&lt;/p&gt;

&lt;p&gt;Your preferences might differ. If agent-os doesn't feel right, explore the alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Spec-Driven Development isn't revolutionary. It's structure. It's documentation. It's thinking before coding.&lt;/p&gt;

&lt;p&gt;It won't make you code faster. It won't eliminate bugs. It won't replace your judgment.&lt;/p&gt;

&lt;p&gt;But it might help you &lt;strong&gt;finish&lt;/strong&gt; instead of abandon. It might help you &lt;strong&gt;resume&lt;/strong&gt; instead of restart. It might help you &lt;strong&gt;debug systematically&lt;/strong&gt; instead of randomly.&lt;/p&gt;

&lt;p&gt;For Station Station, that was enough. In 3 days, I went from "I need to track attendance" to a fully deployed application solving my real problem. Two weeks later, I added new features without re-learning the codebase. A month later, the app is still running autonomously, requiring zero maintenance.&lt;/p&gt;

&lt;p&gt;Your mileage may vary. Your projects are different. Your workflow preferences are different.&lt;/p&gt;

&lt;p&gt;But if you're tired of abandoned side projects, forgotten codebases, and chaotic development, maybe give SDD a try. Pick a real problem. Write a spec. Follow the workflow. See if it works for you.&lt;/p&gt;

&lt;p&gt;And if you do try it—or if you've been using SDD and have your own experiences to share—I'd love to hear about it. Drop a comment, open a GitHub discussion, or reach out.&lt;/p&gt;

&lt;p&gt;Thanks for reading this series. Now go build something.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Station Station Live App:&lt;/strong&gt; &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Station Station GitHub:&lt;/strong&gt; &lt;a href="https://github.com/koustubh25/station-station" rel="noopener noreferrer"&gt;https://github.com/koustubh25/station-station&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-OS GitHub:&lt;/strong&gt; &lt;a href="https://github.com/cyanheads/agent-os" rel="noopener noreferrer"&gt;https://github.com/cyanheads/agent-os&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenSpec GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Fission-AI/OpenSpec" rel="noopener noreferrer"&gt;https://github.com/Fission-AI/OpenSpec&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>specdriven</category>
      <category>agentos</category>
      <category>productivity</category>
      <category>ai</category>
    </item>
    <item>
      <title>Part 4: Building Station Station - Where SDD Helped (and Where It Didn't)</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 04 Nov 2025 05:45:52 +0000</pubDate>
      <link>https://forem.com/koustubh/part-4-building-station-station-where-sdd-helped-and-where-it-didnt-3f7d</link>
      <guid>https://forem.com/koustubh/part-4-building-station-station-where-sdd-helped-and-where-it-didnt-3f7d</guid>
      <description>&lt;p&gt;In Parts 1-3, we covered Spec-Driven Development, the Station Station project, and the agent-os workflow. We saw a structured process that delivered 8 features, fully deployed and working. But I've been painting a rosy picture. Let me be honest about the challenges.&lt;/p&gt;

&lt;p&gt;This part is about the real development experience: Where did the structured SDD approach actually help? Where did I still struggle despite having specs and tasks? What problems can structure solve, and what problems require good old-fashioned debugging?&lt;/p&gt;

&lt;p&gt;If you're considering SDD for your next project, this is the part you need to read, because understanding what structure can and can't solve is critical to setting realistic expectations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenge 1: Cloudflare Authentication Bypass
&lt;/h2&gt;

&lt;p&gt;Let me start with the most frustrating part of the entire project: getting authentication to work with the Myki portal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Context:&lt;/strong&gt;&lt;br&gt;
The whole project depends on accessing Myki transaction data. No authentication = no data = no project. This was the critical blocker. Everything else was blocked until this worked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt;&lt;br&gt;
The Myki portal is protected by Cloudflare Turnstile, which actively detects and blocks headless browsers. My first attempt using standard Playwright headless mode failed immediately with the "Verifying you are human..." overlay blocking form access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Spec:&lt;/strong&gt;&lt;br&gt;
"Use Playwright to authenticate with Myki portal and extract session tokens for API calls."&lt;/p&gt;

&lt;p&gt;Simple requirement, right? But the spec didn't capture the complexity of Cloudflare bot detection.&lt;/p&gt;
&lt;h3&gt;
  
  
  How SDD Helped
&lt;/h3&gt;

&lt;p&gt;The structured approach provided a framework for tackling this beast:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear success criteria:&lt;/strong&gt; "Extract Bearer token from authentication response" - The spec told the AI (and me) exactly what success looked like, even if neither of us knew how to get there yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task breakdown kept me focused:&lt;/strong&gt; Instead of having the AI try to solve "make authentication work" all at once, the tasks broke it down into pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Launch browser with Playwright&lt;/li&gt;
&lt;li&gt;Navigate to login page&lt;/li&gt;
&lt;li&gt;Fill in credentials&lt;/li&gt;
&lt;li&gt;Submit form&lt;/li&gt;
&lt;li&gt;Extract authentication tokens&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When the AI's implementation of step 3 (fill in credentials) was blocked by Cloudflare, the task breakdown showed me exactly where the problem was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation of attempts:&lt;/strong&gt; Each failed approach got documented in the spec as "out of scope" or "doesn't work because..." This prevented me from asking the AI to retry the same failed approaches days later.&lt;/p&gt;
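&lt;p&gt;For illustration, the "extract Bearer token" success criterion boils down to a parsing step like this - the header value below is made up, and the real capture happens inside the Playwright session, which isn't shown here:&lt;/p&gt;

```python
# Hypothetical parsing step for the spec's success criterion. The captured
# header value is invented; only the "Bearer " prefix handling is the point.

def extract_bearer_token(headers):
    auth = headers.get("authorization", "")
    prefix = "Bearer "
    if auth.startswith(prefix):
        return auth[len(prefix):]
    return None

captured = {"authorization": "Bearer eyJhbGciOiJIUzI1NiJ9.example"}
print(extract_bearer_token(captured))
```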
&lt;h3&gt;
  
  
  What SDD Couldn't Solve
&lt;/h3&gt;

&lt;p&gt;But here's the brutal truth: &lt;strong&gt;specs don't solve hard technical problems for you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI tried implementing authentication multiple ways based on the spec. Each attempt failed. Over two days, I kept iterating:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: AI implements standard headless Playwright&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: Blocked by Cloudflare immediately. "Verifying you are human..."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: I ask AI to try headed mode (visible browser)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: Better, but still detected as automation. Random CAPTCHA challenges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3: I ask AI to try user-agent and header spoofing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: Cloudflare is smarter than that. Still blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 4: AI tries stealth mode plugins&lt;/strong&gt;&lt;br&gt;
Result: Helped a bit, but not consistent. Sometimes worked, sometimes didn't.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution (After Two Days of Frustration)
&lt;/h3&gt;

&lt;p&gt;What finally worked: &lt;strong&gt;Browser profile trust signals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After researching Cloudflare bypass techniques, I figured out the solution and told the AI to implement it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create empty Chrome profile directory structure
&lt;/span&gt;&lt;span class="n"&gt;profile_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_empty_chrome_profile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# Launch with profile - appears as "real" browser to Cloudflare
&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch_persistent_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_data_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;profile_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The profile directory contains just enough metadata (Cookies, Preferences, History files - all empty) to make Playwright look like a legitimate Chrome browser instead of automation.&lt;/p&gt;
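&lt;p&gt;A minimal sketch of what a &lt;code&gt;create_empty_chrome_profile()&lt;/code&gt; helper could look like, assuming the trick is just seeding the files Chrome normally writes so the profile passes as a used browser - which files Cloudflare actually checks is an assumption on my part:&lt;/p&gt;

```python
# Assumption: an on-disk profile with the usual Default/ files is enough to
# read as a "real" browser. The helper is hypothetical, matching the name
# used in the snippet above; file names match what Chrome creates.
import json
import tempfile
from pathlib import Path

def create_empty_chrome_profile():
    profile_dir = Path(tempfile.mkdtemp(prefix="chrome-profile-"))
    default = profile_dir / "Default"
    default.mkdir()
    (default / "Cookies").touch()       # empty placeholder files
    (default / "History").touch()
    (default / "Preferences").write_text(json.dumps({}))
    return str(profile_dir)

profile = create_empty_chrome_profile()
print(sorted(p.name for p in (Path(profile) / "Default").iterdir()))
```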

&lt;h3&gt;
  
  
  Why This Was So Frustrating
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The spec couldn't help because:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This required deep knowledge of browser fingerprinting and bot detection&lt;/li&gt;
&lt;li&gt;Solutions aren't documented well (Cloudflare actively tries to prevent bypass)&lt;/li&gt;
&lt;li&gt;Trial and error was the only way to find what worked&lt;/li&gt;
&lt;li&gt;Each attempt took 5-10 minutes to test (AI implements → I run the code → see if blocked)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I actually needed to do:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Research how Cloudflare Turnstile detects automation&lt;/li&gt;
&lt;li&gt;Learn that browser profiles affect fingerprinting&lt;/li&gt;
&lt;li&gt;Try approach after approach until something worked&lt;/li&gt;
&lt;li&gt;Debug headless browser issues by inspecting what Cloudflare was detecting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this came from the spec. The spec told the AI &lt;em&gt;what&lt;/em&gt; to achieve. But figuring out &lt;em&gt;how&lt;/em&gt; required me to research, experiment, find the solution, and then tell the AI to implement it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What SDD provided:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear goal to work toward (extract Bearer token)&lt;/li&gt;
&lt;li&gt;Focus on one step at a time instead of being overwhelmed&lt;/li&gt;
&lt;li&gt;Documentation of failed approaches to avoid repetition&lt;/li&gt;
&lt;li&gt;Motivation to keep going (this was Task 1 on a roadmap of 8 features - couldn't give up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What SDD couldn't provide:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical solution to Cloudflare bypass&lt;/li&gt;
&lt;li&gt;Knowledge of browser fingerprinting&lt;/li&gt;
&lt;li&gt;Shortcuts to avoid trial-and-error debugging&lt;/li&gt;
&lt;li&gt;The actual working approach (that required research and experimentation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Would specialized subagents have helped?&lt;/strong&gt;&lt;br&gt;
Honestly, I don't know—I didn't try them for this problem. Agent-os has advanced features like specialized research agents and orchestrated task execution that might have helped with researching Cloudflare bypass techniques. But I was using the standard workflow, so I can't say whether those advanced features would have shortened the two-day struggle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt;&lt;br&gt;
SDD gives you structure to tackle hard problems systematically. But &lt;strong&gt;hard problems are still hard&lt;/strong&gt;. Structure doesn't replace technical knowledge, research, and persistence. It just gives you a framework to keep trying without getting lost.&lt;/p&gt;

&lt;p&gt;When I finally got authentication working after two days, having it documented as "Task 1: COMPLETE ✓" with detailed notes on the working approach was incredibly valuable. Future features could reference "see Task 1 for Cloudflare bypass pattern." Without that documentation, I might have forgotten the solution by the time I needed to debug it again.&lt;/p&gt;
&lt;h2&gt;
  
  
  Challenge 2: The Multi-Layer Bug
&lt;/h2&gt;

&lt;p&gt;After finally getting authentication working, I ran into a different kind of frustration: the &lt;code&gt;manualAttendanceDates&lt;/code&gt; feature bug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Context:&lt;/strong&gt;&lt;br&gt;
This requirement came much later, after the entire end-to-end system was already working. The app was successfully tracking attendance based on Myki transactions, deployed and running. But then I realized I needed a way to record office attendance on days when I drove to work instead of taking the train. No Myki transaction = no automatic detection. The solution was to add a &lt;code&gt;manualAttendanceDates&lt;/code&gt; config field where I could explicitly list dates I was in the office.&lt;/p&gt;

&lt;p&gt;This was an enhancement to an already-working system, not part of the original implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Spec:&lt;/strong&gt;&lt;br&gt;
Clear requirements. Well-defined tasks. Config schema documented. Expected behavior spelled out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Implementation:&lt;/strong&gt;&lt;br&gt;
Feature got implemented according to the tasks. Initial version deployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt;&lt;br&gt;
Manual dates weren't showing up correctly. But it wasn't just one bug—it was multiple issues across different parts of the system.&lt;/p&gt;
&lt;h3&gt;
  
  
  How SDD Helped
&lt;/h3&gt;

&lt;p&gt;Having the spec gave me a debugging roadmap:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check the spec requirements&lt;/strong&gt; - What should happen? "Manual dates should appear on the calendar with the same styling as PTV-detected dates"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow the task breakdown&lt;/strong&gt; - Config parsing → Python backend → GitHub Actions workflow → JSON output → Frontend rendering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace the data flow&lt;/strong&gt; - The spec documented the exact data structure at each layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I could systematically check each layer by reviewing the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python code:&lt;/strong&gt; Reviewing the implementation, I could see it wasn't properly merging manual dates with PTV-detected dates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow file:&lt;/strong&gt; Looking at GitHub Actions, I realized it needed updates to handle the new field&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI:&lt;/strong&gt; Checking the frontend code, I spotted where it needed changes to render manual dates correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This wasn't a single bug, but multiple integration issues across three different components.&lt;/p&gt;
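&lt;p&gt;The missing merge in the Python layer amounts to a set union - a hedged sketch, with the &lt;code&gt;manualAttendanceDates&lt;/code&gt; field name from the config and everything else illustrative rather than the project's actual code:&lt;/p&gt;

```python
# manualAttendanceDates is the config field from the article; the dates and
# the merge function itself are illustrative of the fix, not the real code.

def merge_attendance(detected_dates, config):
    manual = config.get("manualAttendanceDates", [])
    return sorted(set(detected_dates) | set(manual))

detected = ["2025-10-01", "2025-10-03"]                           # Myki touch-ons
config = {"manualAttendanceDates": ["2025-10-02", "2025-10-03"]}  # drove to work

print(merge_attendance(detected, config))
```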
&lt;h3&gt;
  
  
  What SDD Couldn't Solve
&lt;/h3&gt;

&lt;p&gt;Even with perfect specs, I still had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actually review the code&lt;/strong&gt; across all three layers to spot the issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give the AI specific hints&lt;/strong&gt; - "Look at how manual dates are merged in the Python code," "Check the workflow file," "The UI might not be checking the right field"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand the integration points&lt;/strong&gt; - recognizing that adding one field means touching Python backend, GitHub Actions workflow, and React frontend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect the dots across components&lt;/strong&gt; - understanding how the config file flows through Python processing, gets written to JSON, picked up by GitHub Actions, and rendered by the UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The spec told me &lt;em&gt;what&lt;/em&gt; should happen. It didn't tell me that integrating a new field into an existing multi-layer system would require touching all these different pieces. The AI could implement each fix once I pointed it to the right location, but finding those locations required me to review the code and understand the full data flow.&lt;/p&gt;
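To illustrate the kind of integration logic involved, here's a hypothetical sketch of merging manually entered dates with detected ones. The function and field names are invented for this example (the real merge lived in the Python backend), but the de-duplication concern is the same one the bug exposed:

```javascript
// Hypothetical merge: combine PTV-detected dates with manually added ones,
// de-duplicating so each day is counted once. Names are illustrative only.
function mergeAttendanceDates(detectedDates, manualDates) {
  // Set removes duplicates; lexicographic sort of YYYY-MM-DD strings
  // is also chronological order.
  return [...new Set([...detectedDates, ...manualDates])].sort();
}

mergeAttendanceDates(['2024-11-01', '2024-11-04'], ['2024-11-04', '2024-11-06']);
// → ['2024-11-01', '2024-11-04', '2024-11-06']
```

The sketch is trivial on its own; the hard part was knowing that this merge, the workflow file, and the frontend all had to agree on the same field.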
&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What SDD provided:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear expected behavior to test against&lt;/li&gt;
&lt;li&gt;Systematic way to isolate which layer was failing&lt;/li&gt;
&lt;li&gt;Documentation of the intended data structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What still required human debugging:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding multi-layer integration issues&lt;/li&gt;
&lt;li&gt;Recognizing when different components had different assumptions&lt;/li&gt;
&lt;li&gt;Finding the exact line where the mismatch occurred&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Challenge 3: Timezone Handling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Context:&lt;/strong&gt;&lt;br&gt;
Calendar dates were displaying incorrectly—off by 1 day from the actual values in &lt;code&gt;attendance.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Spec:&lt;/strong&gt;&lt;br&gt;
"Display attended days on calendar matching the dates in the JSON file exactly."&lt;/p&gt;

&lt;p&gt;Simple, right? Match the dates. But there was a subtle problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  How SDD Helped
&lt;/h3&gt;

&lt;p&gt;The spec was clear about &lt;em&gt;what&lt;/em&gt; should happen (dates must match), which made it obvious when they didn't. Without a spec, I might have thought "close enough" or missed the off-by-one bug entirely.&lt;/p&gt;

&lt;p&gt;The task breakdown also helped isolate the problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task: "Parse attendance dates from JSON"&lt;/li&gt;
&lt;li&gt;Task: "Mark calendar tiles for attended dates"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bug was in the date parsing task, not the calendar rendering task. Task isolation made debugging faster.&lt;/p&gt;
&lt;h3&gt;
  
  
  What SDD Couldn't Solve
&lt;/h3&gt;

&lt;p&gt;The spec said "match the dates exactly" but didn't specify &lt;em&gt;how&lt;/em&gt; to handle timezones. The bug was subtle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Initial implementation - caused timezone conversion&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dateString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;T&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="c1"&gt;// For dates near midnight, UTC conversion shifts the date!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: JavaScript's &lt;code&gt;toISOString()&lt;/code&gt; converts to UTC. For dates near midnight, this can shift the date forward or backward. Nov 1, 2024 01:00 AEDT becomes Oct 31, 2024 14:00 UTC—wrong day!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why the spec didn't prevent this:&lt;/strong&gt;&lt;br&gt;
The spec didn't say "use local timezone, not UTC" because I didn't think about timezones when writing it. The requirement seemed obvious: match the dates. But "obvious" hides assumptions.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Corrected implementation - uses local timezone&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dateString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLocaleDateString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-CA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// YYYY-MM-DD format&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
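To make the difference concrete, here's a minimal standalone sketch (illustrative dates, not taken from the project) showing how the two approaches diverge for a timestamp just after local midnight in a timezone east of UTC:

```javascript
// A local-time date: Nov 1, 2024 at 01:00 in whatever timezone this runs in.
// Note: the month argument is 0-indexed, so 10 = November.
const date = new Date(2024, 10, 1, 1, 0);

// Buggy approach: toISOString() converts to UTC first. East of UTC
// (e.g. AEDT, UTC+11), 01:00 local is still the previous day in UTC.
const utcDay = date.toISOString().split('T')[0];

// Fixed approach: toLocaleDateString('en-CA') formats the *local* date
// as YYYY-MM-DD, so the calendar day never shifts.
const localDay = date.toLocaleDateString('en-CA');

console.log(utcDay, localDay); // east of UTC these differ; localDay is always "2024-11-01"
```

Running this on a machine set to Melbourne time prints an October date for `utcDay` and `2024-11-01` for `localDay`, which is exactly the off-by-one the spec's "match exactly" criterion caught.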

&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What SDD provided:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear success criteria (dates must match exactly)&lt;/li&gt;
&lt;li&gt;Quick detection that something was wrong&lt;/li&gt;
&lt;li&gt;Task isolation to narrow down where the bug was&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What SDD didn't prevent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subtle implementation details (timezone handling)&lt;/li&gt;
&lt;li&gt;Hidden assumptions in "obvious" requirements&lt;/li&gt;
&lt;li&gt;Need for domain knowledge (how JavaScript handles dates)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt;&lt;br&gt;
Good specs need to surface non-obvious assumptions. "Match the dates" should have been "Match the dates using local timezone to avoid UTC conversion issues." But you often don't know to specify this until you've been bitten by the bug.&lt;/p&gt;
&lt;h2&gt;
  
  
  Challenge 4: Third-Party Library Integration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Context:&lt;/strong&gt;&lt;br&gt;
Integrating the &lt;code&gt;date-holidays&lt;/code&gt; npm package to automatically detect Victoria public holidays.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Spec:&lt;/strong&gt;&lt;br&gt;
"Use date-holidays library to fetch Victoria public holidays. Display them on the calendar with red text."&lt;/p&gt;
&lt;h3&gt;
  
  
  How SDD Helped
&lt;/h3&gt;

&lt;p&gt;The spec documented exactly which library to use and what the expected behavior was. When the integration didn't work as expected, I could reference the spec to confirm what was supposed to happen.&lt;/p&gt;
&lt;h3&gt;
  
  
  What SDD Couldn't Solve
&lt;/h3&gt;

&lt;p&gt;The library returned dates in an unexpected format: &lt;code&gt;"YYYY-MM-DD HH:MM:SS"&lt;/code&gt; strings instead of JavaScript Date objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Initial attempt based on spec&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;holidayDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;holiday&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Assumed library returns Date objects - it doesn't!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the spec didn't prevent this:&lt;/strong&gt;&lt;br&gt;
The spec said "use the library" but didn't document the exact return format because I hadn't investigated the library deeply when writing the spec. I assumed standard Date objects.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;I had to inspect the actual library output, recognize the format mismatch, and handle it explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Working solution after investigation&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dateString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;holiday&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// "2025-11-04"&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;day&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dateString&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;holidayDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;month&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;day&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Uses local timezone&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
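Once the quirk was known, the parsing logic could be wrapped in a small defensive helper. This is an illustrative sketch (the function name and validation are mine, not from the project) of guarding against the string format the library actually returns, so a future format change fails loudly instead of silently shifting dates:

```javascript
// Parse a "YYYY-MM-DD HH:MM:SS" holiday string into a local-timezone Date.
// Throws early if the input doesn't match the expected shape.
function parseHolidayDate(raw) {
  const dateString = String(raw).substring(0, 10); // e.g. "2025-11-04"
  if (!/^\d{4}-\d{2}-\d{2}$/.test(dateString)) {
    throw new Error(`Unexpected holiday date format: ${raw}`);
  }
  const [year, month, day] = dateString.split('-').map(Number);
  return new Date(year, month - 1, day); // local timezone, no UTC shift
}

parseHolidayDate('2025-11-04 00:00:00'); // → local Date for Nov 4, 2025
```

Centralizing the quirk in one validated function also gives the spec something concrete to reference next time: "holiday dates arrive as local-time strings; parse them with the shared helper."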



&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What SDD provided:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation of which library to use (no decision paralysis)&lt;/li&gt;
&lt;li&gt;Clear requirement for what should be displayed (public holidays with red text)&lt;/li&gt;
&lt;li&gt;Task to test the integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What SDD didn't prevent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime surprises from third-party libraries&lt;/li&gt;
&lt;li&gt;Need to investigate actual library behavior vs documented behavior&lt;/li&gt;
&lt;li&gt;Format mismatch that only appears when you run the code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt;&lt;br&gt;
Specs can't predict every third-party library quirk. You discover these by running code and inspecting actual output. The structured approach helps you document the quirks once you find them, so future tasks can reference the pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenge 5: User Preferences vs Developer Assumptions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Context:&lt;/strong&gt;&lt;br&gt;
During development, weekends were displaying in red text (standard react-calendar behavior). I assumed this was confusing since it wasn't attendance data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Plan:&lt;/strong&gt;&lt;br&gt;
Remove the red weekend styling via CSS override. Seemed like a clean UI improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Actually Happened:&lt;/strong&gt;&lt;br&gt;
Before implementing the change, I asked the user (myself, wearing the user hat instead of developer hat). Response: "I really liked keeping weekends and public holidays in red."&lt;/p&gt;

&lt;p&gt;I almost removed a feature I valued because I was thinking like a developer, not a user.&lt;/p&gt;

&lt;h3&gt;
  
  
  How SDD Helped
&lt;/h3&gt;

&lt;p&gt;The structured workflow created natural checkpoints for user feedback:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After completing a task group, review with user&lt;/li&gt;
&lt;li&gt;Before removing functionality, validate with user&lt;/li&gt;
&lt;li&gt;Spec updates require user approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this structure, I would have just removed the feature mid-coding session without stopping to think "Should I ask about this?"&lt;/p&gt;

&lt;h3&gt;
  
  
  What SDD Couldn't Solve
&lt;/h3&gt;

&lt;p&gt;SDD doesn't tell you what users want. It creates opportunities to ask, but you still have to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actually ask the question&lt;/li&gt;
&lt;li&gt;Listen to the answer&lt;/li&gt;
&lt;li&gt;Override your own assumptions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What SDD provided:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Natural review checkpoints to get user feedback&lt;/li&gt;
&lt;li&gt;Process that encourages "ask before removing"&lt;/li&gt;
&lt;li&gt;Documentation of decisions (why we kept the feature)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What SDD didn't prevent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Making wrong assumptions in the first place&lt;/li&gt;
&lt;li&gt;Need for actual user communication&lt;/li&gt;
&lt;li&gt;Temptation to "just fix it" without asking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt;&lt;br&gt;
Structure creates opportunities for better decisions, but you still have to take advantage of those opportunities. The review checkpoint is useless if you skip it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where SDD Provided the Most Value
&lt;/h2&gt;

&lt;p&gt;After going through these challenges, here's where the structured approach actually helped:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clear Success Criteria
&lt;/h3&gt;

&lt;p&gt;Every bug was obvious because the spec defined success. "Dates should match exactly" meant off-by-one was clearly wrong. Without specs, I might have rationalized it: "Close enough, probably a display thing."&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Systematic Debugging
&lt;/h3&gt;

&lt;p&gt;Task breakdown gave me a debugging roadmap. Instead of randomly checking files, I could trace the data flow through the task list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Config parsing (Task 1)&lt;/li&gt;
&lt;li&gt;Backend processing (Task 2)&lt;/li&gt;
&lt;li&gt;JSON output (Task 3)&lt;/li&gt;
&lt;li&gt;Frontend rendering (Task 4)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Check each layer systematically until you find the broken one.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Documentation of Decisions
&lt;/h3&gt;

&lt;p&gt;When I came back a week later to add a new feature, the specs told me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why certain approaches were chosen&lt;/li&gt;
&lt;li&gt;What assumptions were made (and documented)&lt;/li&gt;
&lt;li&gt;How data flows through the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this documentation, I would have re-learned the codebase every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Review Checkpoints
&lt;/h3&gt;

&lt;p&gt;The workflow forced me to pause and review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After each task group&lt;/li&gt;
&lt;li&gt;Before major changes&lt;/li&gt;
&lt;li&gt;When user feedback was needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These pauses prevented rushing ahead with wrong assumptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where SDD Couldn't Replace Human Judgment
&lt;/h2&gt;

&lt;p&gt;But let's be honest about what structure can't solve:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-Layer Integration Issues
&lt;/h3&gt;

&lt;p&gt;Specs describe individual components well. But when components need to work together, you still have to understand the full picture. The manualAttendanceDates bug required understanding backend + frontend + data contract all at once.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Hidden Assumptions and Edge Cases
&lt;/h3&gt;

&lt;p&gt;"Match the dates" seemed clear until timezone conversion bit me. "Use the library" seemed clear until format mismatches appeared. Good specs surface assumptions, but you often don't know what to surface until you've been bitten.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Third-Party Library Quirks
&lt;/h3&gt;

&lt;p&gt;Specs can't predict runtime behavior of external dependencies. You discover these by running code, inspecting output, and debugging when things don't work as documented.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. User Preferences and Domain Knowledge
&lt;/h3&gt;

&lt;p&gt;Structure can't tell you what users value or what domain-specific constraints matter. You still need actual user communication and domain expertise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest ROI of SDD
&lt;/h2&gt;

&lt;p&gt;Let's be real about the time investment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time spent on specs:&lt;/strong&gt; ~4-5 hours across all features&lt;br&gt;
&lt;strong&gt;Time spent on implementation:&lt;/strong&gt; ~2 days&lt;br&gt;
&lt;strong&gt;Time spent debugging:&lt;/strong&gt; ~4-6 hours (timezone, manualAttendanceDates, library integration)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total:&lt;/strong&gt; ~3 days for 8 features shipped and deployed&lt;/p&gt;

&lt;h3&gt;
  
  
  Where SDD Saved Time
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Debugging was faster:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Systematic task-by-task checking vs random file jumping&lt;/li&gt;
&lt;li&gt;Spec told me what should happen vs guessing&lt;/li&gt;
&lt;li&gt;Data flow documented vs reverse-engineering it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resumability was huge:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Came back a week later, knew exactly where I left off&lt;/li&gt;
&lt;li&gt;Spec reminded me why decisions were made&lt;/li&gt;
&lt;li&gt;Task list showed what was done and what was next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fewer forgotten requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything documented upfront vs relying on memory&lt;/li&gt;
&lt;li&gt;Edge cases captured in spec vs discovered in production&lt;/li&gt;
&lt;li&gt;Complete feature set shipped vs "80% done" abandonment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Where SDD Cost Time
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upfront spec creation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4-5 hours thinking and documenting&lt;/li&gt;
&lt;li&gt;But this is thinking time I'd need anyway, just formalized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Learning the workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First project with agent-os had a learning curve&lt;/li&gt;
&lt;li&gt;Second project would be faster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maintaining documentation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When specs changed, had to update docs&lt;/li&gt;
&lt;li&gt;But this paid off when resuming work later&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Spec-Driven Development isn't magic. It won't prevent bugs, eliminate debugging, or replace human judgment. But it provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structure&lt;/strong&gt; when you'd otherwise be lost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; when you'd otherwise forget&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoints&lt;/strong&gt; when you'd otherwise rush ahead with wrong assumptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resumability&lt;/strong&gt; when you'd otherwise re-learn the codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The challenges I faced—timezone bugs, library quirks, multi-layer issues—would have happened with or without SDD. The difference is how I dealt with them:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without SDD:&lt;/strong&gt; Random debugging, forgotten context, abandoned projects&lt;br&gt;
&lt;strong&gt;With SDD:&lt;/strong&gt; Systematic debugging, documented decisions, completed features&lt;/p&gt;

&lt;p&gt;That's the honest ROI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We've now seen the complete picture: the workflow (Part 3) and where it actually helps vs where you still struggle (Part 4). You know the realistic benefits and the honest limitations.&lt;/p&gt;

&lt;p&gt;In Part 5, we'll wrap up with a decision framework: when should you use SDD for your project, and when is a simpler approach better? We'll also cover how to get started with Station Station yourself (it's free and open source), and where to go from here.&lt;/p&gt;

&lt;p&gt;If you're ready to make the call on whether Spec-Driven Development fits your workflow, Part 5 has the answers.&lt;/p&gt;

</description>
      <category>specdriven</category>
      <category>debugging</category>
      <category>ai</category>
      <category>claude</category>
    </item>
    <item>
      <title>Part 3: Building Station Station - Agent-OS Workflow in Action</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 04 Nov 2025 05:43:12 +0000</pubDate>
      <link>https://forem.com/koustubh/part-3-building-station-station-agent-os-workflow-in-action-1fp9</link>
      <guid>https://forem.com/koustubh/part-3-building-station-station-agent-os-workflow-in-action-1fp9</guid>
      <description>&lt;p&gt;In Parts 1 and 2, I introduced Spec-Driven Development and showed you the finished Station Station project—8 features, live on GitHub Pages, solving my real hybrid work compliance problem. But how did we actually get there? What does the agent-os workflow look like in practice?&lt;/p&gt;

&lt;p&gt;This part walks you through the complete development process, using real examples from Station Station. No theoretical abstractions—just the actual workflow I followed to go from "I need to track my office attendance" to a deployed web application.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Phases of Agent-OS
&lt;/h2&gt;

&lt;p&gt;Agent-os structures development into five distinct phases, each with specific deliverables and human review checkpoints. For this project, I used &lt;strong&gt;Claude&lt;/strong&gt; (Anthropic's AI) as the AI assistant throughout the entire workflow—from spec shaping to implementation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AGENT-OS DEVELOPMENT WORKFLOW
════════════════════════════

1. Create Product  →  2. Shape Spec  →  3. Write Spec  →  4. Write Tasks  →  5. Implement
   Mission &amp;amp; Roadmap   Requirements      Technical Spec    Task Breakdown    AI + Human Code
         ↓                  ↓                  ↓                 ↓                 ↓
   [Human Review]     [Human Review]     [Human Review]    [Human Review]    [Human Review]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the review checkpoints? That's deliberate. You're not waiting until the end to discover the AI misunderstood your requirements. You're validating assumptions at each phase before moving forward.&lt;/p&gt;

&lt;p&gt;Let me show you what each phase actually looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: Create Product
&lt;/h2&gt;

&lt;p&gt;The first step isn't writing code—it's defining what you're building and why. You start with just a raw idea, and agent-os helps you shape it into a structured product plan.&lt;/p&gt;

&lt;p&gt;Here's how it actually works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You start with a simple idea:&lt;/strong&gt;&lt;br&gt;
"I need to track my office attendance using my Myki train card data to meet my company's 50% hybrid work requirement."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-os asks clarifying questions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What problem are you solving? Who's the target user?&lt;/li&gt;
&lt;li&gt;What are your key constraints? (Budget, timeline, technical preferences)&lt;/li&gt;
&lt;li&gt;What features are must-haves vs nice-to-haves?&lt;/li&gt;
&lt;li&gt;Do you have preferred technologies or deployment platforms?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You answer honestly:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem: I need proactive visibility into my attendance, not reactive manager notifications&lt;/li&gt;
&lt;li&gt;User: Primarily me, but could be useful for other hybrid workers&lt;/li&gt;
&lt;li&gt;Constraints: Zero hosting costs, mobile-first (I'll check this on my phone)&lt;/li&gt;
&lt;li&gt;Tech: Python (I'm comfortable with it), React (modern and fast), GitHub Pages (free)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agent-os generates three documents:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;mission.md&lt;/strong&gt; - The "why" behind the project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem: Hybrid workers need transparency into office attendance compliance&lt;/li&gt;
&lt;li&gt;Solution: Automated attendance tracking using Melbourne's Myki transit data&lt;/li&gt;
&lt;li&gt;Target user: Me (and anyone else commuting via train for hybrid work)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;roadmap.md&lt;/strong&gt; - The "what" we're building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 features broken into 3 phases&lt;/li&gt;
&lt;li&gt;Phase 1: Foundation (auth + API discovery)&lt;/li&gt;
&lt;li&gt;Phase 2: Data Layer (extraction + processing)&lt;/li&gt;
&lt;li&gt;Phase 3: Integration &amp;amp; UI (frontend + config)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;tech-stack.md&lt;/strong&gt; - The "how" we'll build it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend: Python + Playwright for browser automation&lt;/li&gt;
&lt;li&gt;Frontend: React + Tailwind CSS v4&lt;/li&gt;
&lt;li&gt;Deployment: GitHub Pages (static site, zero hosting costs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This took about 30 minutes—mostly answering questions and reviewing the generated documents. You don't write these from scratch; the AI creates them based on your answers. But you review and refine them to ensure they match your vision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human role here:&lt;/strong&gt; You provide the raw idea and answer questions about your constraints, priorities, and technical preferences. AI can't guess that you want zero hosting costs or mobile-first design—that's domain knowledge only you have.&lt;/p&gt;
&lt;h2&gt;
  
  
  Phase 2: Shape the Spec
&lt;/h2&gt;

&lt;p&gt;Once you have the product defined, you pick a feature from the roadmap and start spec shaping. This is where agent-os really shines—the spec-shaper agent asks clarifying questions to gather detailed requirements.&lt;/p&gt;

&lt;p&gt;For the React frontend feature, the shaping process went like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-shaper:&lt;/strong&gt; "What data format will the frontend consume?"&lt;br&gt;
&lt;strong&gt;Me:&lt;/strong&gt; "JSON from GitHub raw URL, generated by Python backend"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-shaper:&lt;/strong&gt; "What visualizations do you need?"&lt;br&gt;
&lt;strong&gt;Me:&lt;/strong&gt; "Calendar view with attended days marked, monthly bar chart showing attendance trends, summary statistics"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-shaper:&lt;/strong&gt; "Mobile or desktop priority?"&lt;br&gt;
&lt;strong&gt;Me:&lt;/strong&gt; "Mobile-first. I'll be checking this on my phone most mornings."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-shaper:&lt;/strong&gt; "Should the calendar be read-only or interactive?"&lt;br&gt;
&lt;strong&gt;Me:&lt;/strong&gt; "Interactive—clicking an attended day should show which station and what time I tapped on."&lt;/p&gt;

&lt;p&gt;This back-and-forth surfaces implementation details you might not think about upfront. By the end of shaping, you have a clear picture of what "build the frontend" actually means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important: This is iterative.&lt;/strong&gt; The spec-shaper generates a draft spec based on your answers. You review it. If something's not quite right—maybe it misunderstood your intent, or you realized you forgot to mention a key requirement—you provide more input. The spec gets refined. You review again. This continues until you're satisfied.&lt;/p&gt;

&lt;p&gt;For Station Station, I went through 2-3 refinement rounds on some specs. The first draft might have missed that I wanted public holidays automatically displayed on the calendar. I'd point that out, and the spec would be updated to include it. No starting over—just iterative improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human role here:&lt;/strong&gt; Answer questions honestly about your use case, then review and refine the generated spec. The AI can't guess that you'll primarily use this on mobile, or that you care more about quick glances than detailed analytics. And if the first draft misses something, that's fine—just keep refining until it's right.&lt;/p&gt;
&lt;h2&gt;
  
  
  Phase 3: Write the Spec
&lt;/h2&gt;

&lt;p&gt;Now the spec-writer agent takes your answers from the shaping phase and generates a detailed technical specification. Here's what the actual spec looked like for the frontend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Specification: Attendance Tracker Frontend UI&lt;/span&gt;

&lt;span class="gu"&gt;## Goal&lt;/span&gt;
Build a responsive static React web application to visualize work attendance data
from the Myki attendance tracker JSON output, enabling users to view attendance
statistics, explore monthly calendars with marked attended days, analyze trends
through bar charts, and filter data by date ranges across mobile and desktop devices.

&lt;span class="gu"&gt;## User Stories&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; As a user, I want to see a monthly calendar with my attended days visually marked
  so that I can quickly identify when I was at the office
&lt;span class="p"&gt;-&lt;/span&gt; As a user, I want to view monthly attendance percentages in a bar chart so that I
  can understand my attendance trends over time
&lt;span class="p"&gt;-&lt;/span&gt; As a user, I want to filter data by date range so that I can focus on specific
  time periods like quarters or financial years

&lt;span class="gu"&gt;## Specific Requirements&lt;/span&gt;

&lt;span class="gs"&gt;**Calendar View Component**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Display monthly grid calendar showing current month by default
&lt;span class="p"&gt;-&lt;/span&gt; Provide previous/next month navigation buttons for browsing history
&lt;span class="p"&gt;-&lt;/span&gt; Mark attended days with red visual indicators (red background circle or dot)
&lt;span class="p"&gt;-&lt;/span&gt; Make attended days clickable to show detail modal or tooltip
&lt;span class="p"&gt;-&lt;/span&gt; Display timestamp and target station name when attended day is clicked
&lt;span class="p"&gt;-&lt;/span&gt; Use react-calendar library for calendar functionality
&lt;span class="p"&gt;-&lt;/span&gt; Ensure keyboard navigation support for accessibility
&lt;span class="p"&gt;-&lt;/span&gt; Mobile-optimized with touch-friendly date selection

&lt;span class="gs"&gt;**Monthly Bar Chart Visualization**&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Display one bar per month showing attendance percentage (0-100%)
&lt;span class="p"&gt;-&lt;/span&gt; Use Recharts library for rendering responsive bar charts
&lt;span class="p"&gt;-&lt;/span&gt; Color bars in red theme to match attended day indicators
&lt;span class="p"&gt;-&lt;/span&gt; Include tooltips showing exact percentage, working days, and days attended on hover
&lt;span class="p"&gt;-&lt;/span&gt; Ensure chart is fully responsive and readable on mobile screens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This continues for 9 specific requirement areas, totaling about 100 lines of detailed specifications. The spec-writer captured my shaping answers and translated them into implementable requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Notice the specificity. Not "build a chart" but "use Recharts library, red theme, responsive on mobile, tooltips on hover." That level of detail lets the AI implement without guessing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Just like shaping, this is iterative too.&lt;/strong&gt; The spec-writer generates a detailed spec based on the shaped requirements. You review it carefully. Maybe you notice it specified the wrong color theme, or it didn't include a requirement for error handling, or the accessibility requirements aren't strong enough. You provide feedback, and the spec gets updated. Review again. Refine again. Keep going until the spec accurately represents what you want built.&lt;/p&gt;

&lt;p&gt;For the frontend spec, I noticed the first draft didn't specify what should happen when the JSON fetch fails. I asked for better error handling requirements—retry option, user-friendly messages, graceful degradation. The spec was updated to include those details. Same with accessibility—I pushed for stronger requirements around keyboard navigation and screen reader support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human role here:&lt;/strong&gt; Review the spec thoroughly and keep refining until it's right. Did it capture your intent? Are there edge cases missing? Requirements that don't make sense? This is your last chance to catch misunderstandings before code gets written, so it's worth taking the time to get it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: Write Tasks
&lt;/h2&gt;

&lt;p&gt;With an approved spec, the task-writer agent breaks it into granular, actionable tasks. Here's how the frontend spec became 6 task groups with 40+ individual tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Task Breakdown: Attendance Tracker Frontend UI&lt;/span&gt;

&lt;span class="gu"&gt;## Task List&lt;/span&gt;

&lt;span class="gu"&gt;### Task Group 1: Initial Project Setup&lt;/span&gt;
&lt;span class="gs"&gt;**Dependencies:**&lt;/span&gt; None
&lt;span class="p"&gt;
-&lt;/span&gt; [x] 1.1 Create new Vite React project
&lt;span class="p"&gt;  -&lt;/span&gt; Run: &lt;span class="sb"&gt;`npm create vite@latest attendance-tracker -- --template react`&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Navigate into project directory
&lt;span class="p"&gt;  -&lt;/span&gt; Install base dependencies: &lt;span class="sb"&gt;`npm install`&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; [x] 1.2 Install and configure Tailwind CSS
&lt;span class="p"&gt;  -&lt;/span&gt; Install: &lt;span class="sb"&gt;`npm install -D tailwindcss postcss autoprefixer`&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Initialize: &lt;span class="sb"&gt;`npx tailwindcss init -p`&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Configure tailwind.config.js with content paths and custom 'attended' color (#ef4444)
&lt;span class="p"&gt;  -&lt;/span&gt; Add Tailwind directives to src/index.css
&lt;span class="p"&gt;
-&lt;/span&gt; [x] 1.3 Install required libraries
&lt;span class="p"&gt;  -&lt;/span&gt; Chart library: &lt;span class="sb"&gt;`npm install recharts`&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Calendar library: &lt;span class="sb"&gt;`npm install react-calendar`&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; Date picker: &lt;span class="sb"&gt;`npm install react-datepicker`&lt;/span&gt;

&lt;span class="gu"&gt;### Task Group 2: Data Fetching and Processing&lt;/span&gt;
&lt;span class="gs"&gt;**Dependencies:**&lt;/span&gt; Task Group 1
&lt;span class="p"&gt;
-&lt;/span&gt; [x] 2.1 Write 2-6 focused tests for data utilities
&lt;span class="p"&gt;  -&lt;/span&gt; Test JSON fetch success scenario
&lt;span class="p"&gt;  -&lt;/span&gt; Test error handling for network failures
&lt;span class="p"&gt;  -&lt;/span&gt; Test date filtering calculation
&lt;span class="p"&gt;
-&lt;/span&gt; [x] 2.2 Create data fetching utility
&lt;span class="p"&gt;  -&lt;/span&gt; File: src/utils/dataFetcher.js
&lt;span class="p"&gt;  -&lt;/span&gt; Implement fetchAttendanceData() function
&lt;span class="p"&gt;  -&lt;/span&gt; URL: https://raw.githubusercontent.com/koustubh25/station-station/main/output/attendance.json
&lt;span class="p"&gt;  -&lt;/span&gt; Use cache: 'no-cache' for fresh data
&lt;span class="p"&gt;  -&lt;/span&gt; Handle network errors with descriptive messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each task is concrete enough that I could hand it to any developer (or AI) and they'd know exactly what to build. Dependencies are explicit—you can't build the calendar component until data fetching works.&lt;/p&gt;
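&lt;p&gt;To see how implementable these tasks are, here is a rough Python sketch of the logic behind tasks 2.1 and 2.2 (the real project implements this in JavaScript as &lt;code&gt;src/utils/dataFetcher.js&lt;/code&gt;; the function names below are illustrative):&lt;/p&gt;

```python
from datetime import date

# URL taken from task 2.2; everything else here is an illustrative sketch.
ATTENDANCE_URL = ("https://raw.githubusercontent.com/koustubh25/"
                  "station-station/main/output/attendance.json")

def filter_by_range(iso_days, start, end):
    """Keep only the ISO date strings falling inside [start, end] inclusive."""
    def in_range(s):
        d = date.fromisoformat(s)
        return not (start > d or d > end)
    return [s for s in iso_days if in_range(s)]

def describe_fetch_error(exc):
    """Translate a raw network exception into a user-friendly message,
    as the error-handling requirement asks for."""
    return "Could not load attendance data (%s). Please retry." % exc
```

&lt;p&gt;With input &lt;code&gt;["2025-01-05", "2025-02-10"]&lt;/code&gt; and a January range, only the January date survives; the error helper keeps low-level exception text out of the UI.&lt;/p&gt;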

&lt;p&gt;&lt;strong&gt;Human role here:&lt;/strong&gt; Review the task breakdown. Is anything missing? Are tasks sequenced correctly? Do the dependencies make sense? Sometimes AI misses edge cases or creates circular dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 5: Implement Tasks
&lt;/h2&gt;

&lt;p&gt;This is where AI assistance actually writes code. But it's not fully autonomous—there are specific checkpoints where human review is critical.&lt;/p&gt;

&lt;p&gt;For Station Station, the implementation flow looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI implements task&lt;/strong&gt; - The implementer agent writes code according to the task spec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI runs tests&lt;/strong&gt; - Verifies the implementation works (if tests exist)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human reviews output&lt;/strong&gt; - You check the code, test it manually, and approve or request changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move to next task&lt;/strong&gt; - Repeat for each task in the breakdown&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's where I learned an important lesson about agent-os CLI permissions. The AI can read files, write files, and run tests, but certain operations require your explicit approval:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Git commits&lt;/strong&gt; - You review and commit changes yourself&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git pushes&lt;/strong&gt; - You decide when to push to remote&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow triggers&lt;/strong&gt; - You manually kick off CI/CD pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is by design. You maintain control over version history and deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But here's what I discovered:&lt;/strong&gt; After gaining confidence in the AI-generated code—once I'd reviewed a few implementations and saw they were solid—I started allowing the AI to do &lt;code&gt;git push&lt;/code&gt; and use the &lt;code&gt;gh&lt;/code&gt; CLI to view or trigger GitHub Actions workflows. This let the AI work more autonomously: push code, trigger the build, check if tests passed, and if they failed, fix the issues and try again.&lt;/p&gt;

&lt;p&gt;The workflow became: AI implements → AI pushes → AI triggers workflow → AI monitors results → If failures, AI fixes and repeats. I'd check in periodically, but for well-defined tasks, the AI could iterate autonomously until everything passed.&lt;/p&gt;

&lt;p&gt;This isn't the default (and probably shouldn't be for unfamiliar projects), but once you've established trust through review, you can grant more autonomy where it makes sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Agent-os also has an &lt;code&gt;orchestrate-tasks&lt;/code&gt; command that provides even more advanced multi-agent coordination and autonomous task execution. But that's beyond the scope of this blog—we'll cover it in detail in a future post. For Station Station, the standard task-by-task implementation workflow was sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human role here:&lt;/strong&gt; Code review, manual testing, and deployment decisions. AI can generate the boilerplate, but you verify it works in your specific context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Complete Agent-OS Workflow
&lt;/h2&gt;

&lt;p&gt;Now that we've walked through all five phases, let's see how they fit together into a complete cycle.&lt;/p&gt;

&lt;p&gt;The agent-os workflow follows a structured, iterative cycle. Notice the feedback loop where human review catches issues that require debugging before the feature is complete. This isn't full automation—it's a partnership where AI handles implementation and humans guide the architecture and review the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkoustubh25%2Fstation-station%2Fmain%2Fagent-os%2Fspecs%2F2025-11-03-technical-blog-sdd%2Fplanning%2Fvisuals%2Fagent-os-workflow-diagram.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkoustubh25%2Fstation-station%2Fmain%2Fagent-os%2Fspecs%2F2025-11-03-technical-blog-sdd%2Fplanning%2Fvisuals%2Fagent-os-workflow-diagram.png" alt="Agent-OS workflow diagram showing iterative cycle: Create Product, Shape Spec, Write Spec, Write Tasks, Implement Tasks, Human Review with feedback loop for debugging and refinement until feature is complete" width="800" height="1925"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Human Review at Key Decision Points
&lt;/h3&gt;

&lt;p&gt;This sequence diagram reveals the continuous human-AI collaboration throughout development. Review happens at multiple stages, not just at the end. Each phase includes a human checkpoint where you validate the AI's work before proceeding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkoustubh25%2Fstation-station%2Fmain%2Fagent-os%2Fspecs%2F2025-11-03-technical-blog-sdd%2Fplanning%2Fvisuals%2Fagent-os-task-execution-flow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkoustubh25%2Fstation-station%2Fmain%2Fagent-os%2Fspecs%2F2025-11-03-technical-blog-sdd%2Fplanning%2Fvisuals%2Fagent-os-task-execution-flow.png" alt="Sequence diagram showing Agent-OS task execution flow: Human provides feature idea to Spec Writer, who gathers requirements and generates detailed spec. Task Writer breaks spec into tasks for Human approval. Task Implementer executes each task, writes tests, and submits for Review. Human reviews results and either approves or provides guidance for fixes. Process loops until feature is complete" width="800" height="1036"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key insight: this is continuous collaboration, not "AI does everything then human reviews at the end." You're involved throughout, making architectural decisions, reviewing outputs, and course-correcting when needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Iterative Reality
&lt;/h2&gt;

&lt;p&gt;Here's what the spec doesn't show: iteration. The workflow diagrams make it look linear, but reality is messier.&lt;/p&gt;

&lt;p&gt;For Station Station, I went through multiple rounds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spec refinement:&lt;/strong&gt; Realized mid-development I needed manual attendance dates (for days I drove to work instead of taking the train). Went back and updated the spec.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task adjustments:&lt;/strong&gt; Some tasks were too large and got broken into smaller chunks. Others were unnecessary and got removed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implementation bugs:&lt;/strong&gt; AI couldn't fix the &lt;code&gt;manualAttendanceDates&lt;/code&gt; field bug after several attempts. I had to review the code, identify the issue location, then let AI implement the fix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New specs added later:&lt;/strong&gt; The initial roadmap had 8 features, but I added more specs later for enhancements like security improvements and manual attendance features. You don't have to plan everything upfront—you can always create new specs for additional features as needs emerge.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow provides structure, but you'll loop back. That's normal. The key difference from ad-hoc AI chat is that when you loop back, you update the spec or tasks—so the system stays consistent. Future features can reference the updated spec instead of inheriting outdated assumptions.&lt;/p&gt;

&lt;p&gt;And when you think of new features? Just create a new spec and go through the same workflow. The product documentation evolves with your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This Different
&lt;/h2&gt;

&lt;p&gt;The structured approach of agent-os SDD provides several key benefits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear Direction Throughout&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every feature starts with documented requirements, not assumptions&lt;/li&gt;
&lt;li&gt;The roadmap gives you a clear view of what's done and what's next&lt;/li&gt;
&lt;li&gt;When you solve one problem (like Cloudflare bypass), the spec tells you exactly what comes next&lt;/li&gt;
&lt;li&gt;No more "I got this working, but now what?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Persistent Context&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Because you have a record of all specs, tasks, and their completion status, the AI can pick up exactly where you left off—even weeks later&lt;/li&gt;
&lt;li&gt;Come back after a break: "Task 7 is complete, Task 8 is next, here's what needs to be done"&lt;/li&gt;
&lt;li&gt;No context loss, no re-explaining what you've already built&lt;/li&gt;
&lt;li&gt;The documentation serves as persistent memory across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Easier Debugging&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When something breaks, you can reference the spec to understand intended behavior&lt;/li&gt;
&lt;li&gt;Task breakdown makes it easy to isolate which component is failing&lt;/li&gt;
&lt;li&gt;Specs document edge cases and requirements that are easy to forget during implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Iterative Refinement&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update specs as you learn—they evolve with your understanding&lt;/li&gt;
&lt;li&gt;Add new specs for new features without disrupting existing work&lt;/li&gt;
&lt;li&gt;Each iteration is documented, so you can see why decisions were made&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The time investment is front-loaded. Spec creation took longer than just prompting Claude to "build a frontend." But I shipped all 8 planned features. The debugging was easier. The resumability was huge. And when I added new features a week later, the specs told me exactly where to hook them in.&lt;/p&gt;

&lt;p&gt;That's the ROI of Spec-Driven Development—not faster initial code generation, but fewer surprises, clearer direction, and maintainable progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We've seen the agent-os workflow in action: creating products, shaping specs, writing detailed specifications, breaking down tasks, and implementing with AI assistance. We have a structured process that transforms vague ideas into working code.&lt;/p&gt;

&lt;p&gt;But this is the part where I need to be honest about limitations. The workflow isn't magic. AI still struggles with certain problems, and some features require significant human intervention. In Part 4, we'll dive into the real challenges: debugging stories where AI failed, the collaboration spectrum between AI and human, and how to recognize when Spec-Driven Development is overkill for your project.&lt;/p&gt;

&lt;p&gt;If you're wondering whether this structured approach is always worth it—or where it breaks down—Part 4 has the answers.&lt;/p&gt;

</description>
      <category>specdriven</category>
      <category>agentos</category>
      <category>ai</category>
      <category>claude</category>
    </item>
    <item>
      <title>Part 2: Building Station Station - A Real-World SDD Case Study</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 04 Nov 2025 05:41:57 +0000</pubDate>
      <link>https://forem.com/koustubh/part-2-building-station-station-a-real-world-sdd-case-study-15p4</link>
      <guid>https://forem.com/koustubh/part-2-building-station-station-a-real-world-sdd-case-study-15p4</guid>
      <description>&lt;p&gt;In Part 1, I introduced Spec-Driven Development and explained how it provides a structured alternative to the AI chat trial-and-error loop. Now let's get concrete. This is the story of building Station Station—a real application that solves a real problem I face as a Melbourne train commuter working hybrid.&lt;/p&gt;

&lt;p&gt;This isn't a contrived demo or a toy project. It's a fully functional web application I use regularly, built in about 3 days using the agent-os SDD workflow, now live at &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;. In this part, I'll walk you through the problem it solves, the technical architecture, the 8-feature roadmap we shipped, and the real challenges we encountered along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Hybrid Work Compliance Meets Train Commuting
&lt;/h2&gt;

&lt;p&gt;My company, like many post-pandemic workplaces, has a hybrid policy: 50% office attendance required. Sounds reasonable, right? But here's the catch—there's no automated way to track whether I'm meeting that threshold.&lt;/p&gt;

&lt;p&gt;My manager periodically notifies me of my compliance status—"You're at 42% this month, need to increase office days"—but by then, I'm already behind. I have no visibility into my attendance until someone tells me I'm non-compliant.&lt;/p&gt;

&lt;p&gt;The real problem isn't just manual tracking—it's &lt;strong&gt;reactive&lt;/strong&gt; tracking. I want to proactively manage my hybrid schedule: "It's mid-month and I'm at 48%, so I should go to the office this week to stay on track." Or "I'm at 55%, so I can safely work from home the rest of the month."&lt;/p&gt;

&lt;p&gt;This kind of strategic planning requires real-time visibility into my attendance data, not periodic manager notifications. The advantages of tracking this myself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy&lt;/strong&gt;: Control my schedule instead of reacting to manager feedback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strategic planning&lt;/strong&gt;: Make informed WFH decisions based on current compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid surprises&lt;/strong&gt;: Never get caught off-guard by non-compliance notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Balance office days with personal needs (appointments, errands, weather)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of repetitive, rule-based task that should be automated. But the Myki portal doesn't offer attendance tracking—it's designed for viewing transactions, not inferring work patterns. No third-party tools exist because the Myki API is undocumented and protected by Cloudflare bot detection.&lt;/p&gt;

&lt;p&gt;Perfect problem space for a personal automation project.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Automated Attendance Tracking from Transit Data
&lt;/h2&gt;

&lt;p&gt;Station Station automatically determines office attendance by analyzing Myki transaction data. The core logic is simple: if I tapped on or off at my designated work station on any given day, that counts as an office day. The application handles the complexity of data extraction, date calculations, and visualization.&lt;/p&gt;

&lt;p&gt;Here's the architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Extraction Layer (Python + Playwright)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Headless browser automation bypasses Cloudflare Turnstile bot detection using browser profile trust signals&lt;/li&gt;
&lt;li&gt;Authenticated session extracts cookies and Bearer tokens for API calls&lt;/li&gt;
&lt;li&gt;Transaction history retrieved for user-specified date ranges&lt;/li&gt;
&lt;li&gt;Data parsed and normalized into structured JSON format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Processing Layer (Python)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attendance logic analyzes transactions to identify work station visits&lt;/li&gt;
&lt;li&gt;Configurable rules support:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Work days&lt;/strong&gt;: Default Monday-Friday, customizable per user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip dates&lt;/strong&gt;: Exclude specific dates (sick leave, planned vacation, public holidays)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual attendance dates&lt;/strong&gt;: Override for days you attended work without taking the train (drove to work, carpooled, etc.)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Monthly aggregations calculate attendance statistics and percentages&lt;/li&gt;

&lt;li&gt;JSON storage provides data persistence with optional GitHub backup for version history&lt;/li&gt;

&lt;/ul&gt;
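&lt;p&gt;A minimal Python sketch of that processing logic, assuming a simplified schema (the field and function names here are illustrative, not the project's actual code):&lt;/p&gt;

```python
import calendar
from datetime import date

def attended_days(taps, target_station, manual_dates):
    """taps: iterable of (iso_date, station_name) from Myki transactions.
    A day counts if any tap on/off touched the target station, or the
    user manually marked it (e.g. drove to work that day)."""
    days = {d for d, station in taps if station == target_station}
    return days | set(manual_dates)

def month_percentage(year, month, attended, skip_dates, work_days=(0, 1, 2, 3, 4)):
    """Attendance percentage for one month: attended working days divided
    by all working days, with skip dates excluded from both sides."""
    working = []
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        d = date(year, month, day)
        if d.weekday() in work_days and d.isoformat() not in skip_dates:
            working.append(d.isoformat())
    if not working:
        return 0.0
    hits = [d for d in working if d in attended]
    return round(100.0 * len(hits) / len(working), 1)
```

&lt;p&gt;Skip dates come out of the denominator as well as the numerator, so a planned absence never counts against the percentage.&lt;/p&gt;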

&lt;p&gt;&lt;strong&gt;Presentation Layer (React + Tailwind CSS v4)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive calendar displays attended days with color-coded indicators&lt;/li&gt;
&lt;li&gt;Monthly bar chart visualizes attendance trends over time&lt;/li&gt;
&lt;li&gt;Summary dashboard shows attendance percentage, working days, and days attended/missed&lt;/li&gt;
&lt;li&gt;Date range filtering allows focused analysis of specific periods&lt;/li&gt;
&lt;li&gt;Public holidays automatically detected and displayed for context&lt;/li&gt;
&lt;li&gt;Fully responsive design optimized for mobile (my primary use case)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deployment (GitHub Pages)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Static site deployment—no backend hosting required&lt;/li&gt;
&lt;li&gt;React app consumes JSON data generated by Python backend&lt;/li&gt;
&lt;li&gt;GitHub Actions automates data updates and deployment&lt;/li&gt;
&lt;li&gt;Zero ongoing hosting costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fully Autonomous Operation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the crucial part: once configured, the system runs completely autonomously. Every day, GitHub Actions automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spins up a Docker container&lt;/li&gt;
&lt;li&gt;Authenticates with the Myki portal&lt;/li&gt;
&lt;li&gt;Fetches new transaction data&lt;/li&gt;
&lt;li&gt;Calculates updated attendance statistics&lt;/li&gt;
&lt;li&gt;Commits the updated JSON to the repository&lt;/li&gt;
&lt;li&gt;Deploys the refreshed dashboard to GitHub Pages&lt;/li&gt;
&lt;/ol&gt;
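&lt;p&gt;That daily cycle maps onto a scheduled GitHub Actions workflow. Here is a hedged sketch (the file name, script entry point, and cron time are illustrative; the secret names are the ones the project's configuration uses):&lt;/p&gt;

```yaml
# .github/workflows/update-attendance.yml -- illustrative sketch
name: Update attendance
on:
  schedule:
    - cron: "0 20 * * *"    # once a day (UTC); adjust for Melbourne time
  workflow_dispatch: {}     # allow manual runs for testing
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch Myki data and recalculate attendance
        run: python main.py   # entry-point name is an assumption
        env:
          MYKI_USERNAME: ${{ secrets.MYKI_USERNAME }}
          MYKI_PASSWORD: ${{ secrets.MYKI_PASSWORD }}
          MYKI_CARDNUMBER: ${{ secrets.MYKI_CARDNUMBER }}
      - name: Commit refreshed JSON
        run: |
          git config user.name "github-actions"
          git add output/attendance.json
          git commit -m "Update attendance data" || echo "no changes"
          git push
```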

&lt;p&gt;I don't touch anything. I just open the dashboard URL on my phone each morning and see my current attendance percentage. No manual data entry, no clicking through the Myki portal, no spreadsheet updates. The app handles everything from data extraction to visualization without human intervention.&lt;/p&gt;

&lt;p&gt;And it's completely free. GitHub Actions provides 2,000 free minutes per month for private repositories (unlimited for public repos), GitHub Pages hosting is free, and the entire stack runs on free tiers. Zero ongoing costs for a fully automated daily workflow.&lt;/p&gt;

&lt;p&gt;The beauty of this architecture is its separation of concerns: Python handles the complex authentication and data extraction (runs locally or in GitHub Actions), React handles the user-friendly visualization (deployed statically), and JSON serves as the contract between them.&lt;/p&gt;
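&lt;p&gt;To make that contract concrete, here is a hypothetical shape for the generated file (the project's real schema may differ; all field names here are illustrative):&lt;/p&gt;

```json
{
  "summary": { "percentage": 52.4, "workingDays": 21, "daysAttended": 11 },
  "attendedDates": ["2025-06-02", "2025-06-03", "2025-06-10"],
  "manualAttendanceDates": ["2025-06-05"],
  "skipDates": ["2025-06-09"]
}
```

&lt;p&gt;As long as the Python side keeps emitting this shape, the React side never needs to know how the data was obtained.&lt;/p&gt;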

&lt;h2&gt;
  
  
  The Roadmap: 8 Features Across 3 Phases
&lt;/h2&gt;

&lt;p&gt;Here's exactly what we built, broken down into three logical phases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Foundation (Authentication &amp;amp; API Discovery)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Myki Authentication &amp;amp; Cloudflare Bypass&lt;/strong&gt; (Large, 2 days)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser automation with Playwright to bypass Cloudflare Turnstile detection&lt;/li&gt;
&lt;li&gt;Profile-based trust signals (copy Chrome profile data) to appear as legitimate user&lt;/li&gt;
&lt;li&gt;Multi-step authentication flow to extract session cookies and Bearer token&lt;/li&gt;
&lt;li&gt;Critical blocker—nothing else could work until this was solved&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transaction History API Reverse Engineering&lt;/strong&gt; (Medium, 1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network traffic analysis to identify undocumented Myki API endpoints&lt;/li&gt;
&lt;li&gt;Request header mapping (x-ptvwebauth, x-verifytoken, authorization Bearer)&lt;/li&gt;
&lt;li&gt;Response parsing to extract transaction data&lt;/li&gt;
&lt;li&gt;Session persistence handling (Bearer token expires after ~20 minutes)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Data Layer (Extraction &amp;amp; Processing)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Myki SDK / Data Retrieval - Browser-based&lt;/strong&gt; (Medium, 1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser-based scraping as fallback (API reverse engineering had limitations)&lt;/li&gt;
&lt;li&gt;Transaction data parsing (station names, timestamps, tap on/off events)&lt;/li&gt;
&lt;li&gt;Date range query support for historical data retrieval&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Card Selection &amp;amp; Date Range Handling&lt;/strong&gt; (Small, &amp;lt;1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Programmatic Myki card selection from user's registered cards&lt;/li&gt;
&lt;li&gt;Configurable date range parameters for transaction queries&lt;/li&gt;
&lt;li&gt;Input validation and error handling&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Attendance Logic &amp;amp; JSON Storage&lt;/strong&gt; (Medium, 1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Station-based attendance detection (checks if user visited their designated work station)&lt;/li&gt;
&lt;li&gt;Daily and monthly attendance aggregations&lt;/li&gt;
&lt;li&gt;Skip dates support for planned absences&lt;/li&gt;
&lt;li&gt;Manual attendance override for non-PTV commutes (drove to work)&lt;/li&gt;
&lt;li&gt;Structured JSON output with separate sections for clarity&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Phase 3: Integration &amp;amp; UI (Visualization &amp;amp; Configuration)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Integration for Data Backup&lt;/strong&gt; (Small, &amp;lt;1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optional GitHub repository integration for data version control&lt;/li&gt;
&lt;li&gt;Automated commits and pushes via GitHub Actions&lt;/li&gt;
&lt;li&gt;Historical tracking and audit trail&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;React Frontend Dashboard&lt;/strong&gt; (Medium, 1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive calendar component with attended days highlighted&lt;/li&gt;
&lt;li&gt;Monthly statistics bar chart for trend analysis&lt;/li&gt;
&lt;li&gt;Summary dashboard (attendance %, working days, days attended/missed)&lt;/li&gt;
&lt;li&gt;Date range filtering with react-datepicker integration&lt;/li&gt;
&lt;li&gt;Public holidays display using date-holidays library&lt;/li&gt;
&lt;li&gt;Fully responsive mobile-first design with Tailwind CSS v4&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Live at: &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configuration Management &amp;amp; User Setup&lt;/strong&gt; (Small, &amp;lt;1 day)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub secrets for secure credential storage (MYKI_USERNAME, MYKI_PASSWORD, MYKI_CARDNUMBER)&lt;/li&gt;
&lt;li&gt;Config files for attendance settings (target station, skip dates, manual dates)&lt;/li&gt;
&lt;li&gt;Multi-user support with user-specific configuration&lt;/li&gt;
&lt;li&gt;Setup documentation and README&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Cloudflare bypass was the critical path item. Once we solved that authentication challenge in the first two days, the rest of the features flowed in rapid succession. This is the power of proper sequencing—identifying technical blockers early and solving them first unlocks everything downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The application is live at &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;. You'll see my actual attendance data (with privacy considerations—no personal identifiers exposed). The dashboard shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Attendance Calendar&lt;/strong&gt;: Interactive monthly view with attended days highlighted in green&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monthly Statistics&lt;/strong&gt;: Bar chart showing attendance trends across months&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary Dashboard&lt;/strong&gt;: At-a-glance metrics (attendance percentage, working days, days attended/missed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Date Range Filtering&lt;/strong&gt;: Focus on specific periods for detailed analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public Holidays&lt;/strong&gt;: Automatically displayed for context (Victoria, Australia)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository is public at &lt;a href="https://github.com/koustubh25/station-station" rel="noopener noreferrer"&gt;https://github.com/koustubh25/station-station&lt;/a&gt; (if you want to deploy this yourself—more on that in Part 5). The frontend is fully responsive, so try it on your phone. That's where I use it most—quick morning check to see my monthly progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Technical Challenges
&lt;/h2&gt;

&lt;p&gt;Let me highlight the three biggest technical challenges we faced. These weren't abstract problems—they were real blockers that required careful problem-solving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cloudflare Turnstile Bypass&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Myki portal is protected by Cloudflare Turnstile, which actively detects and blocks headless browsers. Our first attempt using standard Playwright headless mode failed immediately with the "Verifying you are human..." overlay blocking form access.&lt;/p&gt;

&lt;p&gt;The solution: browser profile trust signals. Playwright launches with an empty Chrome profile directory structure (Cookies, Preferences, History, Web Data, Login Data files), which provides just enough profile metadata to appear as a real browser without containing any actual user data. This runs on GitHub Actions runners, so there's no risk to user credentials—it's a fresh environment each time. Cloudflare recognizes the profile structure as legitimate traffic. Running in headed (visible) mode rather than headless also helps avoid automation detection signals.&lt;/p&gt;

&lt;p&gt;This was the critical blocker for the entire project. Without solving Cloudflare bypass, we couldn't access any Myki data. It took domain knowledge of browser fingerprinting and creative problem-solving—not something AI could generate from patterns alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. API Reverse Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Myki transaction API is completely undocumented. We had to analyze browser network traffic to identify endpoints, required headers, and authentication patterns.&lt;/p&gt;

&lt;p&gt;Key discoveries through network analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bearer token location: extracted from authentication response &lt;code&gt;data.token&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Required headers: &lt;code&gt;x-ptvwebauth&lt;/code&gt;, &lt;code&gt;x-verifytoken&lt;/code&gt;, &lt;code&gt;x-passthruauth&lt;/code&gt;, &lt;code&gt;authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Token expiration: ~20 minutes, requires re-authentication for long-running sessions&lt;/li&gt;
&lt;li&gt;Request structure: specific query parameters for date ranges and card selection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a snippet of the per-user configuration that drives the extraction:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;myki_config.json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;structure&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"targetStation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Your Work Station Name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"skipDates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"2025-12-25"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-01"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"startDate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-04-15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"manualAttendanceDates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"2025-11-03"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI could implement the API client once we documented these patterns in the spec. But identifying which headers were critical, understanding token lifetime, and mapping the authentication flow required human analysis of network traffic.&lt;/p&gt;
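&lt;p&gt;To make those discoveries concrete, here's a minimal sketch of how a client might cache the bearer token and assemble the required headers. The class name, the refresh margin, and the callable-based authentication hook are illustrative assumptions; the undocumented endpoint URLs are deliberately omitted.&lt;/p&gt;

```python
# Illustrative sketch of token caching plus the discovered header set.
import time

TOKEN_LIFETIME_SECONDS = 20 * 60  # observed ~20-minute expiry

class MykiClient:
    def __init__(self, authenticate):
        # authenticate: callable performing the login and returning the
        # auth response as a dict (shape assumed from the notes above).
        self._authenticate = authenticate
        self._token = None
        self._token_time = 0.0

    def _ensure_token(self):
        # Re-authenticate before the bearer token hits its ~20-minute expiry.
        stale = time.time() - self._token_time > TOKEN_LIFETIME_SECONDS - 60
        if self._token is None or stale:
            response = self._authenticate()
            self._token = response["data"]["token"]  # token lives in data.token
            self._token_time = time.time()
        return self._token

    def headers(self, web_auth, verify_token, passthru_auth):
        """Build the header set the transaction API requires."""
        token = self._ensure_token()
        return {
            "x-ptvwebauth": web_auth,
            "x-verifytoken": verify_token,
            "x-passthruauth": passthru_auth,
            "authorization": f"Bearer {token}",
        }
```

&lt;p&gt;The caching matters for long-running extraction jobs: without it, a session that outlives the token silently starts getting auth failures mid-run.&lt;/p&gt;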

&lt;p&gt;&lt;strong&gt;3. Transaction Parsing and Attendance Logic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inferring office attendance from train transactions required clear, simple logic. The core rule: "Did the user visit their work station on this date?"&lt;/p&gt;

&lt;p&gt;Our solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If any transaction on date D involves the user's designated target station, count D as attended&lt;/li&gt;
&lt;li&gt;Card type, fare type, transaction status - none of that matters for attendance detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip dates&lt;/strong&gt;: Explicitly exclude dates that shouldn't count (sick leave, planned vacation). Public holidays are automatically detected and excluded using Victoria, Australia's holiday calendar. For example, if I'm sick on a Tuesday, I add that date to &lt;code&gt;skipDates&lt;/code&gt; so it doesn't count against my attendance percentage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual attendance dates&lt;/strong&gt;: Override for days I attended work but didn't take the train. For instance, if I drove to work or carpooled, there's no Myki transaction, but I still want it counted as an office day. I add that date to &lt;code&gt;manualAttendanceDates&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This handles the edge cases that real-world commuting brings. Some weeks I drive because I need my car for errands after work. Some days I'm sick or on vacation. The configuration system makes it easy to adjust attendance tracking to match reality, not just train transactions.&lt;/p&gt;

&lt;p&gt;This domain-specific logic required human understanding of the problem space and business rules. AI could generate the code structure, but the attendance rules came from me knowing how I actually commute and what counts as "attendance."&lt;/p&gt;
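&lt;p&gt;The rules above are simple enough to sketch in a few lines of Python. The transaction and config shapes here are my assumptions for illustration, and the real code also auto-detects Victorian public holidays, which this sketch skips:&lt;/p&gt;

```python
# Sketch of the attendance rule: target-station match, minus skip dates,
# plus manual overrides. Data shapes are illustrative.
from datetime import date

def attended_dates(transactions, config):
    """Return the set of dates counted as office attendance.

    transactions: iterable of (date, station_name) pairs
    config: dict with targetStation, skipDates, manualAttendanceDates
    """
    skip = {date.fromisoformat(d) for d in config.get("skipDates", [])}
    manual = {date.fromisoformat(d) for d in config.get("manualAttendanceDates", [])}
    target = config["targetStation"]

    # Any transaction at the target station counts the whole day, regardless
    # of card type, fare type, or transaction status.
    train_days = {d for d, station in transactions if station == target}

    # Manual overrides add days (drove or carpooled); skip dates remove them
    # (sick leave, vacation).
    return (train_days | manual) - skip
```

&lt;p&gt;Applying skip dates last means they win even over a stray transaction on that day, which is the behaviour you want when, say, you tapped on while sick but went straight home.&lt;/p&gt;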

&lt;h2&gt;
  
  
  What We Actually Built
&lt;/h2&gt;

&lt;p&gt;So what does ~3 days of structured development look like in practice? Here's what we ended up with:&lt;/p&gt;

&lt;p&gt;The codebase is about 6,000 lines—roughly 3,500 lines of Python for the backend extraction and processing logic, and 2,500 lines of React/CSS for the frontend dashboard. That's 8 UI components, 3 custom React hooks, and a handful of utility modules.&lt;/p&gt;

&lt;p&gt;Performance-wise, it's pretty snappy. Lighthouse scores are in the 95+ range across the board (accessibility hits 100). The bundle is 65KB gzipped, loads in under a second on 3G, and becomes interactive in under 2 seconds. Not bad for a dashboard with charts and calendars. The frontend doesn't do any complex calculations—all the heavy lifting (attendance logic, date processing) happens in Python during the data extraction phase. The React app just renders pre-computed JSON.&lt;/p&gt;

&lt;p&gt;I built this in about 3 days of active development. That includes writing the specs upfront, breaking down tasks, implementing features, and deploying to GitHub Pages. The upfront time spent on specs definitely paid off—I had way fewer "wait, what was I supposed to build here?" moments compared to my usual side projects.&lt;/p&gt;

&lt;p&gt;Is this faster than just prompting Claude directly? Honestly, probably not for the initial code generation. But debugging was way easier because I could reference the spec when something didn't work. And when I came back to add the manual attendance feature a week later, the spec told me exactly where to hook it in. That's where the structured approach actually saves time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We've seen the Station Station project: the problem it solves, the architecture, the 8-feature roadmap, and the technical challenges. We have a live application built in 3 days using agent-os SDD.&lt;/p&gt;

&lt;p&gt;But how did we actually build this? How does the agent-os workflow translate requirements into specs, specs into tasks, and tasks into working code? That's what Part 3 will explore in detail.&lt;/p&gt;

&lt;p&gt;In Part 3, we'll walk through the complete agent-os workflow: creating a product, shaping specifications, writing detailed specs, breaking down tasks, and implementing features with AI assistance. We'll use actual Station Station examples to show how each phase works and where human review happens.&lt;/p&gt;

&lt;p&gt;If you're curious how a structured workflow can make AI assistance predictable instead of frustrating, Part 3 is where we dive deep into the mechanics.&lt;/p&gt;

</description>
      <category>specdriven</category>
      <category>agentos</category>
      <category>react</category>
      <category>python</category>
    </item>
    <item>
      <title>Part 1: Spec-Driven Development - Building Predictable AI-Assisted Software</title>
      <dc:creator>Koustubh</dc:creator>
      <pubDate>Tue, 04 Nov 2025 05:40:36 +0000</pubDate>
      <link>https://forem.com/koustubh/part-1-spec-driven-development-building-predictable-ai-assisted-software-19ne</link>
      <guid>https://forem.com/koustubh/part-1-spec-driven-development-building-predictable-ai-assisted-software-19ne</guid>
      <description>&lt;p&gt;You know that feeling when you're using AI to generate code, and it seems to understand exactly what you want? The AI suggests a complete implementation. You accept it, run your tests, and... something's off. So you refine your prompt with more context. The AI generates a different approach. Tests still fail. You add more details to your conversation, but now the context is getting long and the AI starts hallucinating—confidently suggesting functions that don't exist, mixing up variable names, or contradicting its earlier recommendations. Before you know it, you're caught in a trial-and-error loop, spending more time debugging and re-prompting than you would have writing it yourself.&lt;/p&gt;

&lt;p&gt;I've been there. And I've found a better way.&lt;/p&gt;

&lt;p&gt;This is the first part of a series where I'll share how I built Station Station—a personal project for tracking Melbourne train commuters' office attendance—using Spec-Driven Development (SDD). This isn't a promotional piece. I'm going to show you the actual challenges, the times when AI failed spectacularly, and the structured approach that made AI assistance genuinely productive instead of frustrating. By the end of this series, you'll have a clear framework for deciding when SDD makes sense for your projects and when it's overkill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe Coding vs Spec-Driven Development
&lt;/h2&gt;

&lt;p&gt;That trial-and-error loop I described? It's often called &lt;strong&gt;"vibe coding"&lt;/strong&gt;—chatting with AI, trying what it suggests, debugging when it breaks, and iterating until something works. No upfront planning, no structure, just vibing with the AI and seeing where it takes you. For quick experiments and throwaway scripts, vibe coding is perfectly fine. But for real projects you want to finish and maintain? The lack of structure becomes a problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-Driven Development takes a different approach&lt;/strong&gt;: structure first, then code. Instead of chatting your way to a solution, you invest time upfront to document what you're building, why you're building it, and what success looks like. Then you let AI implement those documented requirements. The AI still does the heavy lifting, but within guardrails you've defined.&lt;/p&gt;

&lt;p&gt;The contrast is striking. Traditional AI chat often becomes a trial-and-error loop—you provide a vague prompt, get generated code, test it, realize it doesn't quite work, and start over. Spec-Driven Development follows a predictable path: gather requirements, write detailed specifications, break into tasks, implement systematically, and produce reviewable output. You spend more time upfront on specs, but save time on debugging and rework.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkoustubh25%2Fstation-station%2Fmain%2Fagent-os%2Fspecs%2F2025-11-03-technical-blog-sdd%2Fplanning%2Fvisuals%2Fsdd-vs-traditional-comparison.png%3Fv%3D2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fkoustubh25%2Fstation-station%2Fmain%2Fagent-os%2Fspecs%2F2025-11-03-technical-blog-sdd%2Fplanning%2Fvisuals%2Fsdd-vs-traditional-comparison.png%3Fv%3D2" alt="Side-by-side comparison: Traditional AI Chat shows trial-and-error loop with vague prompts leading to repeated attempts until code works. Spec-Driven Development shows linear progression from requirements to specification to implementation to reviewable output" width="1321" height="1508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Spec-Driven Development?
&lt;/h2&gt;

&lt;p&gt;Spec-Driven Development is exactly what it sounds like: you write detailed specifications before any code gets generated. Instead of throwing vague prompts at an AI and hoping for the best, you invest time upfront to document exactly what you want to build, why you're building it, who it's for, and what success looks like.&lt;/p&gt;

&lt;p&gt;Here's the core insight: AI is incredibly good at implementing well-defined specifications, but it's terrible at mind-reading. When you provide a structured spec with clear requirements, user stories, and acceptance criteria, AI can generate code that actually works. When you give it fuzzy requirements through conversational chat, you get fuzzy results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools for Implementing SDD
&lt;/h2&gt;

&lt;p&gt;There are several tools that help implement Spec-Driven Development workflows. I've experimented with a few, including &lt;strong&gt;&lt;a href="https://github.com/Fission-AI/OpenSpec" rel="noopener noreferrer"&gt;OpenSpec&lt;/a&gt;&lt;/strong&gt;, which takes a different angle on the problem.&lt;/p&gt;

&lt;p&gt;OpenSpec focuses on change proposals for existing systems. Its workflow centers around proposing, reviewing, and implementing changes to established codebases, with support for multiple AI tools through the AGENTS.md convention. It's particularly strong when you're working with a team using different AI assistants or making incremental updates to existing projects.&lt;/p&gt;

&lt;p&gt;For Station Station, I chose &lt;strong&gt;&lt;a href="https://buildermethods.com/agent-os" rel="noopener noreferrer"&gt;agent-os&lt;/a&gt;&lt;/strong&gt;, which is optimized for building complete products from scratch. It handles the full product lifecycle—from initial idea through deployment—with deep Claude integration and sophisticated multi-agent orchestration. The spec format is requirements-based with explicit goals, user stories, and technical implementation details. They've recently introduced &lt;a href="https://buildermethods.com/agent-os/version-2" rel="noopener noreferrer"&gt;version 2&lt;/a&gt; with enhanced capabilities.&lt;/p&gt;

&lt;p&gt;Since I was building a greenfield personal project solo with Claude, agent-os was the natural fit.&lt;/p&gt;

&lt;p&gt;The important thing isn't which SDD tool you use—it's that you use &lt;em&gt;some&lt;/em&gt; structure to align human intent with AI implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent-OS Workflow
&lt;/h2&gt;

&lt;p&gt;For Station Station, the agent-os workflow followed five phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create Product&lt;/strong&gt;: Define your product mission, target users, and core value proposition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shape Spec&lt;/strong&gt;: Gather requirements through structured AI-human dialogue where the AI asks clarifying questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write Specs&lt;/strong&gt;: Convert those requirements into detailed technical specifications with explicit scope boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write Tasks&lt;/strong&gt;: Break the spec into granular, actionable implementation tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Tasks&lt;/strong&gt;: AI-assisted coding with human review at key checkpoints&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The magic isn't in any of these individual steps—product managers have been writing specs for decades. The magic is in how this structure channels AI's strengths while keeping humans in control of architecture and quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Station Station Case Study
&lt;/h2&gt;

&lt;p&gt;Over this series, I'll use Station Station as a concrete example of SDD in action. Here's the quick version: I'm a Melbourne train commuter working hybrid, and my company has a 50% office attendance policy. But there's no automated way to track whether I'm meeting that threshold. I have to manually review my Myki (Melbourne's public transport smart card) transaction history and count which days I tapped on at my work station.&lt;/p&gt;

&lt;p&gt;Tedious? Absolutely. Perfect problem for automation? You bet.&lt;/p&gt;

&lt;p&gt;Station Station automatically determines office attendance by analyzing Myki transaction data. If you tapped on/off at your designated work station, it counts as an office day. The app presents monthly statistics, attendance calendars, and lets you export the data for compliance tracking. It's live at &lt;a href="https://koustubh25.github.io/station-station/" rel="noopener noreferrer"&gt;https://koustubh25.github.io/station-station/&lt;/a&gt;. You can use or deploy this tool yourself for free—I'll show you how in the final part of this series.&lt;/p&gt;

&lt;p&gt;Here's what makes it a good SDD case study:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real complexity&lt;/strong&gt;: Bypassing Cloudflare bot detection, reverse-engineering undocumented APIs, handling timezone edge cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 completed features&lt;/strong&gt;: Built incrementally across 3 phases using the agent-os workflow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honest challenges&lt;/strong&gt;: There were bugs AI couldn't fix. I'll show you exactly where human intervention was critical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-world application&lt;/strong&gt;: ~6,300 lines of code, fully responsive React dashboard with Lighthouse score 95+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This wasn't a toy project. It's a real application I use daily, built with real constraints, encountering real problems. In Part 2, I'll walk through the project in detail—the problem it solves, the technical challenges, and the 8-feature roadmap. In Part 3, we'll dive deep into the agent-os workflow. In Part 4, I'll share the honest truth about where AI failed and why.&lt;/p&gt;

&lt;p&gt;But before we get there, let's set realistic expectations.&lt;/p&gt;

&lt;h2&gt;
  
  
  SDD Is Not Full Automation
&lt;/h2&gt;

&lt;p&gt;Here's the critical thing to understand: Spec-Driven Development is not about handing requirements to an AI and walking away while it builds your app. It's a structured human-AI partnership where you maintain control of architecture and quality while the AI handles implementation and boilerplate.&lt;/p&gt;

&lt;p&gt;Think of it this way: you're the architect and reviewer, the AI is your implementation assistant. You design the system, write the specs, approve the task breakdown, and review the code. The AI generates the boilerplate, implements standard patterns, writes tests, and handles the repetitive work you'd rather not do manually.&lt;/p&gt;

&lt;p&gt;There's a collaboration spectrum, which I'll detail in Part 4:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1 - AI Can Handle Alone&lt;/strong&gt;: Boilerplate generation, standard CRUD operations, test scaffolding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2 - AI + Human Review Required&lt;/strong&gt;: Complex business logic, external API integration, cross-file refactoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 3 - Human Must Lead&lt;/strong&gt;: Debugging multi-layered issues, architectural decisions, security implementations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most real development work spans all three tiers. SDD helps you identify upfront which tier each task falls into, so you know where to invest your review energy and where AI can run autonomously.&lt;/p&gt;

&lt;p&gt;I learned this the hard way. When I added manual attendance date tracking to Station Station, the AI couldn't fix a date-handling bug even after several debugging rounds. I had to review the code myself, identify the specific function where the problem lived, and guide the AI to the solution. The AI was perfectly capable of implementing the fix—once I identified what needed fixing.&lt;/p&gt;

&lt;p&gt;That's the partnership. AI accelerates implementation. Humans guide architecture and troubleshoot the hard problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In Part 1, we've established what Spec-Driven Development is, why it's more predictable than ad-hoc AI chat, and how it compares to other approaches like OpenSpec. We've introduced Station Station as a real-world case study with honest challenges included.&lt;/p&gt;

&lt;p&gt;In Part 2, we'll explore the Station Station project in detail: the problem it solves, the 8 features I shipped, the technical challenges like Cloudflare bypass and API reverse engineering, and the metrics that prove this approach works in production.&lt;/p&gt;

&lt;p&gt;If you're tired of the AI chat trial-and-error loop, if you want to leverage AI assistance without sacrificing code quality, or if you're just curious how a structured spec can turn AI from unpredictable to reliable—stick around. This series will show you exactly how it works, warts and all.&lt;/p&gt;

</description>
      <category>specdriven</category>
      <category>ai</category>
      <category>productivity</category>
      <category>claude</category>
    </item>
  </channel>
</rss>
