<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ooi Yee Fei</title>
    <description>The latest articles on Forem by Ooi Yee Fei (@yooi).</description>
    <link>https://forem.com/yooi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F652062%2F6be73ecb-aa65-499a-9f7f-5898d92cc2d1.jpeg</url>
      <title>Forem: Ooi Yee Fei</title>
      <link>https://forem.com/yooi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yooi"/>
    <language>en</language>
    <item>
      <title>Sentrix: An AI SRE Copilot That Debates Its Own Scaling Decisions</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Mon, 09 Mar 2026 12:17:32 +0000</pubDate>
      <link>https://forem.com/yooi/sentrix-an-ai-sre-copilot-that-debates-its-own-scaling-decisions-mn2</link>
      <guid>https://forem.com/yooi/sentrix-an-ai-sre-copilot-that-debates-its-own-scaling-decisions-mn2</guid>
      <description>&lt;p&gt;Every SRE team has the same nightmare: it's 3am, traffic spikes, and nobody predicted it. By the time CloudWatch alerts fire, customers are already frustrated and revenue is lost.&lt;/p&gt;

&lt;p&gt;I built Sentrix — an AI-powered SRE copilot that predicts infrastructure problems before they happen and autonomously scales your cloud resources. But what makes it different isn't just prediction — it's debate.&lt;/p&gt;

&lt;h2&gt;Three Agents, One Decision&lt;/h2&gt;

&lt;p&gt;Instead of a single AI making decisions, three Bedrock Claude agents argue about every scaling call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AGENT_SRE&lt;/strong&gt; fights for reliability: "Scale now, we can't risk downtime."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGENT_FINANCE&lt;/strong&gt; pushes back on cost: "That's 5x the replicas — do we really need all of them?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGENT_ARBITER&lt;/strong&gt; synthesizes both: "Scale to 3x now, monitor for 5 minutes, then reassess."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is decisions that balance reliability and cost — and every decision is scored 5 minutes later via a Step Functions feedback loop. The scores become thought signatures that feed back into future analysis. The AI learns what works for your specific infrastructure.&lt;/p&gt;
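&lt;p&gt;As a rough illustration of the debate pattern (the opinions and the arbiter's split-the-difference rule below are simplified stand-ins; Sentrix's real agents are Bedrock Claude calls with full incident context, not these toy heuristics):&lt;/p&gt;

```typescript
// A simplified sketch of the three-agent debate described above.

interface AgentOpinion {
  agent: "AGENT_SRE" | "AGENT_FINANCE";
  proposedReplicas: number;
  rationale: string;
}

// AGENT_SRE: scale aggressively, in proportion to the traffic spike.
function sreOpinion(current: number, trafficMultiplier: number): AgentOpinion {
  return {
    agent: "AGENT_SRE",
    proposedReplicas: Math.ceil(current * trafficMultiplier),
    rationale: "Scale now, we can't risk downtime.",
  };
}

// AGENT_FINANCE: argue for the smallest increase that plausibly absorbs it.
function financeOpinion(current: number, trafficMultiplier: number): AgentOpinion {
  return {
    agent: "AGENT_FINANCE",
    proposedReplicas: Math.ceil(current * Math.sqrt(trafficMultiplier)),
    rationale: "Do we really need all of those replicas?",
  };
}

// AGENT_ARBITER: synthesize both positions into one decision.
function arbitrate(sre: AgentOpinion, finance: AgentOpinion): number {
  return Math.round((sre.proposedReplicas + finance.proposedReplicas) / 2);
}
```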

&lt;h2&gt;What it looks like in practice&lt;/h2&gt;

&lt;p&gt;I ran Sentrix through a full incident lifecycle: traffic spike → cost optimization → regional degradation → cascading AWS failure → cross-cloud GCP failover → autonomous recovery.&lt;/p&gt;

&lt;p&gt;Watch the demo:&lt;/p&gt;

  &lt;iframe src="https://www.youtube.com/embed/__i2HT7O2Ik"&gt;
  &lt;/iframe&gt;

&lt;p&gt;During a 4536% traffic surge, the brain detected it in milliseconds and scaled EKS from 2 to 10 pods — no human needed. When traffic normalized, the Finance agent argued for scaling down, and the system optimized from 10 back to 5 pods. When AWS regions cascaded, all three agents unanimously agreed on GCP failover. The feedback loop scored that decision 100/100.&lt;/p&gt;

&lt;p&gt;The whole system runs on a single AWS CDK stack — Lambda, Bedrock, EKS, DynamoDB, Step Functions, EventBridge, CloudFront — deployed in 5 minutes.&lt;/p&gt;

&lt;h2&gt;Full writeup&lt;/h2&gt;

&lt;p&gt;The full post covers the architecture, severity-based model selection (Haiku for low severity, Sonnet for critical), the thought signature self-evolution mechanism, and a phase-by-phase demo walkthrough with screenshots.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://yeefei.beehiiv.com/p/sentrix-an-ai-sre-copilot-that-debates-its-own-scaling-decisions?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=sentrix" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Read the full writeup on Build Signals&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;I submitted Sentrix to the AWS 10,000 AIdeas competition. The top 300 most-liked articles advance to the next round. If you found this interesting, a like on the article would genuinely help — it takes 2 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://builder.aws.com/content/3AffxhsUPRlHNn5kgUfqn2PmY40/aideas-sentrix-ai-powered-sre-copilot" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Like the article on AWS Builder Center&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sre</category>
      <category>programming</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your AI Chief Product Officer: Claude Code Skill for Solo Builders</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Tue, 10 Feb 2026 15:37:17 +0000</pubDate>
      <link>https://forem.com/yooi/your-ai-chief-product-officer-claude-code-skill-for-solo-builders-33o6</link>
      <guid>https://forem.com/yooi/your-ai-chief-product-officer-claude-code-skill-for-solo-builders-33o6</guid>
      <description>&lt;p&gt;Building alone means wearing every hat — including product manager. I keep hitting the same walls: competitor research is exhausting, feature prioritization feels like guesswork, and I don't have a PM team to lean on.&lt;/p&gt;

&lt;p&gt;So I built a Product Management skill for Claude Code. It handles the PM workflow I was doing manually — researching competitors, scoring feature gaps, generating PRDs, and creating GitHub Issues — all from the terminal.&lt;/p&gt;

&lt;p&gt;The key piece is a &lt;strong&gt;WINNING filter&lt;/strong&gt; that scores every feature gap on pain, timing, execution capability, and defensibility. It takes 50+ potential features and narrows them to 3–5 high-conviction priorities. No more building something just because "it would be cool."&lt;/p&gt;
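&lt;p&gt;A minimal sketch of what such a filter could look like (the field names, 1–5 scales, and cutoff below are illustrative assumptions, not the skill's actual scoring code):&lt;/p&gt;

```typescript
// Hypothetical WINNING-style filter: score each feature gap on pain,
// timing, execution capability, and defensibility, then keep the top few.

interface FeatureGap {
  name: string;
  pain: number;          // 1-5: how acute is the user pain?
  timing: number;        // 1-5: is now the right moment to build it?
  execution: number;     // 1-5: can a solo builder realistically ship it?
  defensibility: number; // 1-5: does it compound into an advantage?
}

function winningScore(gap: FeatureGap): number {
  return gap.pain + gap.timing + gap.execution + gap.defensibility;
}

// Narrow a long backlog to a handful of high-conviction priorities.
function shortlist(gaps: FeatureGap[], keep: number = 5): FeatureGap[] {
  return [...gaps]
    .sort((a, b) => winningScore(b) - winningScore(a))
    .slice(0, keep);
}
```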

&lt;p&gt;It also pairs with &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;spec-kit&lt;/a&gt; for implementation handoff — the PRD creates a GitHub Issue, and spec-kit picks it up from there. One workflow, no context switching.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it looks like in practice
&lt;/h2&gt;

&lt;p&gt;I ran &lt;code&gt;/pm:prd&lt;/code&gt; on a new feature module for one of my products. Claude analyzed the codebase, identified reusable components (approval workflows, audit trails, role-based access), and generated a full PRD with user stories, functional requirements, and architecture reuse analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg9r0br2unikj0ekxlxo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg9r0br2unikj0ekxlxo.png" alt="PRD output showing user stories by priority, functional requirements, and quality validation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The PRD was saved locally and created as a GitHub Issue automatically — with MVP scope, P0/P1/P2 priorities, and a breakdown of what I could reuse vs. build new.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p05voeh2inngwem0q7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p05voeh2inngwem0q7j.png" alt="PRD complete showing GitHub issue link, priority, and architecture reuse percentages"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From "I want to add this feature" to a structured GitHub Issue with a technical spec ready to generate — in one session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full writeup
&lt;/h2&gt;

&lt;p&gt;The full post covers all the commands (&lt;code&gt;/pm:analyze&lt;/code&gt;, &lt;code&gt;/pm:landscape&lt;/code&gt;, &lt;code&gt;/pm:gaps&lt;/code&gt;, &lt;code&gt;/pm:prd&lt;/code&gt;), the WINNING filter scoring breakdown, the spec-kit integration, data storage, and a step-by-step walkthrough with screenshots.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://yeefei.beehiiv.com/p/your-ai-chief-product-officer-claude-code-skill-for-solo-builders-298c?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=pm-skill" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Read the full writeup on Build Signals&lt;/a&gt;
&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc/tree/main/plugins/product-management" rel="noopener noreferrer"&gt;GitHub (Product Management Plugin)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc" rel="noopener noreferrer"&gt;GitHub (ccc Plugin Collection)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;spec-kit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yeefei.beehiiv.com/p/your-ai-chief-product-officer-claude-code-skill-for-solo-builders-298c?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=pm-skill" rel="noopener noreferrer"&gt;Build Signals Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claudecode</category>
      <category>coding</category>
    </item>
    <item>
      <title>Starting my own newsletter — Build Signals</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Mon, 09 Feb 2026 08:30:10 +0000</pubDate>
      <link>https://forem.com/yooi/starting-my-own-newsletter-build-signals-lh9</link>
      <guid>https://forem.com/yooi/starting-my-own-newsletter-build-signals-lh9</guid>
      <description>&lt;p&gt;I've been writing about Claude Code skills, LLM dev workflows, and lessons from building products solo. Thanks to everyone who subscribed on Medium — I appreciate you following along, especially early on when I was just figuring things out. Going forward, I'll be publishing on my own newsletter — &lt;strong&gt;Build Signals&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Same content, just a more direct way to stay connected. I'll still post on Medium, but full writeups will go out on Build Signals first.&lt;/p&gt;

&lt;p&gt;If you'd like to follow along:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://yeefei.beehiiv.com/subscribe" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Subscribe to Build Signals&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devjournal</category>
      <category>llm</category>
      <category>writing</category>
    </item>
    <item>
      <title>[AI-Powered ASL Communication App] - Part 2: Custom ASL Model on EKS + Claude Facilitation via Context Engineering</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Sat, 31 Jan 2026 15:41:24 +0000</pubDate>
      <link>https://forem.com/aws-builders/bridging-two-worlds-custom-asl-model-on-eks-claude-facilitation-via-context-engineering-1ebd</link>
      <guid>https://forem.com/aws-builders/bridging-two-worlds-custom-asl-model-on-eks-claude-facilitation-via-context-engineering-1ebd</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/aws-builders/ai-powered-asl-communication-app-part-1-cost-effective-sign-detection-model-training-on-aws-eks-2ca0"&gt;Part 1&lt;/a&gt; covered training an ASL recognition model on EKS. This part focuses on deploying that model for inference and designing a real-world conversation flow using Bedrock Claude with context engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;A deaf person and a hearing person want to have a conversation. No interpreter available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deaf person signs → hearing person needs audio&lt;/li&gt;
&lt;li&gt;Hearing person speaks → deaf person needs text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Seems straightforward - just translate between modalities. But there's a deeper challenge most people miss:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ASL isn't English in sign form.&lt;/strong&gt; Grammar, word order, and expression differ fundamentally. Direct translation doesn't work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ML models aren't perfect.&lt;/strong&gt; Our model is 65% accurate. Users need guidance when detection fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vocabulary is limited.&lt;/strong&gt; 100 signs vs thousands of English words. Users need help staying within what the system understands.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is where AI facilitation matters. Claude doesn't just translate - it understands conversation context, suggests responses within vocabulary constraints, and keeps the dialogue flowing even when the recognition model stumbles. It bridges the gap between imperfect ML and usable conversation.&lt;/p&gt;

&lt;p&gt;The app runs in a browser. No downloads, no accounts.&lt;/p&gt;

&lt;p&gt;Some sample demo images; the discussion below covers them in more detail:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpsg0e8i7b7tm4nl4ixd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpsg0e8i7b7tm4nl4ixd.png" alt=" " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft79zb2khzrj97tts88i5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft79zb2khzrj97tts88i5.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────────────────────┐
│                            Browser (React)                               │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐  ┌──────────────┐  │
│  │   Webcam     │  │  MediaPipe   │  │  Speech     │  │ Conversation │  │
│  │   Feed       │──│  Hand Track  │  │  Recognition│  │ View + TTS   │  │
│  └──────────────┘  └──────┬───────┘  └──────┬──────┘  └──────────────┘  │
└────────────────────────────┼────────────────┼────────────────────────────┘
                             │                │
                             │ landmarks      │ text (browser API)
                             ▼                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      CloudFront + Lambda                            │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │  /v1/asl/predict  →  EKS Inference (PoseLSTM)                 │  │
│  │  /v1/suggestions  →  Bedrock Claude (context-aware prompts)  │  │
│  └──────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deaf → Hearing:&lt;/strong&gt; Webcam → MediaPipe → EKS model → Text → Browser TTS (audio)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hearing → Deaf:&lt;/strong&gt; Microphone → Browser Speech Recognition → Text display&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploying the Inference Endpoint
&lt;/h2&gt;

&lt;h3&gt;
  
  
  EKS Setup for Inference
&lt;/h3&gt;

&lt;p&gt;Training used g6.12xlarge (4 L4 GPUs). For inference, that's overkill. The model is 2M params - runs fine on CPU.&lt;/p&gt;

&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU inference&lt;/strong&gt; - Model is ~2M params, no GPU needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal resources&lt;/strong&gt; - 256Mi memory, 100m CPU request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI server&lt;/strong&gt; - Async-friendly, good for ML serving&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Inference API
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Input: 32 frames of hand landmarks (126 features per frame)&lt;/li&gt;
&lt;li&gt;Output: predicted sign, confidence score, top-5 predictions&lt;/li&gt;
&lt;li&gt;Latency: &lt;strong&gt;~40ms&lt;/strong&gt; per prediction&lt;/li&gt;
&lt;/ul&gt;
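
&lt;p&gt;The contract implied above can be sketched as TypeScript types plus a client-side guard (field names here are assumptions based on the post, not the service's actual schema):&lt;/p&gt;

```typescript
// Request/response shape for the inference API: 32 frames of 126 landmark
// features in; a prediction with confidence and top-5 alternatives out.

interface PredictRequest {
  frames: number[][]; // 32 frames x 126 features per frame
}

interface PredictResponse {
  sign: string;                                 // e.g. "HELLO"
  confidence: number;                           // 0-1
  top5: { sign: string; confidence: number }[]; // ranked alternatives
}

// Client-side guard: reject malformed payloads before hitting the endpoint.
function isValidRequest(req: PredictRequest): boolean {
  return req.frames.length === 32 && req.frames.every(f => f.length === 126);
}
```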

&lt;h3&gt;
  
  
  Why Lambda + CloudFront (Not Direct EKS)
&lt;/h3&gt;

&lt;p&gt;Lambda as a proxy instead of exposing EKS directly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; - EKS stays in private subnet, no public exposure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt; - CloudFront caches repeated requests (same sign = same response)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; - Lambda scales to zero, only pay for actual requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth&lt;/strong&gt; - CloudFront + OAC handles authentication without custom code
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → CloudFront → Lambda → EKS (private)
         ↓
      (caching)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
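
&lt;p&gt;A hedged sketch of the proxy step in that flow: the Lambda receives the CloudFront-originated request and forwards it to the private EKS service. The internal hostname and event shape below are illustrative assumptions, not the project's real configuration.&lt;/p&gt;

```typescript
// Minimal Lambda proxy: forward the request body to the private inference
// service and relay the response. Hostname and event shape are assumed.

const EKS_INTERNAL_URL = "http://asl-inference.internal:8000"; // assumed private endpoint

interface ProxyEvent {
  rawPath: string; // e.g. "/v1/asl/predict"
  body: string;    // JSON payload from the client
}

function upstreamUrl(rawPath: string): string {
  return `${EKS_INTERNAL_URL}${rawPath}`;
}

export async function handler(event: ProxyEvent) {
  const upstream = await fetch(upstreamUrl(event.rawPath), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: event.body,
  });
  return {
    statusCode: upstream.status,
    headers: { "content-type": "application/json" },
    body: await upstream.text(),
  };
}
```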



&lt;h2&gt;
  
  
  Conversation Facilitation: The Design Challenge
&lt;/h2&gt;

&lt;p&gt;Sign recognition is the easy part. The harder problem: making the conversation actually work.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Communication Gap
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ASL is not English&lt;/strong&gt; - Grammar, word order, and concepts differ&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vocabulary mismatch&lt;/strong&gt; - Our model knows 100 signs. Real conversations need more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context matters&lt;/strong&gt; - "BOOK" could mean "I want a book" or "I'm reading a book" depending on context&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solution: Context Engineering with Bedrock Claude
&lt;/h3&gt;

&lt;p&gt;Claude handles conversation facilitation - not translation, but generating contextually relevant response suggestions.&lt;/p&gt;

&lt;p&gt;Key requirement: Claude needs to know not just what was said, but &lt;em&gt;who&lt;/em&gt; said it and &lt;em&gt;how&lt;/em&gt; they communicate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// suggestionService.ts - the actual implementation&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ConversationTurn&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;asl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;voice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Who said it and how&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// What was communicated&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getLLMSuggestions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;conversationHistory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ConversationTurn&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
  &lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;asl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;voice&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildSuggestionPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversationHistory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;suggestions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callClaudeBedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;suggestions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Context Engineering: The Four Key Decisions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Sliding Window (Last 6 Turns)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We don't send the entire conversation history. Just the last 6 turns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;conversationContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Only recent context&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;asl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Deaf user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hearing user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why 6? Enough context to understand the topic, not so much that it confuses the model or wastes tokens. Most conversations have natural topic shifts every 4-6 exchanges anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Mode-Aware Prompting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same conversation needs different suggestions depending on who's responding next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modeContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;isASLMode&lt;/span&gt;
  &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`The deaf user will respond using ASL signs. Suggest common ASL signs from the WLASL-100 vocabulary.
Available signs include: HELLO, HELP, YES, NO, THANK-YOU, PLEASE, GOOD, WANT, NEED, LIKE, GO, EAT, DRINK...
Keep suggestions to single words or short phrases that are actual ASL signs.`&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`The hearing user will respond by speaking. Suggest natural, conversational responses.
Keep suggestions brief (under 8 words each) and appropriate for the context.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is critical. ASL suggestions must be constrained to signs the model can actually recognize. Voice suggestions can be natural language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Turn Type Labeling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each message is tagged with who sent it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In the prompt, turns look like:&lt;/span&gt;
&lt;span class="c1"&gt;// [Deaf user (ASL)]: HELLO&lt;/span&gt;
&lt;span class="c1"&gt;// [Hearing user (Voice)]: Hi! How are you?&lt;/span&gt;
&lt;span class="c1"&gt;// [Deaf user (ASL)]: GOOD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude can track the conversation flow and understand the back-and-forth pattern. This helps it generate appropriate responses for each party.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Response Caching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same context = same suggestions. No need to hit the API twice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;suggestionCache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;suggestions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="nl"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CACHE_TTL_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 30 seconds&lt;/span&gt;

&lt;span class="c1"&gt;// Cache key is the full context hash&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;conversationHistory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;|&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;30-second TTL means rapid back-and-forth doesn't hammer the API, but suggestions stay fresh as the conversation evolves.&lt;/p&gt;
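
&lt;p&gt;For completeness, the read path for that cache might look like this (&lt;code&gt;getCachedSuggestions&lt;/code&gt; is an illustrative helper name, not code from the project; the cache and TTL are redeclared so the snippet stands alone):&lt;/p&gt;

```typescript
// Return a cached entry only if it is younger than the 30-second TTL;
// otherwise evict it and let the caller hit the API.

const suggestionCache = new Map<string, { suggestions: string[]; timestamp: number }>();
const CACHE_TTL_MS = 30000; // 30 seconds

function getCachedSuggestions(cacheKey: string, now: number = Date.now()): string[] | null {
  const entry = suggestionCache.get(cacheKey);
  if (!entry) return null;
  if (now - entry.timestamp > CACHE_TTL_MS) {
    suggestionCache.delete(cacheKey); // stale: evict
    return null;
  }
  return entry.suggestions;
}
```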

&lt;h3&gt;
  
  
  Additional Design Decisions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graceful fallbacks&lt;/strong&gt; - If API fails, default suggestions like &lt;code&gt;['YES', 'NO', 'HELP', 'THANK-YOU']&lt;/code&gt; are shown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vocabulary display&lt;/strong&gt; - UI shows all 100 supported signs so users know what's possible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON output format&lt;/strong&gt; - Claude returns suggestions as JSON array for reliable parsing&lt;/li&gt;
&lt;/ul&gt;
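
&lt;p&gt;The fallback and JSON-output decisions combine naturally into one parsing step. A sketch (&lt;code&gt;parseSuggestions&lt;/code&gt; is an illustrative name; only the default list comes from the post):&lt;/p&gt;

```typescript
// Accept Claude's output only if it parses as a non-empty array of strings;
// otherwise show the default suggestions.

const FALLBACK_SUGGESTIONS = ["YES", "NO", "HELP", "THANK-YOU"];

function parseSuggestions(raw: string): string[] {
  try {
    const parsed = JSON.parse(raw);
    if (Array.isArray(parsed) && parsed.length > 0 && parsed.every(s => typeof s === "string")) {
      return parsed;
    }
  } catch {
    // malformed output: fall through to defaults
  }
  return FALLBACK_SUGGESTIONS;
}
```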

&lt;h2&gt;
  
  
  Real-Time Flow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Deaf user signs → Hearing user hears:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Webcam captures video stream
   ↓
2. MediaPipe extracts hand landmarks (client-side, ~10ms)
   ↓
3. Collect 32 frames of landmarks (~1 second of signing)
   ↓
4. Send to /v1/asl/predict (EKS)
   ↓
5. Model returns prediction + confidence (~40ms)
   ↓
6. If confidence ≥ 60%: show confirmation dialog (threshold balances false positives vs missed detections)
   ↓
7. User confirms → sign added to conversation
   ↓
8. Browser TTS speaks the sign to hearing user
   ↓
9. Bedrock generates suggestions for next response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frw7r29n6tpvvtmhz2z52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frw7r29n6tpvvtmhz2z52.png" alt=" " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hearing user speaks → Deaf user reads:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Browser Speech Recognition API captures audio
   ↓
2. Real-time transcription (browser-native)
   ↓
3. User confirms → text added to conversation
   ↓
4. Deaf user reads the message
   ↓
5. Bedrock generates ASL sign suggestions (vocabulary-constrained)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xvnpfd4qpo8tzfao6qn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xvnpfd4qpo8tzfao6qn.png" alt=" " width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifhyoinbyjvixurzg4pj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifhyoinbyjvixurzg4pj.png" alt=" " width="800" height="855"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggjkf14tv4v4l8kft1wl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggjkf14tv4v4l8kft1wl.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Total latency: &lt;strong&gt;~500ms&lt;/strong&gt; for ASL detection flow. Speech recognition is near-instant (browser-native).&lt;/p&gt;

&lt;h2&gt;
  
  
  How This Differs from Other ASL Apps
&lt;/h2&gt;

&lt;p&gt;Most ASL apps do one of three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teach ASL (educational)&lt;/li&gt;
&lt;li&gt;Translate ASL to text (one-way)&lt;/li&gt;
&lt;li&gt;Render avatar-based signing (uncanny valley)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;VoxSign:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bidirectional (both parties can initiate)&lt;/li&gt;
&lt;li&gt;Context-aware (Claude tracks conversation flow)&lt;/li&gt;
&lt;li&gt;Confirmation-based (handles model uncertainty)&lt;/li&gt;
&lt;li&gt;Suggestion-driven (guides users within vocabulary limits)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Unsolved Challenges
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100 signs isn't enough&lt;/strong&gt; - Real conversations need 500+ signs minimum&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-hand signs are harder&lt;/strong&gt; - Model struggles with signs requiring hand interaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No sentence-level understanding&lt;/strong&gt; - We detect individual signs, not full ASL sentences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-way ASL&lt;/strong&gt; - Deaf user reads text, doesn't see ASL video (video generation was too slow)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context engineering &amp;gt; prompt engineering&lt;/strong&gt; - Deciding &lt;em&gt;what&lt;/em&gt; to send Claude (last 6 turns, turn types, mode context) matters more than prompt wording. The sliding window was a bigger win than any prompt change.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mode-aware prompting is essential&lt;/strong&gt; - Same conversation, different constraints. ASL suggestions must be vocabulary-constrained; voice suggestions can be natural language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Confirmation UX handles model uncertainty&lt;/strong&gt; - User confirms predictions before adding to conversation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cache aggressively&lt;/strong&gt; - Same context = same suggestions. 30-second TTL saved API costs and improved latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bedrock is cost-effective&lt;/strong&gt; - ~$0.003 per conversation turn. Context engineering to reduce token count pays off directly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
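The 30-second TTL cache from takeaway 4 can be sketched like this (names are illustrative, not the actual implementation):

```python
import time

class TTLCache:
    """Cache suggestions keyed by conversation context, expiring after ttl seconds."""
    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl
        self._store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        self._store.pop(key, None)  # expired or missing
        return None

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

# Same context window -> same suggestions, so key on a hash of the last turns
cache = TTLCache(ttl=30.0)
cache.set("last-6-turns-hash", ["HELLO", "THANK-YOU"])
```

Because suggestions only depend on the recent context window, a cache hit skips a Bedrock call entirely, which is where both the latency and the cost savings come from.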

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>aws</category>
      <category>webdev</category>
    </item>
    <item>
      <title>ClaudeCode Streak Skill Got a Telegram Bot: Check In From Your Phone</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Mon, 05 Jan 2026 23:58:56 +0000</pubDate>
      <link>https://forem.com/yooi/claudecode-streak-skill-got-a-telegram-bot-check-in-from-your-phone-3k8n</link>
      <guid>https://forem.com/yooi/claudecode-streak-skill-got-a-telegram-bot-check-in-from-your-phone-3k8n</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a follow-up to my &lt;a href="https://dev.to/yooi/beyond-coding-your-accountability-buddy-with-claude-code-skill-4omh"&gt;first Streak post&lt;/a&gt; where I introduced the Claude Code skill for tracking personal challenges.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A few weeks ago, I shared the &lt;a href="https://github.com/ooiyeefei/ccc/tree/main/skills/streak" rel="noopener noreferrer"&gt;Streak skill&lt;/a&gt; - a Claude Code skill for tracking any personal challenge. The response was better than I expected: people actually found it useful, started trying it out, and began requesting features.&lt;/p&gt;

&lt;p&gt;More importantly, I just completed my first 30-day challenge using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 30 Days of Tracking Taught Me
&lt;/h2&gt;

&lt;p&gt;After the challenge ended, I asked Claude Code to do a retrospective analysis. Not just "here's what you did" - but a comprehensive review of patterns, habits, what worked, what didn't.&lt;/p&gt;

&lt;p&gt;The insights were... humbling.&lt;/p&gt;

&lt;p&gt;Claude analyzed the logs thoroughly and proactively surfaced insights without me prompting or guiding it. It covered what I built, plus impact and wins to measure performance and results over time (it even computed a Win Rate - and was brutal about it). It also ran a Tech Stack Analysis unprompted, identifying the tools I gravitate toward, the projects I built, backlog items I'd left to gather dust and forgotten, and Emerging Patterns that were new and surprising to me - things I'd picked up outside my comfort zone without consciously noticing.&lt;/p&gt;

&lt;p&gt;There was a Top 10 Learnings section ranked by impact, work-related insights like product decisions, feature prioritization, inspiration and new ideas that cross-influenced from my builds. And of course, What Worked, What Didn't Work, and Recommendations for Next Steps.&lt;/p&gt;

&lt;p&gt;All this from the data I'd been logging daily. For someone as unplanned and unstructured as I am, having Claude surface these patterns was eye-opening.&lt;/p&gt;

&lt;p&gt;My experiment so far has focused on my technical work, but I'm excited to extend it to more aspects of life - fitness, habits, learning - and do more cross-analysis to understand how each area affects the others.&lt;/p&gt;

&lt;p&gt;But... there was a small challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Friction
&lt;/h2&gt;

&lt;p&gt;Here's what started bothering me after week 2.&lt;/p&gt;

&lt;p&gt;Some days got busy. Really busy. And even with the best intentions, I'd forget to check in. Not because I didn't want to - but because the friction was too high.&lt;/p&gt;

&lt;p&gt;To log a check-in, I had to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open terminal&lt;/li&gt;
&lt;li&gt;Navigate to the right directory&lt;/li&gt;
&lt;li&gt;Start Claude Code&lt;/li&gt;
&lt;li&gt;Remember the command&lt;/li&gt;
&lt;li&gt;Actually do the check-in&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's four steps of overhead before the actual work. On a hectic day, steps 1–4 just… didn't happen. I'd then have to backtrack and log the missed days later - I still completed all 30 days of logging in the end, but it wasn't ideal.&lt;/p&gt;

&lt;p&gt;I had a calendar reminder option, but that just reminded me to open terminal. The friction was still there.&lt;/p&gt;

&lt;p&gt;I wanted something that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ping me when a check-in was due&lt;/li&gt;
&lt;li&gt;Let me check in right there, without opening anything else&lt;/li&gt;
&lt;li&gt;Work from my phone (because that's always in my hand)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Exploring Options
&lt;/h2&gt;

&lt;p&gt;I started thinking about chat-based notifications. Something that lives where I already am:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slack&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Already use it for work&lt;/td&gt;
&lt;td&gt;Paid for full API, mixes with work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WhatsApp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Everyone has it&lt;/td&gt;
&lt;td&gt;Business API is complex, costs money&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Discord&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free, good bot support&lt;/td&gt;
&lt;td&gt;Don't use it daily&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Telegram&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free, excellent bot API&lt;/td&gt;
&lt;td&gt;Need to install app&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Telegram won: the bot API is free, surprisingly capable, easy to build on, and works on all devices.&lt;/p&gt;

&lt;p&gt;So I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Added: Claude Code Streak Skill Telegram Bot
&lt;/h2&gt;

&lt;p&gt;The bot does everything the Claude Code skill does - but from your phone or laptop chat app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjtebhk1ef5e6y46aafx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjtebhk1ef5e6y46aafx.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8punjvxbjt64gcymllv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy8punjvxbjt64gcymllv.png" alt=" " width="800" height="332"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│           @mystreak_bot                     │
│                                             │
│  /list    → See all your challenges         │
│  /switch  → Change active challenge         │
│  /streak  → Interactive check-in            │
│  /stats   → View progress &amp;amp; streaks         │
│  /insights → Cross-challenge patterns       │
│  /new     → Create new challenge            │
└─────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What It Looks Like
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Main Menu:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Welcome to Streak Bot!

Your active challenge: morning-workout
Status: 5-day streak

[Check In]  [List Challenges]
[Stats]     [Insights]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check-in Flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Check In: morning-workout

How did it go today?

[Great]  [Good]  [Okay]  [Struggled]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What did you work on?
&amp;gt; Push day - bench press, overhead press, tricep dips

Any notes or learnings?
&amp;gt; Finally hit 60kg bench! Form felt solid.

Session 15 logged!
Current streak: 6 days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;List Challenges:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Challenges:

ACTIVE:
* morning-workout (Fitness) - 6 day streak
  learn-rust (Learning) - 3 day streak
  read-12-books (Learning) - due today

PAUSED:
  meditation-habit (Habit) - paused 2 weeks ago

Use /switch [name] to change active challenge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  One Bot, All Challenges
&lt;/h2&gt;

&lt;p&gt;Here's an important design decision: &lt;strong&gt;one bot manages ALL your challenges&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not one bot per challenge. Not separate bots for fitness vs learning vs work.&lt;/p&gt;

&lt;p&gt;Why? Because your challenges are interconnected.&lt;/p&gt;

&lt;p&gt;Your morning workout affects your coding productivity. Your learning enables your building. Your meditation habit influences your creative work.&lt;/p&gt;

&lt;p&gt;By keeping everything in one place, the bot can detect patterns across challenges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cross-Challenge Insight:

Your "morning-workout" sessions correlate with
higher productivity in "learn-rust".

Sessions where you worked out first show 40%
more concepts covered.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the whole point of Streak - not just tracking individual things, but understanding how they connect.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: Both Stay in Sync
&lt;/h2&gt;

&lt;p&gt;The Telegram bot and Claude Code skill read/write the same files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    .streak/ folder                           │
│                   (Source of Truth)                          │
└─────────────────────┬───────────────────────┬───────────────┘
                      │                       │
                 reads/writes            reads/writes
                      │                       │
                      ▼                       ▼
        ┌─────────────────────┐   ┌─────────────────────┐
        │   Claude Code       │   │   Telegram Bot      │
        │   (Terminal)        │   │   (Phone)           │
        │                     │   │                     │
        │   Deep work:        │   │   Quick check-ins:  │
        │   - Planning        │   │   - Log progress    │
        │   - Research        │   │   - View stats      │
        │   - Analysis        │   │   - Switch context  │
        └─────────────────────┘   └─────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check in from Telegram in the morning. Do deep analysis in Claude Code later. They see the same data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optional GitHub sync:&lt;/strong&gt; If you want to access from multiple devices, commit your &lt;code&gt;.streak/&lt;/code&gt; folder and enable git sync. The bot can auto-pull before reading and auto-push after check-ins.&lt;/p&gt;
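The optional git sync could look something like this (a sketch, assuming the &lt;code&gt;.streak/&lt;/code&gt; folder is a git repo; command details simplified):

```python
import subprocess

def sync_commands(repo_dir: str, action: str):
    """Build the git commands for a sync step: 'pull' before reads, 'push' after check-ins."""
    if action == "pull":
        return [["git", "-C", repo_dir, "pull", "--rebase"]]
    if action == "push":
        return [
            ["git", "-C", repo_dir, "add", "-A"],
            ["git", "-C", repo_dir, "commit", "-m", "streak: check-in"],
            ["git", "-C", repo_dir, "push"],
        ]
    raise ValueError(f"unknown action: {action}")

def git_sync(repo_dir: str, action: str) -> None:
    """Run the sync step, failing loudly if any git command fails."""
    for cmd in sync_commands(repo_dir, action):
        subprocess.run(cmd, check=True)
```

Pulling before every read and pushing after every write keeps the phone bot and the terminal skill from drifting apart, at the cost of needing a network connection per check-in.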

&lt;h2&gt;
  
  
  Push Notifications
&lt;/h2&gt;

&lt;p&gt;This is the feature I most wanted to test out.&lt;/p&gt;

&lt;p&gt;The bot doesn't just wait for you to message it - it &lt;strong&gt;proactively pings you&lt;/strong&gt; when challenges are due:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔔 Streak Check-in Reminder

❗ Overdue:
• morning-workout (2d overdue)

📅 Due Today:
• learn-rust (streak: 5 days)
• read-24-books (weekly check-in)

Tap /streak to check in

[✓ Check In Now]  [📋 List All]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No more forgetting. No more "I'll do it later." The notification lands on your phone at your configured time (default 9 AM), and you can check in right there with two taps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configure your notification time:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your .env file&lt;/span&gt;
&lt;span class="nv"&gt;TIMEZONE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Asia/Singapore    &lt;span class="c"&gt;# Your timezone&lt;/span&gt;
&lt;span class="nv"&gt;NOTIFICATION_HOUR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9        &lt;span class="c"&gt;# 9 AM&lt;/span&gt;
&lt;span class="nv"&gt;NOTIFICATION_MINUTE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0      &lt;span class="c"&gt;# On the hour&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
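The next reminder time can be computed from that config with the standard library alone (a sketch; the actual bot uses its own scheduler):

```python
import os
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_reminder(now: datetime, hour: int, minute: int) -> datetime:
    """Return the next occurrence of hour:minute at or after `now` (tz-aware)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot already passed
    return target

# Read the same .env-style settings, with the documented defaults
tz = ZoneInfo(os.environ.get("TIMEZONE", "Asia/Singapore"))
hour = int(os.environ.get("NOTIFICATION_HOUR", "9"))
minute = int(os.environ.get("NOTIFICATION_MINUTE", "0"))
```

A long-running bot would sleep until `next_reminder(datetime.now(tz), hour, minute)` and then send the check-in message.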



&lt;p&gt;This is the friction-killer I was looking for. Bot runs 24/7 in Docker, pings me every morning, I tap "Check In Now", done in 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup: Easier Than You Think
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites (5 minutes, one-time manual setup)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Create a Telegram bot:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open Telegram, message &lt;code&gt;@BotFather&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Send &lt;code&gt;/newbot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pick a name: &lt;code&gt;My Streak Bot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pick a username: &lt;code&gt;mystreak_bot&lt;/code&gt; (must end in &lt;code&gt;bot&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Save the token it gives you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Get your chat ID:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Message &lt;code&gt;@userinfobot&lt;/code&gt; on Telegram&lt;/li&gt;
&lt;li&gt;Save the number it returns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Message your bot:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find your bot by username&lt;/li&gt;
&lt;li&gt;Send &lt;code&gt;/start&lt;/code&gt; (required before it can message you)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Create &lt;code&gt;.env&lt;/code&gt; file&lt;/strong&gt; in your project folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
TELEGRAM_BOT_TOKEN=your-token-here
ALLOWED_USERS=your-chat-id-here
TIMEZONE=Asia/Singapore
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploy Automatically (One Command)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/streak-telegram
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies your credentials&lt;/li&gt;
&lt;li&gt;Copies the bot files&lt;/li&gt;
&lt;li&gt;Adds &lt;code&gt;.env&lt;/code&gt; to &lt;code&gt;.gitignore&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Starts Docker container&lt;/li&gt;
&lt;li&gt;Shows you management commands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bot runs in the background and auto-restarts on reboot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose logs &lt;span class="nt"&gt;-f&lt;/span&gt;      &lt;span class="c"&gt;# View logs&lt;/span&gt;
docker-compose restart      &lt;span class="c"&gt;# Restart bot&lt;/span&gt;
docker-compose down         &lt;span class="c"&gt;# Stop bot&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example 1: Sarah's Reading Challenge
&lt;/h3&gt;

&lt;p&gt;Sarah wants to read 24 books this year. She created a Learning challenge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Challenge: read-24-books
Type: Learning
Goal: 2 books per month
Cadence: Weekly check-in
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Her typical week in Telegram:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sunday evening:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/streak

How's the reading going?

&amp;gt; Finished "Atomic Habits" - great stuff on habit
&amp;gt; stacking. Started "Deep Work" yesterday.

Progress on milestone?
&amp;gt; 4/24 books done

Session 8 logged!
Current streak: 8 weeks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mid-week reminder (from bot):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Streak Reminder

Your challenge "read-24-books" is due for
check-in in 2 days.

Current streak: 8 weeks

[Check In Now]  [Remind Later]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 2: Mike's Fitness Journey
&lt;/h3&gt;

&lt;p&gt;Mike's tracking a strength training habit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Challenge: strength-training
Type: Fitness
Goal: Consistent 4x/week lifting
Cadence: Daily
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Morning check-in:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/streak

What workout today?
&amp;gt; Pull day - deadlifts, rows, pull-ups, bicep curls

How did your body feel?
&amp;gt; Good energy. Deadlift felt heavy but form was solid.
&amp;gt; Hit 100kg for 3 reps - new PR!

Any notes?
&amp;gt; Need more sleep. Was yawning between sets.

Session logged! PR detected - nice work!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stats view:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/stats

strength-training Statistics

Sessions: 23
Current Streak: 12 days
Longest Streak: 12 days (current!)
Average: 4.2 sessions/week

Best Days: Tuesday, Thursday, Saturday
Common Notes: "energy", "form", "sleep"

Achievements:
  First Flame (3-day streak)
  On Fire (7-day streak)
  Dedicated (10 sessions)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 3: Cross-Challenge Insights
&lt;/h3&gt;

&lt;p&gt;Alex runs three challenges: morning-meditation (Habit), learn-python (Learning), and side-project (Building).&lt;/p&gt;

&lt;p&gt;After a month, the &lt;code&gt;/insights&lt;/code&gt; command surfaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cross-Challenge Patterns:

1. Meditation → Productivity Link
   Days with morning meditation show 35% longer
   coding sessions in both learn-python and
   side-project.

2. Skill Transfer Detected
   Your "learn-python" Session 12 (learned async/await)
   directly enabled "side-project" Session 8
   (built async API client).

3. Best Sequence
   Your most productive days follow this pattern:
   meditation → workout → coding
   Consider making this your default routine.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  My New Workflow
&lt;/h2&gt;

&lt;p&gt;Here's how I use both together now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Morning (phone):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wake up, check Telegram&lt;/li&gt;
&lt;li&gt;Quick &lt;code&gt;/streak&lt;/code&gt; check-in for morning routine&lt;/li&gt;
&lt;li&gt;See what's due today&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;During work (terminal):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code for deep work sessions&lt;/li&gt;
&lt;li&gt;Research, planning, analysis&lt;/li&gt;
&lt;li&gt;Detailed check-ins with context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Evening (phone):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick log of what got done&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/stats&lt;/code&gt; to see progress&lt;/li&gt;
&lt;li&gt;Plan tomorrow's focus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The friction is gone. I haven't missed a check-in in 2 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;If you're already using Streak:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update the skill&lt;/span&gt;
/plugin update ccc-skills@ccc

&lt;span class="c"&gt;# Set up Telegram (5 min)&lt;/span&gt;
&lt;span class="c"&gt;# 1. @BotFather → /newbot → save token&lt;/span&gt;
&lt;span class="c"&gt;# 2. @userinfobot → save chat ID&lt;/span&gt;
&lt;span class="c"&gt;# 3. Create .env file&lt;/span&gt;

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
/streak-telegram
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're new to Streak:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
/plugin marketplace add ooiyeefei/ccc
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;ccc-skills@ccc

&lt;span class="c"&gt;# Create your first challenge&lt;/span&gt;
/streak-new

&lt;span class="c"&gt;# Add Telegram (optional but recommended)&lt;/span&gt;
/streak-telegram
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;A few things I'm thinking about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Notification timing&lt;/strong&gt; - Smart reminders based on your check-in patterns, not just fixed schedules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice check-ins&lt;/strong&gt; - Telegram supports voice messages. Could be useful for quick logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly digests&lt;/strong&gt; - Automated summary sent every Sunday&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-device sync&lt;/strong&gt; - Better GitHub integration for teams or multiple devices&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But for now, the basics work. And that's enough.&lt;/p&gt;




&lt;p&gt;The goal was simple: reduce friction so much that checking in becomes automatic.&lt;/p&gt;

&lt;p&gt;Telegram on my phone. Always there. One tap to log progress. No terminal, no directory navigation, no startup time.&lt;/p&gt;

&lt;p&gt;For someone as unstructured as me, that's the difference between a system that sticks and another forgotten tool.&lt;/p&gt;

&lt;p&gt;Give it a try. Let me know what breaks - or what features you'd love to see!&lt;/p&gt;

&lt;p&gt;Got ideas? Found a bug? Want a feature that would make this work better for your use case? &lt;strong&gt;&lt;a href="https://github.com/ooiyeefei/ccc/issues" rel="noopener noreferrer"&gt;Open an issue on GitHub&lt;/a&gt;&lt;/strong&gt; - I'd love to hear what challenge types you're tracking and how the tool can better support them. The best features come from real users with real needs.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc/tree/main/skills/streak" rel="noopener noreferrer"&gt;GitHub (Streak Skill)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc" rel="noopener noreferrer"&gt;GitHub (ccc Plugin Collection)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc/issues" rel="noopener noreferrer"&gt;File Issues / Feature Requests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/yooi/beyond-coding-your-accountability-buddy-with-claude-code-skill-4omh"&gt;First Streak Post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
      <category>automation</category>
    </item>
    <item>
      <title>[AI-Powered ASL Communication App] - Part 1: Cost-Effective Sign Detection Model Training on AWS EKS</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Sat, 03 Jan 2026 23:47:40 +0000</pubDate>
      <link>https://forem.com/aws-builders/ai-powered-asl-communication-app-part-1-cost-effective-sign-detection-model-training-on-aws-eks-2ca0</link>
      <guid>https://forem.com/aws-builders/ai-powered-asl-communication-app-part-1-cost-effective-sign-detection-model-training-on-aws-eks-2ca0</guid>
      <description>&lt;p&gt;Recently I had an idea for an app that helps with ASL sign communication - wanted to experiment and see if it's feasible. But first, I need a model that can detect signs well enough. I'm not an ML scientist - more of an AI engineer. So ML training is something I'm still learning.&lt;/p&gt;

&lt;p&gt;I started by researching existing models. Couldn't find a ready-to-use one that works for my use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://multilingual.com/google-signgemma-on-device-asl-translation/" rel="noopener noreferrer"&gt;Google SignGemma&lt;/a&gt; - Not released to public yet&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://signllm.github.io/" rel="noopener noreferrer"&gt;SignLLM&lt;/a&gt; - Interesting approach but designed for a different workflow (sign-to-text translation vs real-time detection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most existing solutions focus on fingerspelling (alphabet) rather than word-level signs. So I explored training my own model with an ASL dataset.&lt;/p&gt;

&lt;p&gt;Initial results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmbox0exjow61dvugelh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmbox0exjow61dvugelh.png" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;p&gt;Three things I needed to figure out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No suitable existing model&lt;/strong&gt; - Nothing I could just download and use for word-level ASL detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dataset selection&lt;/strong&gt; - Which dataset has enough samples per class to train reliably?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training approach&lt;/strong&gt; - How to get effective results without massive compute?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dataset Selection: Why WLASL-100
&lt;/h2&gt;

&lt;p&gt;I evaluated several datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MS-ASL&lt;/strong&gt; - Looks promising but requires downloading from YouTube. Many videos are now unavailable. Gave up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WLASL-2000&lt;/strong&gt; - 2000 classes but the class-to-sample ratio is terrible. Some signs have only 3-5 videos. Not enough for training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WLASL-100&lt;/strong&gt; - 100 classes with more samples per class. Better balance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Went with &lt;a href="https://github.com/dxli94/WLASL" rel="noopener noreferrer"&gt;WLASL-100&lt;/a&gt; (Word-Level American Sign Language). The tradeoff: smaller vocabulary but more reliable training.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Selection: Why Pose LSTM (Not VideoMAE or VLMs)
&lt;/h2&gt;

&lt;p&gt;I started with VideoMAE - it seemed like the obvious choice for video understanding. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;86M parameters is overkill for ~750 training samples&lt;/li&gt;
&lt;li&gt;Fine-tuning took forever, results were mediocre (~40%)&lt;/li&gt;
&lt;li&gt;Inference was slow for real-time detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I tried a simpler approach: extract hand landmarks with &lt;a href="https://developers.google.com/mediapipe/solutions/vision/hand_landmarker" rel="noopener noreferrer"&gt;MediaPipe&lt;/a&gt;, feed into a lightweight LSTM. This made sense because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ASL signs are primarily about hand positions and movements&lt;/li&gt;
&lt;li&gt;MediaPipe gives us 21 landmarks per hand (x, y, z coords) = 126 features&lt;/li&gt;
&lt;li&gt;Much smaller input than raw video frames&lt;/li&gt;
&lt;li&gt;Can focus the model on what actually matters&lt;/li&gt;
&lt;/ul&gt;
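&lt;p&gt;To make the input shape concrete, here's a minimal sketch (a hypothetical helper, not the actual preprocessing code) of flattening two hands' landmarks into the fixed 126-feature vector, zero-filling a missing hand so the input size stays constant:&lt;/p&gt;

```python
# Hypothetical helper: flatten MediaPipe-style hand landmarks into the
# 126-feature vector described above. Each hand is 21 landmarks with
# (x, y, z) coordinates; a missing hand is zero-filled.

def landmarks_to_features(left_hand, right_hand):
    """left_hand/right_hand: list of 21 (x, y, z) tuples, or None."""
    features = []
    for hand in (left_hand, right_hand):
        if hand is None:
            features.extend([0.0] * 63)  # 21 landmarks * 3 coords
        else:
            for (x, y, z) in hand:
                features.extend([x, y, z])
    return features  # length 126
```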

&lt;h2&gt;
  
  
  Model Experiments: The Failures Matter
&lt;/h2&gt;

&lt;p&gt;Tried several approaches before finding what works:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;What Happened&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VideoMAE (fine-tuned)&lt;/td&gt;
&lt;td&gt;86M&lt;/td&gt;
&lt;td&gt;~40%&lt;/td&gt;
&lt;td&gt;Too heavy, slow inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pose LSTM v1&lt;/td&gt;
&lt;td&gt;~2M&lt;/td&gt;
&lt;td&gt;51.52%&lt;/td&gt;
&lt;td&gt;Decent baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pose LSTM v2&lt;/td&gt;
&lt;td&gt;14M&lt;/td&gt;
&lt;td&gt;0.61%&lt;/td&gt;
&lt;td&gt;Massive overfitting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pose LSTM v2-lite&lt;/td&gt;
&lt;td&gt;~1M&lt;/td&gt;
&lt;td&gt;58.79%&lt;/td&gt;
&lt;td&gt;Stripped down, worked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pose LSTM v3&lt;/td&gt;
&lt;td&gt;4.6M&lt;/td&gt;
&lt;td&gt;1.82%&lt;/td&gt;
&lt;td&gt;FocalLoss + too many params&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pose LSTM v3-enhanced&lt;/td&gt;
&lt;td&gt;~2M&lt;/td&gt;
&lt;td&gt;65%+&lt;/td&gt;
&lt;td&gt;Final model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Key learnings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bigger model != better results (especially with ~750 training samples)&lt;/li&gt;
&lt;li&gt;14M params on 750 samples = disaster&lt;/li&gt;
&lt;li&gt;MediaPipe hand landmarks + BiLSTM + attention pooling = sweet spot&lt;/li&gt;
&lt;li&gt;Label smoothing and mixup augmentation helped&lt;/li&gt;
&lt;/ul&gt;
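&lt;p&gt;For illustration, here's a stdlib-only sketch of mixup on pose-feature sequences (the actual training code isn't shown in this post; alpha=0.2 is taken from the augmentation settings):&lt;/p&gt;

```python
# Sketch of mixup on pose-feature sequences, using only the stdlib.
# Two (sequence, one-hot label) pairs are blended by a Beta-sampled weight.
import random

def mixup(seq_a, seq_b, label_a, label_b, alpha=0.2):
    """seq_*: list of frames (each a list of floats); label_*: one-hot lists."""
    lam = random.betavariate(alpha, alpha)
    mixed_seq = [
        [lam * a + (1 - lam) * b for a, b in zip(fa, fb)]
        for fa, fb in zip(seq_a, seq_b)
    ]
    mixed_label = [lam * la + (1 - lam) * lb for la, lb in zip(label_a, label_b)]
    return mixed_seq, mixed_label
```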

&lt;h2&gt;
  
  
  AWS Infrastructure: EKS + Spot Instances
&lt;/h2&gt;

&lt;p&gt;Here's what I set up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster Config:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS EKS in us-east-1&lt;/li&gt;
&lt;li&gt;Node group: g6.12xlarge (4x L4 GPUs each, 24GB VRAM per GPU)&lt;/li&gt;
&lt;li&gt;Used &lt;strong&gt;spot instances&lt;/strong&gt; - ~70% cost savings (~$1.72/hr vs $5.67/hr on-demand)&lt;/li&gt;
&lt;li&gt;Training time: ~1.5-2 hours per run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why g6.12xlarge:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L4 GPUs are newer and cheaper than older V100s&lt;/li&gt;
&lt;li&gt;4 GPUs per instance = can run multi-GPU DDP training&lt;/li&gt;
&lt;li&gt;Spot availability is good for this instance type&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Checkpoint Strategy for Spot Instances
&lt;/h2&gt;

&lt;p&gt;Spot instances can be reclaimed at any time, with only a two-minute warning. My solution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Save checkpoint every epoch to S3
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CheckpointCallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s3_bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s3_prefix&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s3_bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_bucket&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s3_prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_prefix&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_epoch_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Save locally
&lt;/span&gt;        &lt;span class="n"&gt;checkpoint_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save_checkpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Upload to S3 immediately
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_upload_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpoint_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checkpoint every epoch (~5-10 min intervals)&lt;/li&gt;
&lt;li&gt;Upload to S3 immediately after each checkpoint&lt;/li&gt;
&lt;li&gt;Max loss on spot interruption: one epoch of training&lt;/li&gt;
&lt;li&gt;Resume from S3 checkpoint on new instance&lt;/li&gt;
&lt;/ul&gt;
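&lt;p&gt;A hypothetical resume helper (names are illustrative, not from the repo): given the checkpoint keys listed under the run prefix, pick the newest epoch to resume from after an interruption:&lt;/p&gt;

```python
# Illustrative resume logic: from a list of S3 keys (e.g. the result of a
# list-objects call, not shown here), find the latest checkpoint epoch.
import re

def latest_checkpoint(keys):
    """keys: strings like 'runs/.../checkpoint-epoch-012/'; returns newest or None."""
    epochs = {}
    for key in keys:
        match = re.search(r"checkpoint-epoch-(\d+)", key)
        if match:
            epochs[int(match.group(1))] = key
    if not epochs:
        return None
    return epochs[max(epochs)]
```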

&lt;p&gt;&lt;strong&gt;S3 bucket structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s3://asl-model-checkpoints/
  └── runs/
      └── 2025-12-26-v3-enhanced/
          ├── checkpoint-epoch-001/
          ├── checkpoint-epoch-002/
          └── best/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How I Train
&lt;/h2&gt;

&lt;p&gt;The actual training flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data preprocessing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download WLASL videos&lt;/li&gt;
&lt;li&gt;Extract 32 uniformly sampled frames from each video&lt;/li&gt;
&lt;li&gt;Run MediaPipe to get hand landmarks (21 points x 2 hands x 3 coords = 126 features)&lt;/li&gt;
&lt;li&gt;Compute velocity features (frame-to-frame differences)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Model architecture (v3-enhanced)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 126 features (hand landmarks)&lt;/li&gt;
&lt;li&gt;BiLSTM with 3 layers, hidden_dim=384&lt;/li&gt;
&lt;li&gt;Attention pooling (learns which frames matter)&lt;/li&gt;
&lt;li&gt;Classifier head with layer norm + dropout&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Training config&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;
   &lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1e-3&lt;/span&gt;  &lt;span class="c1"&gt;# with warmup
&lt;/span&gt;   &lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
   &lt;span class="n"&gt;early_stopping_patience&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
   &lt;span class="n"&gt;label_smoothing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
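&lt;p&gt;The attention pooling step in the architecture can be sketched in plain Python (an assumed implementation, not the actual model code): score each frame's hidden vector, softmax the scores, and average frames by weight so informative frames dominate:&lt;/p&gt;

```python
# Assumed attention-pooling mechanism: a learned scoring vector rates each
# frame, scores are softmaxed, and frames are averaged by those weights.
import math

def attention_pool(hidden, weights):
    """hidden: list of T frame vectors; weights: scoring vector, same dim."""
    scores = [sum(w * h for w, h in zip(weights, frame)) for frame in hidden]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(hidden[0])
    return [sum(attn[t] * hidden[t][d] for t in range(len(hidden)))
            for d in range(dim)]
```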



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Augmentations&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Random scaling (0.9-1.1)&lt;/li&gt;
&lt;li&gt;Random rotation (-15 to +15 degrees)&lt;/li&gt;
&lt;li&gt;Mixup (alpha=0.2)&lt;/li&gt;
&lt;li&gt;No horizontal flip (would change sign meaning)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
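&lt;p&gt;Since these augmentations run on landmarks rather than pixels, they're cheap. Here's a sketch of the random rotation (assumed implementation: rotate x/y about a center point, leave z untouched):&lt;/p&gt;

```python
# Illustrative landmark augmentation: rotate the x/y plane by a random
# angle in [-max_deg, +max_deg] around a center point; z is unchanged.
import math
import random

def rotate_landmarks(landmarks, max_deg=15.0, center=(0.5, 0.5)):
    """landmarks: list of (x, y, z) tuples in normalized coordinates."""
    angle = math.radians(random.uniform(-max_deg, max_deg))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    rotated = []
    for (x, y, z) in landmarks:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a,
                        z))
    return rotated
```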

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Final model (v3-enhanced):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validation accuracy: 30–40% → 65%+&lt;/strong&gt; (from VideoMAE baseline)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Top-5 accuracy: ~90%&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference time: &amp;lt;50ms&lt;/strong&gt; per video&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model size: 86M → ~2M params&lt;/strong&gt; (43x smaller)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not state-of-the-art, but good enough for a demo app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;For one successful training run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;g6.12xlarge spot: ~$1.72/hr x 2 hours = &lt;strong&gt;~$3.44&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;S3 storage (checkpoints + data): &lt;strong&gt;~$0.50/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Total per experiment: &lt;strong&gt;under $5&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran maybe 15-20 experiments total during development. Spot instances saved me hundreds of dollars.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start with simpler models first (I wasted time on VideoMAE)&lt;/li&gt;
&lt;li&gt;More aggressive data augmentation earlier&lt;/li&gt;
&lt;li&gt;Consider synthetic data generation for underrepresented classes&lt;/li&gt;
&lt;li&gt;Set up proper MLflow experiment tracking from day one&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This post covered the ML training part. Next, I'll write about deploying the model and building the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User signs → Webcam capture → MediaPipe → Model inference → Predicted gloss → TTS → Audio output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Part 2 will cover the AI engineering side - FastAPI server, EKS deployment, and how it all connects.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>programming</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building AI Video Generation Pipelines with AWS Lambda Durable Functions</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Wed, 24 Dec 2025 07:27:05 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-ai-video-generation-pipelines-with-aws-lambda-durable-functions-4kp0</link>
      <guid>https://forem.com/aws-builders/building-ai-video-generation-pipelines-with-aws-lambda-durable-functions-4kp0</guid>
      <description>&lt;p&gt;At re:Invent 2025, AWS announced &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/lambda-durable-multi-step-applications-ai-workflows/" rel="noopener noreferrer"&gt;Lambda Durable Functions&lt;/a&gt; — a new capability that lets you write long-running, stateful workflows as simple sequential code while the SDK handles checkpointing, retries, and state management automatically.&lt;/p&gt;

&lt;p&gt;I wanted to try it on a realistic use case, so I built a content generation platform that transforms product photos into social media content, using Gemini for image and video generation. This post covers why I chose Lambda Durable Functions and the patterns that made it work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnej032lx86ure68qbjtl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnej032lx86ure68qbjtl.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj51qdztciypwl0srbaqy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj51qdztciypwl0srbaqy.png" alt=" " width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: AI Video Generation is Slow
&lt;/h2&gt;

&lt;p&gt;Video generation with models like Veo 3.1 takes ~90 seconds, often longer. That's a problem when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway times out at 29 seconds&lt;/li&gt;
&lt;li&gt;Lambda's maximum execution time is 15 minutes&lt;/li&gt;
&lt;li&gt;Users expect a response, not a "check back later"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional solutions involve Step Functions, SQS queues, or webhook callbacks. All require orchestration code, state management, and error handling boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Lambda Durable Functions?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgncxoliqmdaqoivrpyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgncxoliqmdaqoivrpyz.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lambda Durable Functions uses a checkpoint-and-replay model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Execute &amp;amp; Checkpoint&lt;/strong&gt; — The function runs, and the SDK saves progress at each &lt;code&gt;context.step()&lt;/code&gt; call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wait &amp;amp; Suspend&lt;/strong&gt; — When encountering a wait or external call, the function terminates gracefully, preserving state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume &amp;amp; Replay&lt;/strong&gt; — On the next invocation, completed steps are skipped using their checkpointed results&lt;/li&gt;
&lt;/ol&gt;
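&lt;p&gt;Independent of the AWS SDK (the examples in this post are TypeScript), the replay model itself fits in a few lines of toy Python: step results are persisted by name, so a re-run skips any work that already completed:&lt;/p&gt;

```python
# Toy illustration of checkpoint-and-replay (NOT the AWS SDK): a step
# executes once, its result is checkpointed by name, and later replays
# return the cached result instead of re-executing the step.
class DurableContext:
    def __init__(self, checkpoints):
        self.checkpoints = checkpoints  # persisted across invocations

    def step(self, name, fn):
        if name not in self.checkpoints:
            self.checkpoints[name] = fn()  # execute and checkpoint
        return self.checkpoints[name]      # replay: cached result
```

&lt;p&gt;The real SDK persists checkpoints durably and replays on a fresh Lambda invocation; this toy version only shows why completed steps don't run twice.&lt;/p&gt;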

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyds1dmw300jn2bsulr42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyds1dmw300jn2bsulr42.png" alt=" " width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workflows up to 1 year&lt;/strong&gt; — Overcomes the 15-minute limit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic retries&lt;/strong&gt; — Handles failures with exponential backoff from the last successful step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No infrastructure&lt;/strong&gt; — No Step Functions state machines, no queues, no additional services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SDK (&lt;code&gt;@aws/durable-execution-sdk-js&lt;/code&gt;) wraps your handler and manages the checkpointing transparently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6fp83tf4gwk1sbzs6j0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6fp83tf4gwk1sbzs6j0.png" alt=" " width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my implementation, the Lambda function handles multiple workflow modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GENERATE_VIDEO&lt;/code&gt; — Image → Veo 3.1 → Video URL (with polling)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GENERATE_IMAGE&lt;/code&gt; — Prompt + Reference → S3 URL&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GENERATE_CONTENT&lt;/code&gt; — Image → Content Strategy JSON&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;UPLOAD_IMAGE&lt;/code&gt; — Base64 → S3 (no durable steps needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pattern 1: Checkpointed Polling
&lt;/h2&gt;

&lt;p&gt;Video generation is async. You start a job, get an operation ID, and poll until completion. Here's the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withDurableExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Step 1: Start the operation&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;operationId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;start-generation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;startVideoGeneration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;imageBase64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Step 2-N: Poll with unique step names&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;pollCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;pollCount&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;pollCount&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`poll-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pollCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkOperationStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;operationId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;videoUrl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="c1"&gt;// Durable wait — function suspends here&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`wait-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pollCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;videoUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each poll iteration has a unique step name (&lt;code&gt;poll-1&lt;/code&gt;, &lt;code&gt;poll-2&lt;/code&gt;, etc.). If Lambda times out mid-poll, it resumes from the last completed step on the next invocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2: Large Payload Handling
&lt;/h2&gt;

&lt;p&gt;Durable function checkpoints have a &lt;strong&gt;256KB limit&lt;/strong&gt;. Generated images easily exceed this. The solution: upload to S3 first, return the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Wrong — returns large base64, checkpoint fails&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;generate-image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// ~500KB base64&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Correct — upload to S3, return small URL&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;generate-and-upload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;imageBase64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mimeType&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;uploadToS3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageBase64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// ~200 characters&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern applies to any step that produces large outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3: Selective Durability
&lt;/h2&gt;

&lt;p&gt;Not every operation needs checkpointing. Quick operations like generating presigned URLs or uploading base64 to S3 can run without durable steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;UPLOAD_IMAGE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// No context.step() — runs directly, no checkpoint overhead&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleUploadImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GENERATE_VIDEO&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Uses context.step() for long-running polling&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleGenerateVideoWorkflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reserve durable steps for operations that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take more than a few seconds&lt;/li&gt;
&lt;li&gt;Involve external API calls that might fail&lt;/li&gt;
&lt;li&gt;Need retry capability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CDK Infrastructure
&lt;/h2&gt;

&lt;p&gt;Deploying durable functions requires enabling the feature on the Lambda:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Workflow&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODEJS_20_X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.handler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;memorySize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Enable durable execution&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cfnFunction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;defaultChild&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CfnFunction&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;cfnFunction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DurableConfig&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;Enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function URL provides direct HTTPS access without API Gateway, avoiding the 29-second timeout entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Step names help with debugging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While the SDK tracks execution sequence (so identical step names work), using descriptive unique names like &lt;code&gt;poll-status-1&lt;/code&gt;, &lt;code&gt;poll-status-2&lt;/code&gt; makes CloudWatch logs and debugging much easier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Step outputs are serialized&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything returned from &lt;code&gt;context.step()&lt;/code&gt; must be JSON-serializable. No functions, no circular references, no Buffers.&lt;/p&gt;
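&lt;p&gt;A quick way to sanity-check what survives a step boundary is a plain JSON round-trip. This is an illustrative sketch, not SDK code — it just demonstrates standard &lt;code&gt;JSON.stringify&lt;/code&gt; behavior, which is what the serialization constraint boils down to:&lt;/p&gt;

```typescript
// Illustrative: step outputs survive only if JSON can round-trip them.
function roundTrip<T>(value: T): T {
  return JSON.parse(JSON.stringify(value));
}

// Plain data is fine.
const ok = roundTrip({ jobId: "abc", attempts: 2 });

// Functions are silently dropped by JSON.stringify.
const dropped = roundTrip({ fn: () => 1, jobId: "abc" } as any);

// Circular references throw a TypeError.
const circular: any = { name: "job" };
circular.self = circular;
let threw = false;
try {
  roundTrip(circular);
} catch {
  threw = true;
}
```

The same applies to Buffers: they round-trip as `{ type: "Buffer", data: [...] }` objects, not as Buffers, which is usually not what you want.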

&lt;p&gt;&lt;strong&gt;3. Cold starts add latency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each resume is a new Lambda invocation. For workflows with many steps, cold starts accumulate. Consider provisioned concurrency for latency-sensitive workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Debugging requires understanding replay&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Console logs in completed steps won't appear on replay — the step returns its cached result immediately. Add logging outside steps or use CloudWatch to trace the full history.&lt;/p&gt;
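&lt;p&gt;The replay behavior is easier to reason about with a toy model. The sketch below is not the actual SDK internals — it only mimics the memoization semantics: once a step has completed, its body (and any logging inside it) never runs again:&lt;/p&gt;

```typescript
// Toy model of durable-execution replay: completed steps return their
// cached result instead of re-executing their body.
type StepLog = Map<string, unknown>;

async function step<T>(log: StepLog, name: string, body: () => Promise<T>): Promise<T> {
  if (log.has(name)) {
    // Replay path: the body is skipped entirely, so console.log calls
    // inside it will not fire a second time.
    return log.get(name) as T;
  }
  const result = await body();
  log.set(name, result);
  return result;
}

async function demo() {
  const log: StepLog = new Map();
  let executions = 0;

  const run = () =>
    step(log, "poll-status-1", async () => {
      executions++; // side effects happen only on the first pass
      return "SUCCEEDED";
    });

  const first = await run();  // executes the body
  const second = await run(); // replay: cached result, body skipped
  return { first, second, executions };
}
```

This is why logging placed outside `context.step()` is the reliable way to trace every replay pass.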

&lt;h2&gt;
  
  
  When to Use Durable Functions vs Step Functions
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|        Use Case.         | Durable Functions | Step Functions |
| - - - - - - - - - -- - - | - - - - - - - - - | - - - - - - - -|
| Simple linear workflows  |          ✓        |                |
| Complex branching logic  |                   |        ✓       |
| Visual workflow designer |                   |        ✓       |
| Code-first development   |          ✓        |                |
| Sub-second coordination  |                   |        ✓       |
| Long waits (hours/days)  |          ✓        |        ✓       |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Durable Functions shine when you want to write workflows as regular code without learning a new DSL or managing state machine definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;With Lambda Durable Functions, the video generation pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles 30-90 second video generation without timeout issues&lt;/li&gt;
&lt;li&gt;Automatically retries failed API calls from the last checkpoint&lt;/li&gt;
&lt;li&gt;Scales to concurrent requests without queue management&lt;/li&gt;
&lt;li&gt;Costs nothing while waiting (Lambda suspends between polls)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire backend is a single Lambda function. No Step Functions, no SQS, no orchestration infrastructure.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>aws</category>
    </item>
    <item>
      <title>Custom Claude Code Skill: Auto-Generating / Updating Architecture Diagrams with Excalidraw</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Sun, 21 Dec 2025 08:34:33 +0000</pubDate>
      <link>https://forem.com/yooi/custom-claude-code-skill-auto-generating-updating-architecture-diagrams-with-excalidraw-227k</link>
      <guid>https://forem.com/yooi/custom-claude-code-skill-auto-generating-updating-architecture-diagrams-with-excalidraw-227k</guid>
      <description>&lt;p&gt;I maintain several GCP infrastructure projects with Terraform. Every time I onboard someone or need to explain the architecture, I face the same problem: my diagrams are either outdated or does not even exist.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Problem
&lt;/h1&gt;

&lt;p&gt;If you use any coding agent, you can generally ask Claude or Gemini to scan your codebase and generate an architecture diagram at any time. I tried different approaches:&lt;br&gt;
&lt;strong&gt;Mermaid diagrams&lt;/strong&gt; - Great for code-based diagrams, but they get cluttered fast. A Terraform stack with 15+ resources becomes unreadable. And they don't support the freeform layout I need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual Excalidraw&lt;/strong&gt; - Beautiful, collaborative, easy to share. But I have to manually update it every time infrastructure changes. Which means it's always outdated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I wanted&lt;/strong&gt;: Claude Code analyzes my codebase and generates an Excalidraw diagram automatically. High-level, clean, with proper arrows.&lt;/p&gt;

&lt;p&gt;Most importantly, as I continue working on the project, Claude Code can update the diagram by reading the up-to-date codebase and comparing it against the old one, which often provides useful context.&lt;/p&gt;
&lt;h2&gt;
  
  
  Exploring Solutions
&lt;/h2&gt;

&lt;p&gt;I considered different approaches to make this convenient:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ad-hoc prompts&lt;/strong&gt; — Just ask Claude Code when I need an update&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Code Skill&lt;/strong&gt; — Teach Claude the Excalidraw format once, reuse everywhere&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plugin with slash command&lt;/strong&gt; — &lt;code&gt;/excalidraw&lt;/code&gt; to trigger generation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP server&lt;/strong&gt; — More complex, but could watch for file changes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hooks&lt;/strong&gt; — Auto-trigger diagram updates after certain git operations&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I started with &lt;strong&gt;Claude Code Skills&lt;/strong&gt; — the simplest to implement. Skills are markdown files that teach Claude how to perform specific tasks. No scripts, no servers, just knowledge files. Perfect for experimenting with quality and feasibility before investing in more complex solutions.&lt;/p&gt;
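&lt;p&gt;For context, a skill is just a directory containing a &lt;code&gt;SKILL.md&lt;/code&gt; file with YAML frontmatter describing when to use it. A minimal sketch — the frontmatter fields follow the Claude Code skills convention, but the instruction body here is hypothetical, not the actual skill:&lt;/p&gt;

```markdown
---
name: excalidraw-diagrams
description: Generate and update .excalidraw architecture diagrams from a codebase
---

# Excalidraw Diagram Skill

When asked to generate or update an architecture diagram:

1. Scan the Terraform/IaC files to identify resources and their relationships.
2. Emit valid .excalidraw JSON: rectangles for resources, elbow arrows for dependencies.
3. If a diagram file already exists, preserve its layout and only add or remove changed resources.
```

Because it's just a knowledge file, iterating on quality is as cheap as editing markdown.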
&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Here’s what it generated for an EKS AI/ML Terraform stack — Karpenter, GPU workloads, Rafay orchestration — from a single prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjmnvox5q7e8j9tc6s69.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjmnvox5q7e8j9tc6s69.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Full Writeup
&lt;/h2&gt;

&lt;p&gt;The full post covers how the skill generates valid .excalidraw JSON with proper labels and elbow arrows, installation, usage commands, and more examples.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://yeefei.beehiiv.com/p/custom-claude-code-skill-auto-generating-updating-architecture-diagrams-with-excalidraw?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=excalidraw-skill" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Read the full writeup on Build Signals&lt;/a&gt;
&lt;/p&gt;




&lt;p&gt;Links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ooiyeefei/ccc" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (For the Skill)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ooiyeefei/rafay-templates/tree/main/infra-env/ai-on-eks/jark-stack" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (For the Sample repo I used)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://excalidraw.com/" rel="noopener noreferrer"&gt;Excalidraw&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>architecture</category>
      <category>coding</category>
    </item>
    <item>
      <title>Beyond Coding: Your Accountability Buddy with Claude Code Skill</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Mon, 15 Dec 2025 00:11:45 +0000</pubDate>
      <link>https://forem.com/yooi/beyond-coding-your-accountability-buddy-with-claude-code-skill-4omh</link>
      <guid>https://forem.com/yooi/beyond-coding-your-accountability-buddy-with-claude-code-skill-4omh</guid>
      <description>&lt;p&gt;After months of demanding work and endless context-switching recently, it got me thinking: out of the 1001 personal learning topics, build ideas, and reading lists I've been wanting to tackle over the past 7-8 months - what have I actually achieved? How am I progressing? Have I drifted? Are those still the priorities I remember? What even are they anymore?&lt;/p&gt;

&lt;p&gt;I realized I'd lost track. And time keeps slipping by.&lt;/p&gt;

&lt;p&gt;So I wondered: for someone who's not the most organized person - someone who tends to be random and follows what they want to do impromptu - how can I have a better, systematic way to handle and track all this? Something that gives me insights when I need them, lets me reflect on where I'm heading, and tells me if I'm still on track. But most importantly, keeps the flexibility and openness I need to explore new things as they come.&lt;/p&gt;

&lt;p&gt;My first thought: not another task tracker. Not the 999th document or note tool or app that I've tried and given up on (or forgotten about) within a week. How could I make this more fun? Something that suits my way of thinking and how I actually get motivated?&lt;br&gt;
&lt;em&gt;Challenge.&lt;/em&gt; That was my first thought. But how do I build it? I need help. Since I'm a big fan of Claude Code day-to-day, is there something I can do with it differently?&lt;/p&gt;

&lt;p&gt;So I decided to give it a try. I had no idea what would work - only time would tell. If this system makes me remember it and stick with it for more than a week, it works. I started brainstorming and designing with Claude. One thing on my long list was exploring how Claude Code skills could be used in different scenarios. No apps. No tools. Just a Claude Code skill - simple enough that I can "inject" it into my beloved Claude Code to help me with this.&lt;/p&gt;

&lt;p&gt;After letting the brainstorming juice flow, I ended up with this skill: &lt;strong&gt;&lt;a href="https://github.com/ooiyeefei/ccc/tree/main/skills/streak" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What it does: every day I check in, log progress, track ideas, and try to maintain momentum. But my original workflow was locked into a specific folder structure and only worked for "building" type challenges.&lt;/p&gt;

&lt;p&gt;What if someone wanted to track a fitness challenge? A reading goal? A meditation habit?&lt;/p&gt;

&lt;p&gt;I decided to build a Claude Code skill that could track &lt;em&gt;any&lt;/em&gt; type of challenge. After discussing with friends, there seems to be a working pattern that's open, flexible, and adaptable enough regardless of challenge type - fitness, learning, habits. We all have some aspect of our life we hope to improve. It seems to fit just right with what Claude Code skills can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  

  &lt;iframe src="https://www.youtube.com/embed/_5YbD9Gr_9Q"&gt;
  &lt;/iframe&gt;



&lt;/h2&gt;

&lt;h2&gt;
  
  
  How It Ended Up
&lt;/h2&gt;

&lt;p&gt;My previous &lt;code&gt;/daily-checkin&lt;/code&gt; workflow worked great for my 30-day AI/ML challenge. It had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;challenge-log.md&lt;/code&gt; for tracking progress&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;daily-context.md&lt;/code&gt; for setting up each session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ideas-backlog.md&lt;/code&gt; for things to try - whenever my random brain pops up with an idea, I just throw it in the backlog. Or if I'm lazy, I tell Claude in scattered-brain chatting style and it logs it in a structured way for me.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;preferences.md&lt;/code&gt; for my stack and tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it was hardcoded for tech challenges. Questions like "what did you ship?" and "tech stack used?" are useless if you're tracking a workout routine or trying to read 12 books this year.&lt;/p&gt;

&lt;p&gt;I wanted something that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Works for &lt;em&gt;any&lt;/em&gt; challenge type (learning, building, fitness, creative, habits)&lt;/li&gt;
&lt;li&gt;Asks the &lt;em&gt;right&lt;/em&gt; questions based on what you're tracking&lt;/li&gt;
&lt;li&gt;Keeps the same useful file structure but adapts the content&lt;/li&gt;
&lt;li&gt;Detects connections across different challenges (compound learning)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Approach
&lt;/h2&gt;

&lt;p&gt;Instead of building separate trackers for each domain, I realized the &lt;em&gt;file structure&lt;/em&gt; could stay universal - only the &lt;em&gt;content&lt;/em&gt; needs to adapt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Think of it like this:&lt;/strong&gt; A preferences file is useful whether you're tracking code or workouts. For code, it stores your stack and tools. For fitness, it stores your equipment and workout types. For food, maybe cuisine and diet type. Same purpose, different content.&lt;/p&gt;

&lt;p&gt;This led to the "type-adaptive" design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same files for everyone (&lt;code&gt;preferences.md&lt;/code&gt;, &lt;code&gt;backlog.md&lt;/code&gt;, &lt;code&gt;today.md&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Different sections filled in based on challenge type&lt;/li&gt;
&lt;li&gt;Guided creation flow that asks type-specific questions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The full post covers challenge types, type-adaptive preferences, cross-challenge insights, installation, usage commands, and real examples (learning + fitness challenges) — plus what I learned after 2 weeks of using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://yeefei.beehiiv.com/p/new-post-35a3f6cddaacfd32?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=streak-skill" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Read the full writeup on Build Signals&lt;/a&gt;

&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc/tree/main/skills/streak" rel="noopener noreferrer"&gt;GitHub (Streak Skill)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ooiyeefei/ccc" rel="noopener noreferrer"&gt;GitHub (ccc Plugin Collection)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>productivity</category>
      <category>programming</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From Video to Voiceover in Seconds: Running MLX Swift on ARM-Based iOS Devices</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Fri, 05 Dec 2025 06:09:27 +0000</pubDate>
      <link>https://forem.com/yooi/from-video-to-voiceover-in-seconds-running-mlx-swift-on-arm-based-ios-devices-1md9</link>
      <guid>https://forem.com/yooi/from-video-to-voiceover-in-seconds-running-mlx-swift-on-arm-based-ios-devices-1md9</guid>
      <description>&lt;p&gt;I started exploring ARM-based AI applications - how generative AI and machine learning models can run locally on ARM devices. Video editing friction sparked the idea: how can I cut down the time to polish demo videos?&lt;/p&gt;

&lt;p&gt;That led to ScriptCraft - an iOS app that transcribes video, cleans up the script with an on-device LLM, generates new narration, and exports the final video. No cloud APIs. No uploads. Just your phone doing the work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;I wanted to repurpose video content quickly. Record something, get an AI-polished transcript, hear it narrated professionally, export. No cloud APIs. No waiting for uploads. Just your phone doing the work.&lt;/p&gt;

&lt;p&gt;How it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Import a video&lt;/li&gt;
&lt;li&gt;Transcribe the audio with on-device speech recognition&lt;/li&gt;
&lt;li&gt;Enhance the transcript with a local LLM&lt;/li&gt;
&lt;li&gt;Generate narration via TTS&lt;/li&gt;
&lt;li&gt;Replace the original audio and export&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result is a refined, end-to-end video that's ready to share.&lt;/p&gt;




&lt;h2&gt;
  
  
  MLX Swift: The Promise and The Reality
&lt;/h2&gt;

&lt;p&gt;Apple's MLX framework lets you run ML models on device. MLX Swift brings this to iOS. I wanted to use it for the transcript enhancement step - clean up filler words, fix grammar, make it more readable.&lt;/p&gt;

&lt;p&gt;The model: Qwen 0.5B 4-bit quantized. Small enough for mobile, supposedly capable enough for basic text tasks.&lt;/p&gt;

&lt;p&gt;Setting it up looked straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;modelConfiguration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;ModelConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"mlx-community/Qwen2.5-0.5B-Instruct-4bit"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="kt"&gt;LLMModelFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadContainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;modelConfiguration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Setting Up Physical Device Testing
&lt;/h2&gt;

&lt;p&gt;MLX requires Metal GPU for inference. I used an iPhone 13 mini, which supports Metal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Device setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enable Developer Mode: Settings &amp;gt; Privacy &amp;amp; Security &amp;gt; Developer Mode. Restart and confirm.&lt;/li&gt;
&lt;li&gt;Match Xcode version to iOS version to avoid build errors.&lt;/li&gt;
&lt;li&gt;For local servers, configure with your Mac's network IP (e.g., &lt;code&gt;http://10.0.0.100:5055&lt;/code&gt;) and bind to all interfaces with &lt;code&gt;--host 0.0.0.0&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Transcription Challenge
&lt;/h2&gt;

&lt;p&gt;iOS has a built-in speech recognizer. Use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;recognizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SFSpeechRecognizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Locale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"en-US"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SFSpeechURLRecognitionRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;audioURL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requiresOnDeviceRecognition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;recognizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recognitionTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;with&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Worked great for short clips. For a 3-minute video? It returned maybe 30 seconds of text.&lt;/p&gt;

&lt;p&gt;The issue: SFSpeechRecognizer has limits. Apple doesn't document them clearly, but around 1 minute of audio seems to be the practical ceiling for a single request.&lt;/p&gt;

&lt;p&gt;My fix: chunk the audio.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;chunkDuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;TimeInterval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;

&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;AVAsset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;throws&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;CMTimeGetSeconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;startTime&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;stride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;by&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chunkDuration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;chunkResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribeChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunkResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joined&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;separator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better, but still inconsistent. Some chunks came back empty. The video had mixed audio sources (voice + background music from editing). The on-device recognizer struggles with that.&lt;/p&gt;

&lt;p&gt;Added a fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Try on-device first&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;onDeviceResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribeWithMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;onDevice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;onDeviceResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isEmpty&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;onDeviceResult&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Fallback to server-based (uses Apple's servers)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribeWithMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;onDevice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt; On-device speech recognition is good for clean audio and short clips. For real-world content with mixed sources, you need fallbacks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hallucination Problem
&lt;/h2&gt;

&lt;p&gt;Got transcription working. Fed it to Qwen 0.5B. The output?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Sure, here's a rewritten version of your transcript with improved flow and engagement:

[Completely fabricated content about topics never mentioned in the original]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model hallucinated. My original prompt asked it to "enhance and improve" the script. The 0.5B model interpreted that as "make stuff up."&lt;/p&gt;

&lt;p&gt;The fix was embarrassingly simple: ask for less.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"""
    Clean up this transcript. ONLY fix grammar and remove filler words.
    DO NOT add any new information or content.

    IMPORTANT: Output the same content, just cleaned. Do not invent or add anything.

    Original: &lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;

    Cleaned:
    """&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No more "enhance." No more "improve." Just "clean up." The model stopped inventing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Think of it like this:&lt;/strong&gt; A 0.5B model is like a meticulous proofreader - excellent at catching typos and cleaning up grammar, but ask them to ghostwrite your memoir and they'll start making up your childhood. Keep the job description tight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt; Small LLMs need small tasks. The 0.5B model can clean text. It cannot creatively rewrite. Know your model's limits and prompt accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Audio Playback: One More Gotcha
&lt;/h2&gt;

&lt;p&gt;Generated the narration. Called &lt;code&gt;AVAudioPlayer.play()&lt;/code&gt;. Silence.&lt;/p&gt;

&lt;p&gt;Turns out iOS needs explicit permission to play audio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;audioSession&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;AVAudioSession&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sharedInstance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;audioSession&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setCategory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;playback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;audioSession&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setActive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, audio plays in simulator but not on device (unless headphones are connected). Another simulator-vs-device difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Final Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Video Import
    |
[Extract Audio] --&amp;gt; AVAssetExportSession
    |
[Chunk Audio] --&amp;gt; 30-second segments
    |
[Transcribe] --&amp;gt; SFSpeechRecognizer (on-device + server fallback)
    |
[Enhance] --&amp;gt; MLX Swift + Qwen 0.5B (cleanup only)
    |
[Generate Speech] --&amp;gt; Kokoro TTS via mlx-audio
    |
[Compose Video] --&amp;gt; AVMutableComposition (original video + new audio)
    |
Export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step has its own service class. Each handles its own errors. The view coordinates them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Surprised Me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MLX Swift works.&lt;/strong&gt; Once you're on a real device with Metal GPU, inference is fast. The 0.5B model runs in under a second for short texts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;iOS has a lot of ML built in.&lt;/strong&gt; SFSpeechRecognizer, Vision framework, Natural Language framework. You can build surprisingly capable apps without any external models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MLX needs Metal.&lt;/strong&gt; Set up physical device testing early when working with on-device ML.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explore ARM-native frameworks first.&lt;/strong&gt; iOS has powerful built-in ML capabilities - SFSpeechRecognizer, Vision, Natural Language, Core ML. Understand what's already optimized for ARM before adding external models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simpler prompts first.&lt;/strong&gt; Start with "clean this text" and only add complexity if the model handles it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test with real content.&lt;/strong&gt; My test videos were clean screen recordings. Real videos have background noise, music, multiple speakers. Test with messy content early.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;On-device ML is powerful but unforgiving. The APIs exist. The models exist. But the gap between "works in simulator" and "works on device" is larger than I expected.&lt;/p&gt;

&lt;p&gt;The reward: an app that processes video entirely locally. No cloud uploads. No API costs. No latency.&lt;/p&gt;

&lt;p&gt;Worth the debugging sessions? Absolutely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Platform:      iOS 18+ (requires Metal GPU)
Language:      Swift
ML Framework:  MLX Swift
LLM:           Qwen 0.5B 4-bit (mlx-community)
Speech:        SFSpeechRecognizer (Apple)
TTS:           Kokoro via mlx-audio (server)
Video:         AVFoundation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>softwaredevelopment</category>
      <category>ai</category>
      <category>mobile</category>
      <category>programming</category>
    </item>
    <item>
      <title>How I Built a Multi-Model AI Agent That Negotiates With Vendors</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Thu, 04 Dec 2025 08:58:26 +0000</pubDate>
      <link>https://forem.com/yooi/how-i-built-a-multi-model-ai-agent-that-negotiates-with-vendors-p87</link>
      <guid>https://forem.com/yooi/how-i-built-a-multi-model-ai-agent-that-negotiates-with-vendors-p87</guid>
      <description>&lt;p&gt;&lt;em&gt;Built with Gradio MCP, Gemini, Claude, Modal, Blaxel + LangGraph, and ElevenLabs&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Recently I explored multi-model AI orchestration - wiring together Gradio MCP, Modal for serverless compute, DSPy for structured extraction, Blaxel with LangGraph for agent hosting, and ElevenLabs for voice AI. I wanted to go deeper than tutorials - actually build something, break it, and document what I learned.&lt;/p&gt;

&lt;p&gt;The result was VendorGuard AI - an agent that processes invoices, analyzes pricing, and negotiates with vendors over a voice call.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Invoice processing is tedious. Extract data from a PDF, compare prices to historical records, send emails asking for better rates. I wanted to see if I could wire up multiple AI services to handle this end-to-end.&lt;/p&gt;

&lt;p&gt;The system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Takes an invoice (PDF or image)&lt;/li&gt;
&lt;li&gt;Extracts all the data automatically&lt;/li&gt;
&lt;li&gt;Compares prices against historical records&lt;/li&gt;
&lt;li&gt;Generates a negotiation strategy&lt;/li&gt;
&lt;li&gt;Has a voice AI agent call the vendor&lt;/li&gt;
&lt;li&gt;Sends a follow-up email summarizing what was agreed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The interesting part wasn't any single piece - it was how they all connected.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture (And Why Each Piece)
&lt;/h2&gt;

&lt;p&gt;Here's what the system looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Upload Invoice
      ↓
[Gradio Frontend + MCP Server]
      ↓
[Modal: OCR with Gemini Vision]
      ↓
[Modal: Structured Extraction with DSPy]
      ↓
[Convex: Store Vendors, Invoices, Price History]
      ↓
[Blaxel + LangGraph: Negotiation Strategy with Claude]
      ↓
[ElevenLabs: Voice Negotiation Call]
      ↓
[Claude: Follow-up Email]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I didn't start with this architecture. It evolved as I hit walls and found solutions. Let me walk through the key decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Modal
&lt;/h2&gt;

&lt;p&gt;Modal serves custom or open-source AI models with sub-second cold starts on whatever GPU you select.&lt;br&gt;
I define a function, add a decorator, and it runs in the cloud with GPU access. Deploy with &lt;code&gt;modal deploy tools.py&lt;/code&gt;. No Dockerfile. No infrastructure config. Pay per second of actual compute.&lt;/p&gt;


&lt;h2&gt;
  
  
  DSPy: The Framework That Changed How I Think About LLMs
&lt;/h2&gt;

&lt;p&gt;I started with the usual approach - prompt templates with "please return JSON in this format" instructions. You know how that goes. Half the time the model returns something slightly wrong, you add more instructions, it works for a bit, then breaks again.&lt;/p&gt;

&lt;p&gt;DSPy flips this completely. Instead of writing prompts, you define &lt;em&gt;signatures&lt;/em&gt; - what goes in, what comes out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InvoiceExtractionSignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract structured invoice data from OCR text.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;ocr_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;invoice_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InvoiceExtraction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dspy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OutputField&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;InvoiceExtraction&lt;/code&gt; is a Pydantic model with 40+ fields. DSPy handles generating the right prompt, parsing the output, and ensuring it matches the schema.&lt;/p&gt;

&lt;p&gt;No more "please format as JSON". No more parsing errors. Just define what you want and get it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt; DSPy vs LangChain isn't about one being "better". They solve different problems. DSPy is for structured extraction - when you need reliable typed outputs. LangChain is for chains of operations. I used DSPy for the extraction step and it was rock solid.&lt;/p&gt;
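&lt;p&gt;The gist is easy to see in plain Python. This stdlib-only sketch is mine, not from the project - the &lt;code&gt;fake_llm&lt;/code&gt; stand-in and the two fields are illustrative - but it shows the contract a signature framework automates: build the prompt from a typed schema, then validate the model's output against that same schema:&lt;/p&gt;

```python
import json
from dataclasses import dataclass, fields

# Hypothetical stand-in for an LLM call; DSPy would build the prompt
# and call your configured model here instead.
def fake_llm(prompt):
    return '{"vendor_name": "Acme Supplies", "total": 1240.50}'

@dataclass
class InvoiceData:
    vendor_name: str
    total: float

def extract(ocr_text, schema):
    # Build a prompt from the schema's field names - roughly what a
    # signature-based framework generates for you.
    field_desc = ", ".join(f.name for f in fields(schema))
    prompt = f"Extract {field_desc} as JSON from: {ocr_text}"
    raw = json.loads(fake_llm(prompt))
    # Validate against the schema; a missing or mistyped field fails
    # here instead of producing malformed output downstream.
    return schema(**{f.name: f.type(raw[f.name]) for f in fields(schema)})

invoice = extract("Acme Supplies ... TOTAL: RM 1,240.50", InvoiceData)
print(invoice.total)  # 1240.5
```

&lt;p&gt;DSPy does this with real models, plus retries and prompt optimization - but the contract is the same: typed in, typed out.&lt;/p&gt;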




&lt;h2&gt;
  
  
  Blaxel + LangGraph: Deploying Agents Without the Fuss
&lt;/h2&gt;

&lt;p&gt;I needed to host the negotiation strategy agent somewhere. Options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Roll my own FastAPI + AWS/GCP - too much setup&lt;/li&gt;
&lt;li&gt;LangServe - tied to LangChain&lt;/li&gt;
&lt;li&gt;Blaxel - supports LangGraph out of the box, simple deploy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I decided to try out Blaxel. It has native LangGraph support, so I could define my agent as a graph and deploy it directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="c1"&gt;# main.py
&lt;/span&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StreamingResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy: &lt;code&gt;bl deploy&lt;/code&gt;. Get a scalable endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt; Blaxel is to AI agents what AWS or GCP is to general infrastructure. Native LangGraph support meant I could use graph-based agent patterns without wrestling with deployment infrastructure. It's also framework-agnostic if you're not all-in on LangChain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gradio MCP: The Feature That Surprised Me Most
&lt;/h2&gt;

&lt;p&gt;Gradio 6.0 shipped with MCP (Model Context Protocol) server support. I almost missed this feature.&lt;/p&gt;

&lt;p&gt;Add one flag to your app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcp_server&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every function in your Gradio app becomes a &lt;em&gt;tool&lt;/em&gt; that any MCP-compatible AI can call. The function's docstring becomes the tool description. Types are inferred.&lt;/p&gt;
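&lt;p&gt;You can see roughly what gets derived with a few lines of the stdlib &lt;code&gt;inspect&lt;/code&gt; module. The tool function below is a simplified stand-in for mine, and this is a sketch of the idea, not Gradio's actual internals:&lt;/p&gt;

```python
import inspect

def mcp_get_vendor_price_history(vendor_id: str, months: int = 6):
    """Return historical pricing for a vendor, for negotiation leverage."""
    return {"vendor_id": vendor_id, "months": months, "prices": []}

def describe_tool(fn):
    # Roughly what an MCP server derives automatically: name from the
    # function, description from the docstring, parameter types from
    # the signature's annotations.
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            name: p.annotation.__name__
            for name, p in sig.parameters.items()
        },
    }

tool = describe_tool(mcp_get_vendor_price_history)
print(tool["description"])
```

&lt;p&gt;This is also why precise docstrings matter so much (more on that below): the docstring &lt;em&gt;is&lt;/em&gt; the tool description the calling AI sees.&lt;/p&gt;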

&lt;p&gt;I built four MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mcp_get_vendor_data&lt;/code&gt; - vendor contact info for follow-ups&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcp_get_vendor_price_history&lt;/code&gt; - historical pricing for negotiation leverage&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcp_get_invoice_details&lt;/code&gt; - complete invoice with line items&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcp_analyze_invoice_prices&lt;/code&gt; - compares current vs historical prices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Blaxel + LangGraph agent uses them internally, but they're also exposed via an SSE endpoint (&lt;code&gt;/gradio_api/mcp/sse&lt;/code&gt;) for external clients like Claude Desktop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt; This is genuinely magical. No OpenAPI spec writing, no tool schema definitions. Just Python functions with docstrings. MCP is going to be big.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Model Orchestration: Play to Strengths
&lt;/h2&gt;

&lt;p&gt;One insight that clicked during this build: different models are good at different things.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|          Task         |       Model      |            Why            |
|-----------------------|------------------|---------------------------|
| Invoice OCR           | Gemini 2.0 Flash | Best vision, fast         |
| Structured Extraction | Gemini + DSPy    | Good at following schemas |
| Price Analysis        | Gemini + DSPy    | Compares 6-month history, |
|                       |                  | calculates % changes      |
| Negotiation Strategy  | Claude Sonnet 4  | Nuanced reasoning         |
| Follow-up Emails      | Claude Sonnet 4  | Professional tone         |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of forcing one model to do everything, I let each handle what it's best at. The orchestration layer (Gradio + MCP tools) ties them together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context Engineering: Beyond Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;One pattern that emerged from this build: &lt;strong&gt;context engineering&lt;/strong&gt; - systematically constructing AI contexts from multiple data sources at runtime.&lt;/p&gt;

&lt;p&gt;This goes beyond writing good prompts. It's about assembling the right information from different places so the AI can do its job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-source aggregation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the voice agent starts a negotiation, it needs context from four sources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Business database - company profile&lt;/li&gt;
&lt;li&gt;Vendor database - contact info, relationship history&lt;/li&gt;
&lt;li&gt;Invoice records - line items, totals, payment terms&lt;/li&gt;
&lt;li&gt;Price history - 6 months of historical data per item&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this gets aggregated and injected into the voice agent's system prompt via dynamic variables.&lt;/p&gt;
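&lt;p&gt;A minimal sketch of that aggregation step - the inline dicts stand in for the Convex queries, and all names and fields here are illustrative:&lt;/p&gt;

```python
# Hypothetical data from the four sources; in the real system these
# come from Convex queries rather than inline dicts.
business = {"name": "YF Trading", "industry": "restaurant supplies"}
vendor = {"name": "Acme Supplies", "contact": "sales@acme.example"}
invoice = {"number": "INV-1042", "total": 1240.50, "terms": "NET 30"}
history = {"8-inch Shear": {"best": 5.27, "current": 6.20}}

def build_context(business, vendor, invoice, history):
    # Aggregate all four sources into the dynamic variables that get
    # injected into the voice agent's system prompt.
    lines = [
        f"You negotiate on behalf of {business['name']}.",
        f"Vendor: {vendor['name']} ({vendor['contact']})",
        f"Invoice {invoice['number']}, total {invoice['total']}, terms {invoice['terms']}",
    ]
    for item, p in history.items():
        lines.append(f"{item}: best {p['best']}, current {p['current']}")
    return "\n".join(lines)

print(build_context(business, vendor, invoice, history))
```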

&lt;p&gt;&lt;strong&gt;Computed insights:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Raw data isn't enough. The system transforms it into negotiation intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compares current prices against historical best prices&lt;/li&gt;
&lt;li&gt;Calculates percentage markups (e.g., "+17.6% above best price")&lt;/li&gt;
&lt;li&gt;Prioritizes items with highest negotiation potential&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example insight generated: &lt;em&gt;"8-inch Shear: +17.6% above best price (best was RM 5.27, now RM 6.20)"&lt;/em&gt;&lt;/p&gt;
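&lt;p&gt;The markup calculation itself is simple. A sketch that reproduces the insight above (the function name is mine):&lt;/p&gt;

```python
def price_insight(item, best, current):
    # Percentage above the historical best price for this line item.
    pct = (current - best) / best * 100
    return (
        f"{item}: {pct:+.1f}% above best price "
        f"(best was RM {best:.2f}, now RM {current:.2f})"
    )

print(price_insight("8-inch Shear", 5.27, 6.20))
# 8-inch Shear: +17.6% above best price (best was RM 5.27, now RM 6.20)
```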

&lt;p&gt;&lt;strong&gt;Adaptive strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The negotiation strategy adapts based on computed context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Items above best price → specific line-item targets&lt;/li&gt;
&lt;li&gt;Prices stable → focus on payment terms, volume discounts&lt;/li&gt;
&lt;li&gt;No historical data → generic best-practice tactics&lt;/li&gt;
&lt;/ul&gt;
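&lt;p&gt;The branching above reduces to a few lines - this is a sketch with hypothetical names, not the production code:&lt;/p&gt;

```python
def pick_strategy(items_above_best, has_history):
    # Mirrors the three branches: no data, inflated items, stable prices.
    if not has_history:
        return "generic best-practice tactics"
    if items_above_best:
        targets = ", ".join(items_above_best)
        return f"push line-item targets: {targets}"
    return "focus on payment terms and volume discounts"

print(pick_strategy(["8-inch Shear"], True))
```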

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt; Context engineering is underrated. The same model performs dramatically differently depending on what context you give it. Investing in context assembly paid off more than tweaking prompts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rough Edges
&lt;/h2&gt;

&lt;p&gt;Not everything was smooth:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ElevenLabs transcript events&lt;/strong&gt; - The API returns different event structures depending on... something? Sometimes &lt;code&gt;source&lt;/code&gt;, sometimes &lt;code&gt;role&lt;/code&gt;. Sometimes &lt;code&gt;message&lt;/code&gt;, sometimes &lt;code&gt;text&lt;/code&gt;. Had to write defensive parsing code.&lt;/p&gt;
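&lt;p&gt;The defensive parsing ended up looking roughly like this - simplified, with the helper name mine, and the field fallbacks matching the variations I observed:&lt;/p&gt;

```python
def parse_transcript_event(event):
    # The event may use 'source' or 'role' for the speaker, and
    # 'message' or 'text' for the content, so fall back at each step
    # rather than assuming one shape.
    speaker = event.get("source") or event.get("role") or "unknown"
    content = event.get("message") or event.get("text") or ""
    return {"speaker": speaker, "content": content}

print(parse_transcript_event({"role": "agent", "text": "Can you do RM 5.50?"}))
```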

&lt;p&gt;&lt;strong&gt;Gradio MCP is new&lt;/strong&gt; - The feature shipped recently and docs are sparse. If your function docstrings aren't precise, the tools become hard for AI agents to use correctly. I spent time rewriting docstrings to get reliable tool calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DSPy learning curve&lt;/strong&gt; - Coming from prompt templates, the signature-based approach took adjustment. Documentation has gaps. Worth it once it clicks, but expect some ramp-up time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I rebuilt this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with DSPy earlier&lt;/strong&gt; - I wasted time on prompt engineering that DSPy would have solved immediately.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan the MCP tools upfront&lt;/strong&gt; - I added them late. If I'd designed around MCP from the start, the architecture would be cleaner.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Invest more in voice analytics&lt;/strong&gt; - ElevenLabs Conversational AI is impressive, but I barely scratched the surface. The transcripts could feed into cost analysis to identify negotiation patterns, post-call QA to improve agent responses, and better navigation for reviewing specific moments in calls. There's a lot more value to extract from the voice data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;The pattern I'm most excited about: &lt;strong&gt;specialized services + orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Modal for compute. Blaxel + LangGraph for agents. ElevenLabs for voice. Each does one thing well. MCP ties them together.&lt;/p&gt;

&lt;p&gt;This feels cleaner than a monolithic "do everything" agent. And it's easier to debug - when something breaks, you know which piece to look at.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/spaces/MCP-1st-Birthday/yooi-vendorguard-ai" rel="noopener noreferrer"&gt;Live Demo / Repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Demo Video:&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/IfzncmLhw58"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

</description>
      <category>agents</category>
      <category>showdev</category>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Building The Digital Exorcism: Infinite Replayability Through Dynamic Generation (Part 2)</title>
      <dc:creator>Ooi Yee Fei</dc:creator>
      <pubDate>Sun, 30 Nov 2025 19:04:56 +0000</pubDate>
      <link>https://forem.com/kirodotdev/building-the-digital-exorcism-infinite-replayability-through-dynamic-generation-part-2-cjo</link>
      <guid>https://forem.com/kirodotdev/building-the-digital-exorcism-infinite-replayability-through-dynamic-generation-part-2-cjo</guid>
      <description>&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Part 2 of 2 - Adding infinite replayability to the security game&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://dev.to/yooi/the-digital-exorcism-app-with-kiro-security-learning-through-haunted-codebase-part-1-1g05"&gt;Read Part 1&lt;/a&gt; - how I built the initial version with specs, steering, hooks, and MCP.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: One-and-Done
&lt;/h2&gt;

&lt;p&gt;Github link: &lt;a href="https://github.com/ooiyeefei/owasp-exorcist" rel="noopener noreferrer"&gt;https://github.com/ooiyeefei/owasp-exorcist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After building v1, I had a working game. But it had zero replayability. Once you played it, you knew exactly what to fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XSS in &lt;code&gt;VulnerableComponent1.tsx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Hardcoded API key in &lt;code&gt;VulnerableComponent2.tsx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Code injection in &lt;code&gt;VulnerableComponent3.tsx&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The magic was gone after one playthrough. I needed every session to be unique.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back to Specs (Yes, Again)
&lt;/h2&gt;

&lt;p&gt;I told Kiro: "I need dynamic vulnerability generation." Kiro suggested: "Let's spec it out."&lt;/p&gt;

&lt;p&gt;Even for enhancements, specs provide structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  New Requirements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Requirement 9: Dynamic Vulnerability Generation&lt;/span&gt;
&lt;span class="p"&gt;
9.&lt;/span&gt;1. WHEN user starts game THEN system SHALL randomly select 3-5 unique OWASP types
&lt;span class="p"&gt;9.&lt;/span&gt;2. WHEN vulnerabilities generated THEN system SHALL create React components with vulnerable code
&lt;span class="p"&gt;9.&lt;/span&gt;3. WHEN templates loaded THEN system SHALL validate for pattern contamination
&lt;span class="p"&gt;9.&lt;/span&gt;4. WHEN session created THEN system SHALL assign unique session ID
&lt;span class="p"&gt;9.&lt;/span&gt;5. Easy mode = 3 vulnerabilities with hints
&lt;span class="p"&gt;9.&lt;/span&gt;6. Hard mode = 4-5 vulnerabilities, hints only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  New Design: Template Architecture
&lt;/h3&gt;

&lt;p&gt;Kiro helped me design a template-based system. Each vulnerability = JSON file containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metadata (ID, OWASP category, severity)&lt;/li&gt;
&lt;li&gt;Code patterns (vulnerable + fix)&lt;/li&gt;
&lt;li&gt;Detection (regex + fix indicators)&lt;/li&gt;
&lt;li&gt;Hints (easy + hard)&lt;/li&gt;
&lt;li&gt;Educational content (analogies, real examples, AWS services)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  New Tasks
&lt;/h3&gt;

&lt;p&gt;15 concrete steps from design to implementation. Having a roadmap helped me stay focused.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Template System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: The Template Schema
&lt;/h3&gt;

&lt;p&gt;I created the first template for code injection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"code-injection-eval-v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"owaspCategory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A03:2021"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Code Injection Vulnerability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"codePattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vulnerablePattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"const result = eval(userCode);"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fixPattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"// Don't execute user code"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detection"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eval&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;("&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fixIndicators"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"removed eval"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"safe alternative"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"easy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Search for dynamic code execution"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hard"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Examine how user input is processed"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"educationalContent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"analogy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Like giving strangers your car AND house keys"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"realWorldImpact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Equifax breach: 143M people affected"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"awsServices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWS Lambda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"useCase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Run untrusted code in isolated environments with minimal IAM permissions"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This template contains everything needed to generate a unique vulnerability!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Building 8 Templates
&lt;/h3&gt;

&lt;p&gt;I created templates for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Code Injection (eval)&lt;/li&gt;
&lt;li&gt;XSS (dangerouslySetInnerHTML)&lt;/li&gt;
&lt;li&gt;Hardcoded Secrets&lt;/li&gt;
&lt;li&gt;SQL Injection&lt;/li&gt;
&lt;li&gt;IDOR&lt;/li&gt;
&lt;li&gt;Insecure Deserialization&lt;/li&gt;
&lt;li&gt;Insufficient Logging&lt;/li&gt;
&lt;li&gt;Missing Input Validation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each with complete educational content and AWS recommendations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Dynamic Generator
&lt;/h3&gt;

&lt;p&gt;I upgraded &lt;code&gt;start-game.cjs&lt;/code&gt; to &lt;code&gt;start-game-dynamic.cjs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Load 8 templates&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;templates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadTemplates&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Randomly select 3-5 based on difficulty&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;selectVulnerabilities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;templates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Generate React components&lt;/span&gt;
&lt;span class="nx"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;component&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;componentPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;component&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Create unique session&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Every session generates different vulnerabilities!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug: Pattern Contamination
&lt;/h2&gt;

&lt;p&gt;I tested it and immediately hit a weird bug.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Some vulnerabilities showed as "fixed" right after generation, even though the vulnerable code was clearly there!&lt;/p&gt;

&lt;p&gt;I tracked it down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Generated component had:&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Code Injection via eval()&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="c1"&gt;// Detection pattern was:&lt;/span&gt;
&lt;span class="nx"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nb"&gt;eval&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;

&lt;/span&gt;&lt;span class="c1"&gt;// Scanner matched the TITLE, not the actual code!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scanner was detecting patterns in JSX display text, causing false positives.&lt;/p&gt;
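&lt;p&gt;The false positive is easy to reproduce in isolation (a minimal repro, not the actual scanner; &lt;code&gt;\x3c&lt;/code&gt; escapes stand in for angle brackets so the snippet stays feed-safe):&lt;/p&gt;

```javascript
// Minimal repro of the contamination bug: the detection regex fires on
// JSX display text, so a component with no real eval() call still matches.
// (\x3c and \x3e stand in for the angle brackets of the JSX markup.)
const pattern = /eval\s*\(/;

// Display text only: the vulnerable call is merely mentioned in a title
const displayOnly = '\x3ch3\x3eCode Injection via eval()\x3c/h3\x3e';

// Actual vulnerable code
const realVuln = 'const result = eval(userInput);';

const falsePositive = pattern.test(displayOnly); // fires, wrongly
const truePositive = pattern.test(realVuln);     // fires, correctly
```

&lt;p&gt;Both tests match, so the scanner cannot tell a mention of &lt;code&gt;eval()&lt;/code&gt; from a use of it.&lt;/p&gt;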

&lt;h3&gt;
  
  
  The Investigation
&lt;/h3&gt;

&lt;p&gt;I asked Kiro: "Why is this matching JSX text?"&lt;/p&gt;

&lt;p&gt;Kiro explained: "You're removing comments but not JSX content. Patterns in titles and hints trigger matches."&lt;/p&gt;

&lt;p&gt;That made sense. I needed to be smarter about what to scan.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Enhanced Detection
&lt;/h3&gt;

&lt;p&gt;I enhanced the corruption scanner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;checkVulnerabilityFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;detection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;codeOnly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove comments&lt;/span&gt;
  &lt;span class="nx"&gt;codeOnly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;codeOnly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/\*[\s\S]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?\*\/&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;codeOnly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;codeOnly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;.*/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Remove JSX text content (but not code!)&lt;/span&gt;
  &lt;span class="c1"&gt;// Matches: &amp;gt;{text without &amp;lt; or {}&amp;lt;&lt;/span&gt;
  &lt;span class="c1"&gt;// Removes: "Code Injection via eval()"&lt;/span&gt;
  &lt;span class="c1"&gt;// Preserves: `SELECT * FROM users WHERE id = ${userId}`&lt;/span&gt;
  &lt;span class="nx"&gt;codeOnly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;codeOnly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;{&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;?)&lt;/span&gt;&lt;span class="sr"&gt;&amp;lt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;gt;&amp;lt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Now check patterns&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RegExp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;detection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;codeOnly&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight&lt;/strong&gt;: The regex &lt;code&gt;/&amp;gt;([^&amp;lt;{]*?)&amp;lt;/g&lt;/code&gt; removes JSX display text while preserving actual code patterns in template literals and expressions!&lt;/p&gt;

&lt;p&gt;This gave me 100% detection accuracy with zero false positives.&lt;/p&gt;
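&lt;p&gt;The stripping step can be exercised on its own. This is the same regex, restated with &lt;code&gt;\x3c&lt;/code&gt;/&lt;code&gt;\x3e&lt;/code&gt; escapes for the angle brackets so it stays feed-safe:&lt;/p&gt;

```javascript
// The JSX-text stripping step: match a text node between a closing '\x3e'
// and the next opening '\x3c' (no braces allowed, so JSX expressions and
// template literals survive) and empty it out.
const stripJsxText = function (source) {
  return source.replace(/\x3e([^\x3c{]*?)\x3c/g, '\x3e\x3c');
};

// Title text is removed, so the detection pattern no longer fires on it
const title = '\x3ch3\x3eCode Injection via eval()\x3c/h3\x3e';
const stripped = stripJsxText(title); // '\x3ch3\x3e\x3c/h3\x3e'

// Real code outside JSX text nodes is untouched
const code = 'const q = `SELECT * FROM users WHERE id = ${userId}`;';
```

&lt;p&gt;After stripping, &lt;code&gt;/eval\s*\(/&lt;/code&gt; no longer matches the title, while actual code strings pass through unchanged.&lt;/p&gt;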

&lt;h3&gt;
  
  
  The Prevention: Template Design Rules
&lt;/h3&gt;

&lt;p&gt;To prevent this issue in future templates, I created guidelines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;❌ DON'T: &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Code Injection via eval()&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
✅ DO: &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Code Injection Vulnerability&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;

❌ DON'T: "Search for 'eval(' in code"
✅ DO: "Search for dynamic code execution"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Automated Validation
&lt;/h3&gt;

&lt;p&gt;I added validation to the generator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;validateTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;warnings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vulnerablePattern&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;⚠️  Template name contains vulnerable pattern&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hint&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vulnerablePattern&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;⚠️  Hint contains vulnerable pattern&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the system catches template issues automatically before they become bugs!&lt;/p&gt;
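&lt;p&gt;Applied to the DON'T/DO examples above, the validation behaves like this (a self-contained sketch that treats the template's &lt;code&gt;detection.pattern&lt;/code&gt; as a regex source; the field names are assumptions about the schema):&lt;/p&gt;

```javascript
// Self-contained sketch of the template validation: flag any template
// whose display strings (name, hints) match its own detection pattern.
function validateTemplate(template) {
  const warnings = [];
  const vulnerablePattern = new RegExp(template.detection.pattern);
  if (vulnerablePattern.test(template.name)) {
    warnings.push('Template name contains vulnerable pattern');
  }
  template.hints.forEach(function (hint) {
    if (vulnerablePattern.test(hint)) {
      warnings.push('Hint contains vulnerable pattern');
    }
  });
  return warnings;
}

// The DON'T example trips both checks; the DO example passes clean
const bad = {
  name: 'Code Injection via eval()',
  hints: ['Search for eval( in the code'],
  detection: { pattern: 'eval\\s*\\(' },
};
const good = {
  name: 'Code Injection Vulnerability',
  hints: ['Search for dynamic code execution'],
  detection: { pattern: 'eval\\s*\\(' },
};
```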

&lt;h2&gt;
  
  
  The Result: Infinite Replayability
&lt;/h2&gt;

&lt;p&gt;With the bug fixed, I had a working dynamic system:&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Run 1:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🎲 Selected: XSS, Hardcoded Secrets, Missing Validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test Run 2:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🎲 Selected: Code Injection, Insecure Deserialization, Insufficient Logging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test Run 3:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🎲 Selected: SQL Injection, IDOR, Missing Validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every session was unique! With 8 vulnerability types and 3-5 selected per session, there are 182 possible combinations (56 + 70 + 56 ways to choose 3, 4, or 5 of them).&lt;/p&gt;

&lt;p&gt;Infinite replayability: ✅&lt;/p&gt;
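&lt;p&gt;The combination count is easy to verify: choosing 3, 4, or 5 of the 8 vulnerability types is a sum of three binomial coefficients.&lt;/p&gt;

```javascript
// Count the ways to choose k of n items (binomial coefficient),
// multiplying and dividing step by step to stay in exact integers.
function choose(n, k) {
  let result = 1;
  for (let i = 1; i !== k + 1; i++) {
    result = (result * (n - i + 1)) / i;
  }
  return result;
}

// 56 + 70 + 56 distinct vulnerability sets from 8 types
const combos = choose(8, 3) + choose(8, 4) + choose(8, 5); // 182
```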

&lt;h2&gt;
  
  
  The AWS Integration
&lt;/h2&gt;

&lt;p&gt;One more enhancement: I added AWS security service recommendations to every vulnerability.&lt;/p&gt;

&lt;p&gt;Now Kiro teaches both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Application security&lt;/strong&gt;: How to fix the code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud security&lt;/strong&gt;: Which AWS services prevent this in production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;✅ Fixed! Using environment variables now.

🎓 Lesson: Hardcoded secrets live FOREVER in git history!

☁️ AWS: Use &lt;span class="gs"&gt;**Secrets Manager**&lt;/span&gt; to store and auto-rotate API keys.
It's like a vault that changes the combination automatically!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provides complete security education - from code to cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Documentation
&lt;/h2&gt;

&lt;p&gt;To make this system maintainable, I created comprehensive documentation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Template Design Rules&lt;/strong&gt; - Pattern contamination prevention, design checklist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template Checklist&lt;/strong&gt; - Quick reference for developers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Documentation&lt;/strong&gt; - Problem analysis, solution implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improvements Summary&lt;/strong&gt; - Impact assessment, metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Anyone can now contribute new vulnerability templates without breaking the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Template systems enable scale&lt;/strong&gt; - Takes longer upfront, but trivial to extend later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation prevents bugs&lt;/strong&gt; - Catching issues early saves debugging time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specs work for enhancements&lt;/strong&gt; - Structure helps even when adding to existing projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases reveal themselves&lt;/strong&gt; - Building for scale reveals hidden issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation is investment&lt;/strong&gt; - Ensures long-term maintainability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete security picture&lt;/strong&gt; - Teaching both code and cloud security matters&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Complete Experience
&lt;/h2&gt;

&lt;p&gt;Now when users play:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Say "start the game", choose difficulty&lt;/li&gt;
&lt;li&gt;Kiro generates 3-5 unique vulnerabilities&lt;/li&gt;
&lt;li&gt;Dashboard shows haunted, corrupted UI&lt;/li&gt;
&lt;li&gt;Hunt and fix vulnerabilities with Kiro's help&lt;/li&gt;
&lt;li&gt;Each fix includes code lessons + AWS recommendations&lt;/li&gt;
&lt;li&gt;UI heals as corruption drops&lt;/li&gt;
&lt;li&gt;At 0%, app transforms to peaceful state&lt;/li&gt;
&lt;li&gt;Play again for a completely different experience!&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;With the foundation in place, here are some ideas for future enhancements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More vulnerability types (CSRF, broken auth, security misconfiguration)&lt;/li&gt;
&lt;li&gt;Progressive difficulty (unlock harder vulnerabilities as you learn)&lt;/li&gt;
&lt;li&gt;Achievement system (badges for specific combinations)&lt;/li&gt;
&lt;li&gt;Community templates (user-created vulnerabilities)&lt;/li&gt;
&lt;li&gt;Multiplayer mode (compete to fix fastest)&lt;/li&gt;
&lt;li&gt;CI/CD integration (security training in pipelines)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The template system makes all of this possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ooiyeefei/owasp-exorcist
&lt;span class="nb"&gt;cd &lt;/span&gt;owasp-exorcist
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Say "start the game" to Kiro, choose your difficulty, and start banishing demons!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro tip&lt;/strong&gt;: Play multiple times! Every session generates different vulnerabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This project showed me that &lt;strong&gt;complex features need structure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Dynamic generation required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear requirements (specs)&lt;/li&gt;
&lt;li&gt;Thoughtful design (architecture)&lt;/li&gt;
&lt;li&gt;Automated validation (quality assurance)&lt;/li&gt;
&lt;li&gt;Comprehensive documentation (maintainability)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without specs: fragile and hard to extend.&lt;br&gt;
With specs: scalable and maintainable.&lt;/p&gt;

&lt;p&gt;Kiro didn't just help me code - it helped me think systematically about the problem. That's the real value.&lt;/p&gt;

</description>
      <category>kiro</category>
    </item>
  </channel>
</rss>
