<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nicanor Korir</title>
    <description>The latest articles on Forem by Nicanor Korir (@nicanor_korir).</description>
    <link>https://forem.com/nicanor_korir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F80291%2F84a281c4-7b95-4d50-841a-fb51cb7e9610.jpg</url>
      <title>Forem: Nicanor Korir</title>
      <link>https://forem.com/nicanor_korir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nicanor_korir"/>
    <language>en</language>
    <item>
      <title>VibeCheck - Community Help for AI Builders</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Sun, 01 Mar 2026 16:47:41 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/vibecheck-community-help-for-ai-builders-27e2</link>
      <guid>https://forem.com/nicanor_korir/vibecheck-community-help-for-ai-builders-27e2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/weekend-2026-02-28"&gt;DEV Weekend Challenge: Community&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Community
&lt;/h2&gt;

&lt;p&gt;VibeCheck serves &lt;strong&gt;"vibe coders"&lt;/strong&gt; - the growing wave of non-traditional builders using AI tools like Cursor, Bolt, Lovable, and Replit to create real products. They're not "learning to code" - they're building businesses, MVPs, and passion projects with AI assistance.&lt;/p&gt;

&lt;p&gt;These builders inevitably hit the "70% wall" where AI can't finish the job. When things break:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vendor solutions assume they know how to navigate complex codebases&lt;/li&gt;
&lt;li&gt;AI tools keep repeating the same broken solutions&lt;/li&gt;
&lt;li&gt;No existing community understands their unique experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They deserve a community that meets them where they are.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;VibeCheck is a community platform that helps vibe coders get unstuck and learn from each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI Triage Coach&lt;/strong&gt; - Describe your problem in simple words. Get a "rescue prompt" optimized for your AI tool - not code fixes, but better questions to ask your AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Screenshot Analysis&lt;/strong&gt; - Paste a screenshot of your error. AI vision analyzes it and helps you describe the problem clearly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Quality Feedback&lt;/strong&gt; - Real-time scoring helps you write better problem descriptions before asking for help.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Community Help&lt;/strong&gt; - Post requests, get suggestions from other users, earn points for helping and interacting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Library&lt;/strong&gt; - Share "what didn't work → what worked" prompt swaps. Learn from others' breakthroughs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Video Demo:&lt;/strong&gt; &lt;a href="https://www.loom.com/share/298b6c55fb7b4bde9cee9bdff6c9c95e" rel="noopener noreferrer"&gt;Check out this Loom VibeCheck demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live App:&lt;/strong&gt; &lt;a href="https://vibecheck-community.vercel.app/" rel="noopener noreferrer"&gt;https://vibecheck-community.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/nicanor-korir" rel="noopener noreferrer"&gt;
        nicanor-korir
      &lt;/a&gt; / &lt;a href="https://github.com/nicanor-korir/vibecheck-community" rel="noopener noreferrer"&gt;
        vibecheck-community
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      VibeCheck - Community Help for AI Builders
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;VibeCheck&lt;/h1&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Got 70% done with AI? We'll help you finish the last 30%.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A community platform for vibe coders using Cursor, Bolt, Lovable, Replit, and more. Built for the &lt;a href="https://dev.to/challenges/weekend-2026-02-28" rel="nofollow"&gt;DEV Weekend Challenge&lt;/a&gt;.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Problem&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;"Vibe coding" has exploded — the global market for vibe coding platforms is now &lt;a href="https://mktclarity.com/blogs/news/vibe-coding-market" rel="nofollow noopener noreferrer"&gt;$4.7 billion&lt;/a&gt;, with &lt;a href="https://www.secondtalent.com/resources/vibe-coding-statistics/" rel="nofollow noopener noreferrer"&gt;63% of users being non-developers&lt;/a&gt;. But there's a critical gap:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"Non-engineers can get 70% of the way there surprisingly quickly, but that final 30% becomes an exercise in diminishing returns."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When something breaks, non-developers are stuck:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://addyo.substack.com/p/the-70-problem-hard-truths-about" rel="nofollow noopener noreferrer"&gt;66% of developers&lt;/a&gt; say AI solutions are "almost right, but not quite" — leading to time-consuming debugging&lt;/li&gt;
&lt;li&gt;Stuck in loops: copy error, paste, get new error, repeat ("The fix breaks something else. You ask AI to fix that. It creates two more problems.")&lt;/li&gt;
&lt;li&gt;Stack Overflow feels intimidating, YouTube tutorials don't answer specific questions&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Solution&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;We don't replace your AI&lt;/strong&gt;…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/nicanor-korir/vibecheck-community" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js&lt;/strong&gt; - React framework with App Router&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supabase&lt;/strong&gt; - Authentication, PostgreSQL database, Row Level Security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini&lt;/strong&gt; - AI triage coaching and vision-based screenshot analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript&lt;/strong&gt; - Type-safe development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS&lt;/strong&gt; - Styling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel&lt;/strong&gt; - Deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Implementation Details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI Triage generates "rescue prompts" tailored to specific AI tools (Cursor, Bolt, Lovable, etc.)&lt;/li&gt;
&lt;li&gt;Vision API analyzes error screenshots and translates them into clear problem descriptions&lt;/li&gt;
&lt;li&gt;Real-time prompt quality scoring uses pattern matching to help users write better requests (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Community features include points, rewards for helping, and gamification&lt;/li&gt;
&lt;/ul&gt;
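
&lt;p&gt;The scoring code itself isn't in this post, but a minimal sketch of the pattern-matching idea could look like the following (the checks, weights, and hints here are illustrative, not the production rules):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch of pattern-based prompt scoring (not the production rules).
interface PromptScore {
  score: number;        // 0-100
  suggestions: string[];
}

const CHECKS = [
  { pattern: /error|exception|fails?/i, points: 25, hint: "Describe the exact error you see." },
  { pattern: /cursor|bolt|lovable|replit/i, points: 25, hint: "Mention which AI tool you are using." },
  { pattern: /expected|should|instead/i, points: 25, hint: "Say what you expected to happen." },
  { pattern: /tried|already|attempted/i, points: 25, hint: "List what you have already tried." },
];

function scorePrompt(text: string): PromptScore {
  let score = 0;
  const suggestions: string[] = [];
  for (const check of CHECKS) {
    if (check.pattern.test(text)) {
      score += check.points;
    } else {
      suggestions.push(check.hint);
    }
  }
  return { score, suggestions };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;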

</description>
      <category>devchallenge</category>
      <category>weekendchallenge</category>
      <category>showdev</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Building an interactive robotics portfolio</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Thu, 29 Jan 2026 20:03:03 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/from-kenya-to-germany-building-an-interactive-robotics-portfolio-4gdn</link>
      <guid>https://forem.com/nicanor_korir/from-kenya-to-germany-building-an-interactive-robotics-portfolio-4gdn</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/new-year-new-you-google-ai-2025-12-31"&gt;New Year, New You Portfolio Challenge Presented by Google AI&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I'm Nicanor Korir, a techie who grew up in the highland regions of Kenya, fell in love with computers, and became a software engineer. Looking back, I would say &lt;a href="https://www.brainyquote.com/quotes/steve_jobs_416875" rel="noopener noreferrer"&gt;Steve Jobs was right&lt;/a&gt;: you can only connect the dots looking backwards.&lt;/p&gt;

&lt;p&gt;My tech journey is an interesting one. I first interacted with computers at the age of 18, after high school. During the break after high school, I decided to learn basic computer skills, simply to get familiar with using a computer. It turned out to be so absorbing that I spent hours figuring out different things. That's when I decided to pursue something computer-related for my degree, which led me to computer science.&lt;/p&gt;

&lt;p&gt;Studying computer science in my first and second years was hard since everything except maths and physics was new, but I enjoyed it. In my third year I focused more on software, fell in love with software engineering, and the rest is history.&lt;/p&gt;

&lt;p&gt;Fast forward to today: I've worked with clients in different industries, solving human problems through technology. Currently, I am doing my master's in AI and robotics, and with the current wave of tech trends there has been a lot for me to learn about intelligent, interactive systems for the future.&lt;/p&gt;

&lt;p&gt;Alongside my part-time studies, I am working as CTO at Alma, a startup backed by AI Nation in Berlin, Germany. This is where I've applied my skills as a leader, helping build a solution that supports GBV survivors as they navigate their daily lives.&lt;/p&gt;

&lt;p&gt;This portfolio is a way for me to share my tech journey in a form that says more about who I am and lets visitors relate to me.&lt;/p&gt;

&lt;p&gt;My portfolios have been changing with time. This Dev Challenge came at the right time, as I've been thinking of setting myself up for my next transition from a student to a professional.&lt;/p&gt;

&lt;h2&gt;
  
  
  Portfolio
&lt;/h2&gt;


&lt;div class="ltag__cloud-run"&gt;
  &lt;iframe height="600px" src="https://nicanor-170395639051.europe-west3.run.app/"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If the embed doesn't load, visit &lt;a href="https://nicanor-170395639051.europe-west3.run.app/" rel="noopener noreferrer"&gt;nicanor-170395639051.europe-west3.run.app&lt;/a&gt; directly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Tech Stack
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js 16 (App Router)&lt;/td&gt;
&lt;td&gt;Server components, streaming, edge-ready&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.0 Flash&lt;/td&gt;
&lt;td&gt;Fast, accurate, cost-effective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3D&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;React Three Fiber&lt;/td&gt;
&lt;td&gt;Cyberpunk vision scanner aesthetic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Animation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Framer Motion&lt;/td&gt;
&lt;td&gt;Smooth, physics-based transitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;State&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zustand&lt;/td&gt;
&lt;td&gt;Lightweight, persists user preferences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Styling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TailwindCSS 4&lt;/td&gt;
&lt;td&gt;Design tokens, responsive by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Cloud Run&lt;/td&gt;
&lt;td&gt;Great DX&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Deployment
&lt;/h3&gt;

&lt;p&gt;I played around with Google Cloud Run; it had been a while since I'd explored the Google Cloud environment.&lt;/p&gt;

&lt;p&gt;First, I checked out all the services available with the free $300 credits. Then I dove into the Cloud Run console and CLI. I always love the Google Cloud docs; there are guides for both the &lt;code&gt;gcloud&lt;/code&gt; CLI and the &lt;code&gt;console&lt;/code&gt;. My first deployment through the gcloud CLI landed in &lt;code&gt;us-central1&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;First, authenticate and set the project ID in Google Cloud:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Authenticate with Google Cloud&lt;/span&gt;
gcloud auth login

&lt;span class="c"&gt;# Set your project&lt;/span&gt;
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project YOUR_PROJECT_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then deploy from the portfolio folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy nicanor-portfolio &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_API_KEY=api_key,NODE_ENV=production,NEXT_PUBLIC_GA_MEASUREMENT_ID=G-XXX"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--labels&lt;/span&gt; dev-tutorial&lt;span class="o"&gt;=&lt;/span&gt;devnewyear2026 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--memory&lt;/span&gt; 512Mi &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpu&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But then I thought, why am I deploying to the US when I'm sitting in Berlin? So I switched to the closest region, Frankfurt (&lt;code&gt;europe-west3&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy nicanor-portfolio &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; europe-west3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_API_KEY=,NODE_ENV=production,NEXT_PUBLIC_GA_MEASUREMENT_ID="&lt;/span&gt; &lt;span class="se"&gt;\ &lt;/span&gt;
  &lt;span class="nt"&gt;--labels&lt;/span&gt; dev-tutorial&lt;span class="o"&gt;=&lt;/span&gt;devnewyear2026
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The latency difference was noticeable.&lt;/p&gt;

&lt;p&gt;I got excited and tried to set up a custom subdomain (&lt;code&gt;nicanor.mydomain.com&lt;/code&gt; - this is just an example, I can't reveal the unfinished configuration yet). Unfortunately, Frankfurt doesn't support Cloud Run domain mapping 🙃&lt;/p&gt;

&lt;p&gt;So I had two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Switch back to a different region that supports domain mapping&lt;/li&gt;
&lt;li&gt;Update my DNS configurations manually&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Eventually, I decided to roll with the Cloud Run URL directly. I'll configure the subdomain later.&lt;/p&gt;

&lt;h3&gt;
  
  
  How I Used Gemini
&lt;/h3&gt;

&lt;p&gt;Backstory: my initial coding agent was &lt;code&gt;Claude&lt;/code&gt;, especially for the UI and the first building blocks. I ran into issues integrating &lt;code&gt;gemini3&lt;/code&gt;, since &lt;code&gt;claude&lt;/code&gt; doesn't have much context on gemini3 (only gemini2.0-flash), so I switched to using &lt;code&gt;gemini&lt;/code&gt; as my coding agent in the terminal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsv7ucv4w3hu3mj2e6n6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsv7ucv4w3hu3mj2e6n6h.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of dumping information and hoping visitors find what they need, I decided to go for an interactive and surprise format. The first thing you see is a question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What brings you here?"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check my work → I'll show you metrics, experience, and achievements&lt;/li&gt;
&lt;li&gt;Building something? → Let me show you my technical leadership&lt;/li&gt;
&lt;li&gt;Fellow engineer? → Let's dive into the architecture&lt;/li&gt;
&lt;li&gt;Just curious? → Welcome! Here's my story from Kenya to Berlin&lt;/li&gt;
&lt;li&gt;Something else? → Tell me, and Gemini will figure out what's relevant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once a user chooses a path, Gemini generates the nodes and a summary of my journey so that the user gets an overview of me. I also compiled a list of common pre-generated questions, which Gemini uses for the nodes (the summary road map) and for the intelligent chatbot.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Architecture
&lt;/h3&gt;

&lt;p&gt;For my LLM (Gemini) usage, I didn't want the traditional approach, the plain server-client flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User asks a question → Call Gemini → Return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, but expensive. Every visitor burns API credits.&lt;/p&gt;

&lt;p&gt;I built a &lt;strong&gt;hybrid chat system&lt;/strong&gt; that gives the user a better response. When a user opens the chatbot, they are greeted with a message from Gemini, and can then pick one of the pre-generated questions or type their own. The response is fetched either from the pre-generated JSON data or from Gemini. This is the user flow for the chatbot interaction:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqmcjzr8axafpu9d50jy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqmcjzr8axafpu9d50jy.png" alt=" " width="800" height="1203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I initially started with &lt;code&gt;gemini-2.0-flash&lt;/code&gt;, which worked very well, and later switched to &lt;code&gt;gemini-3.0-flash-preview&lt;/code&gt; to try out gemini3, especially its improved intelligence. It worked well, and I am happy with my progress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faf8es33kkga7baeh0pl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faf8es33kkga7baeh0pl8.png" alt=" " width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Intent Analysis
&lt;/h3&gt;

&lt;p&gt;When someone types "something else" and enters their own context, like "I'm researching trauma-informed AI" or "need a keynote speaker", Gemini does something clever like this in the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Actual code from my intent analyzer&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Analyze this visitor's intent and:
1. Classify their primary interest
2. Suggest which portfolio sections are relevant
3. Recommend a visual theme (robotics/research/business)
4. Generate a personalized welcome message`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, someone researching AI ethics gets routed to my Alma project (trauma-informed AI for GBV survivors), and someone looking for a speaker sees my events calendar and media kit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Same portfolio. Personalized experience for different users.&lt;/strong&gt;&lt;/p&gt;
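
&lt;p&gt;The routing that follows the classification is essentially a lookup from intent to sections. A simplified sketch (the intent labels and section names below are examples, not the exact values in the repo):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Simplified sketch: map a classified intent to the portfolio sections to show.
// Intent labels and section names are illustrative examples.
const SECTIONS_BY_INTENT: { [intent: string]: string[] } = {
  "ai-ethics-research": ["alma-project", "research", "contact"],
  "speaking-inquiry": ["events-calendar", "media-kit", "contact"],
  "recruiter": ["experience", "achievements", "metrics"],
  "engineer": ["architecture", "open-source", "tech-stack"],
};

function sectionsFor(intent: string): string[] {
  return SECTIONS_BY_INTENT[intent] ?? ["story", "contact"]; // default path
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;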

&lt;h3&gt;
  
  
  The Vision System: Because Why Not?
&lt;/h3&gt;

&lt;p&gt;I'll be honest, the 3D robot-eye scanner thing that greets you? It started as a joke. I was working on face detection and tracking for a specific robotics solution, and I got hooked on how the image analysis happens. That's when I decided to experiment with a similar idea in a website UI, and my portfolio was the perfect place.&lt;/p&gt;

&lt;p&gt;I built a cyberpunk-inspired "vision system" that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scans my current image (not really, but it looks like it does)&lt;/li&gt;
&lt;li&gt;Shows "HUMAN DETECTED" with confidence scores&lt;/li&gt;
&lt;li&gt;Adapts its detection labels based on your chosen path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is it necessary? No.&lt;br&gt;
Is it memorable? I hope so.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hallucination Prevention
&lt;/h3&gt;

&lt;p&gt;AI portfolios have a problem: they lie. Ask any LLM about a random developer, and it might confidently describe projects that don't exist.&lt;/p&gt;

&lt;p&gt;I implemented multiple safeguards:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Grounded System Prompt&lt;/strong&gt;: 170+ lines of verified facts about my career, with explicit "NEVER claim" instructions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pre-flight Checks&lt;/strong&gt;: Before returning any response, I scan for the following (see the sketch after this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Placeholder text (&lt;code&gt;[insert link]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Email addresses (privacy)&lt;/li&gt;
&lt;li&gt;False company claims ("worked at Google/Meta/etc")&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Exact Link Enforcement&lt;/strong&gt;: Every URL in the system prompt is real. Gemini is instructed to use them verbatim or not at all.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
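
&lt;p&gt;A minimal sketch of what those pre-flight checks can look like (the patterns below are illustrative; the real list is longer):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative pre-flight check: scan a Gemini response for placeholder text,
// email addresses, and false employer claims before returning it to the user.
const BLOCKED_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "placeholder text", pattern: /\[insert [^\]]*\]/i },
  { name: "email address", pattern: /[\w.+-]+@[\w-]+\.[\w.]+/ },
  { name: "false employer claim", pattern: /worked at (google|meta)/i },
];

function preflightCheck(response: string): { ok: boolean; violations: string[] } {
  const violations: string[] = [];
  for (const rule of BLOCKED_PATTERNS) {
    if (rule.pattern.test(response)) {
      violations.push(rule.name);
    }
  }
  return { ok: violations.length === 0, violations };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;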

&lt;h2&gt;
  
  
  What I'm Most Proud Of
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Near-Perfect Lighthouse Scores
&lt;/h3&gt;

&lt;p&gt;I'm genuinely proud of the performance optimization. Here are the actual Lighthouse results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Desktop Score&lt;/th&gt;
&lt;th&gt;Mobile Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best Practices&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SEO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And the Core Web Vitals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First Contentful Paint:&lt;/strong&gt; 0.3s&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Largest Contentful Paint:&lt;/strong&gt; 0.6s&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Blocking Time:&lt;/strong&gt; 10ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cumulative Layout Shift:&lt;/strong&gt; 0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed Index:&lt;/strong&gt; 1.0s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focsio6p4425nzznpkkvl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focsio6p4425nzznpkkvl.png" alt=" " width="800" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Personalization Actually Works
&lt;/h3&gt;

&lt;p&gt;With the pre-generated JSON data, I restructured the prompt around my story and aligned the UI/UX for storytelling. I tested the personalization with a friend, and she was pleased with how well it was tailored to tell more about me. I am happy with the Gemini integration and how intelligently it serves personalised information to users.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The API Efficiency
&lt;/h3&gt;

&lt;p&gt;Running Gemini for every interaction would cost a fortune, so I simulated a temporary AI memory through JSON data. My hybrid approach means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Common questions: 0ms latency, 0 API cost&lt;/li&gt;
&lt;li&gt;Complex questions: Full Gemini intelligence&lt;/li&gt;
&lt;li&gt;Estimated: 0.2 API calls per visitor (based on automated and manual testing)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. The Story It Tells
&lt;/h3&gt;

&lt;p&gt;The goal of this portfolio is to tell the story of my career and my background. The theme reflects what I am currently working on, and visitors can relate to that. From the moment a user enters the site to the end, it tells my story.&lt;/p&gt;

&lt;p&gt;It's like the latest chapter in a book I could be writing that might end in suspense.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. It's Actually Useful
&lt;/h3&gt;

&lt;p&gt;It started as a challenge, but now I am going to use it fully as my portfolio, to tell my stories and connect with my audience.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. UI/UX
&lt;/h3&gt;

&lt;p&gt;I am happy with the robotics (computer vision) theme in general: the mouse effects, the vision scanner, the 3D hero section, and the map nodes that tell different story paths for those who want to explore different routes of my journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The code is open source. Feel free to fork it and build your own intent-aware portfolio (you might want to remove most of my data; I'll structure it better for open source):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/nicanor-korir/portfolio" rel="noopener noreferrer"&gt;github.com/nicanor-korir/portfolio&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://nicanor-170395639051.europe-west3.run.app/" rel="noopener noreferrer"&gt;nicanor-170395639051.europe-west3.run.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key files to explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;src/lib/gemini.ts&lt;/code&gt; - The hybrid AI architecture&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/app/api/analyze-intent/route.ts&lt;/code&gt; - Intent classification&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/components/3d/HeroVisionSystem.tsx&lt;/code&gt; - The vision scanner&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/components/sections/Hero.tsx&lt;/code&gt; - Personalized path selection&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Thanks for reading!&lt;/strong&gt; If you made it this far, you're exactly the kind of curious person this portfolio was built for.&lt;/p&gt;

&lt;p&gt;Come say hi: &lt;a href="https://nicanor-170395639051.europe-west3.run.app/#contact" rel="noopener noreferrer"&gt;nicanor-170395639051.europe-west3.run.app/#contact&lt;/a&gt;&lt;/p&gt;




</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>portfolio</category>
      <category>gemini</category>
    </item>
    <item>
      <title>From Shaky Farm Videos to Sharp Diagnoses - Building a Client-Side Media Pipeline</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Sat, 24 Jan 2026 23:53:33 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/from-shaky-farm-videos-to-sharp-diagnoses-building-a-client-side-media-pipeline-2iok</link>
      <guid>https://forem.com/nicanor_korir/from-shaky-farm-videos-to-sharp-diagnoses-building-a-client-side-media-pipeline-2iok</guid>
      <description>&lt;p&gt;A farmer stands in her field, phone in hand, recording a quick video of a sick tomato plant. The camera shakes. The sun creates harsh shadows. Her thumb accidentally covers the corner of the frame for three seconds.&lt;/p&gt;

&lt;p&gt;That video contains maybe 900 frames. Maybe ten of them are actually usable for plant disease diagnosis. The other 890 are blurry, redundant, or partially obscured.&lt;/p&gt;

&lt;p&gt;The question that drove weeks of development: how do you automatically find those ten good frames without uploading 900 to a server?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Client-Side Processing
&lt;/h2&gt;

&lt;p&gt;The obvious architecture: upload the raw video, process it on the server with Python or FFmpeg, and send back extracted frames.&lt;/p&gt;

&lt;p&gt;For my users, that architecture fails:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftfidk48fsrkhnxm0v5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftfidk48fsrkhnxm0v5m.png" alt=" " width="800" height="1312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bandwidth costs real money.&lt;/strong&gt; On metered data plans common in rural Africa, uploading a 30MB video might cost more than the farmer earns that day. Extracting frames locally and uploading only the useful ones—maybe 500KB total—changes the economics entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networks are unreliable.&lt;/strong&gt; A large upload over a 2G connection with frequent drops means failed transfers, wasted data, and frustrated users. Smaller uploads succeed more often.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency compounds.&lt;/strong&gt; Upload time plus server processing time plus download time. On slow networks, this becomes intolerable. Processing locally eliminates the round-trip.&lt;/p&gt;

&lt;p&gt;So everything happens in the browser. Image compression, video frame extraction, blur detection, frame selection—all client-side JavaScript using Canvas APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image Compression: The Foundation
&lt;/h2&gt;

&lt;p&gt;Every image, regardless of source, goes through compression before upload. The target: 1024 pixels on the longest dimension, JPEG at 80% quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zzecup33yl6f6pu5106.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zzecup33yl6f6pu5106.png" alt=" " width="800" height="1446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why these specific numbers?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1024 pixels&lt;/strong&gt; is the sweet spot for Claude Vision. Larger images don't improve diagnostic accuracy—Claude doesn't need to count individual pixels to identify disease symptoms. Smaller images lose the detail needed to spot early infections. I tested this extensively: 1024px captures everything diagnostically relevant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;80% JPEG quality&lt;/strong&gt; is where compression artifacts become invisible to humans but file size drops dramatically. At 90% quality, files are 50% larger with no visible benefit. At 70%, subtle disease symptoms might be obscured by compression artifacts. 80% hits the sweet spot.&lt;/p&gt;

&lt;p&gt;The result: a 4000x3000 pixel PNG (roughly 15MB) becomes a 1024x768 JPEG (roughly 100KB). That's a 150x reduction in data transferred. On a slow network, that's the difference between a 30-second upload and a 2-second upload.&lt;/p&gt;
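
&lt;p&gt;Here's a minimal browser-side sketch of that compression step, assuming a canvas-based approach (the function name and structure are illustrative, not the exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Minimal sketch of the compression step: resize so the longest side is
// 1024px, then re-encode as JPEG at 80% quality using a canvas.
const MAX_DIMENSION = 1024;
const JPEG_QUALITY = 0.8;

async function compressImage(file: File) {
  const bitmap = await createImageBitmap(file);

  // Scale down only; never upscale small images.
  const scale = Math.min(1, MAX_DIMENSION / Math.max(bitmap.width, bitmap.height));
  const width = Math.round(bitmap.width * scale);
  const height = Math.round(bitmap.height * scale);

  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  canvas.getContext("2d")!.drawImage(bitmap, 0, 0, width, height);
  bitmap.close(); // release the decoded image promptly

  // toBlob is callback-based, so wrap it in a Promise.
  return new Promise(function (resolve) {
    canvas.toBlob(resolve, "image/jpeg", JPEG_QUALITY);
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;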

&lt;h2&gt;
  
  
  Video Frame Extraction: The Hard Part
&lt;/h2&gt;

&lt;p&gt;Videos are information-dense but mostly redundant. A 10-second clip at 30fps contains 300 frames, but probably only 5-10 are worth analyzing.&lt;/p&gt;

&lt;p&gt;The naive approach—grab every Nth frame—fails in practice:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7dfsp1oo3zix98ory9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7dfsp1oo3zix98ory9h.png" alt=" " width="800" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blur clustering.&lt;/strong&gt; If the camera moves at the 3-second mark, you get a blurry frame. The sharp frames at 2.8 and 3.2 seconds are skipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redundancy.&lt;/strong&gt; If the camera holds steady for 5 seconds, you might extract 2 nearly identical frames while missing coverage of different angles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporal bias.&lt;/strong&gt; Fixed intervals ignore content. The user might have shown three different angles at 5, 15, and 25 seconds. Fixed extraction at 0, 10, 20, 30 might miss all three.&lt;/p&gt;

&lt;p&gt;The solution requires two innovations: blur detection to find sharp frames, and temporal filtering to ensure diversity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Blur Detection: The Laplacian Trick
&lt;/h2&gt;

&lt;p&gt;To find sharp frames, you need to measure sharpness. The computer vision community solved this decades ago with the Laplacian operator.&lt;/p&gt;

&lt;p&gt;The intuition: sharp images have many edges—sudden transitions between light and dark. Blurry images have few edges because transitions are smoothed out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9ljseh4k2mh3vpqpxvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9ljseh4k2mh3vpqpxvc.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Laplacian operator computes, for each pixel, how different it is from its immediate neighbors. In a flat region (like a solid color), Laplacian values are near zero. At an edge (like a leaf vein against leaf tissue), values are high.&lt;/p&gt;

&lt;p&gt;For each pixel, the calculation is: &lt;code&gt;4 * center - top - bottom - left - right&lt;/code&gt;. If the center pixel is similar to its neighbors, this equals zero. If the center pixel is dramatically different (an edge), this equals a large positive or negative number.&lt;/p&gt;

&lt;p&gt;By computing the variance of Laplacian values across the entire image, you get a single "sharpness score." High variance means many edges, which means sharp. Low variance means few edges, which means blurry.&lt;/p&gt;

&lt;p&gt;In testing, I found these thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Variance above 500&lt;/strong&gt;: reliably sharp, excellent for analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variance 100-500&lt;/strong&gt;: acceptable, usable if nothing better available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variance below 100&lt;/strong&gt;: too blurry, likely unusable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The calculation happens on downscaled frames (1024px max dimension) to keep processing fast. Full-resolution Laplacian computation would be too slow for real-time frame scoring on mobile devices.&lt;/p&gt;
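
&lt;p&gt;For reference, a compact sketch of that sharpness score over canvas ImageData, simplified from the real implementation (grayscale conversion, 4-neighbour Laplacian, then variance):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of the blur score: variance of a 4-neighbour Laplacian over a
// grayscale version of the frame. Higher variance = more edges = sharper.
function sharpnessScore(image: ImageData): number {
  const { width, height, data } = image;

  // Convert RGBA pixels to a single luminance channel.
  const gray = new Float32Array(width * height);
  for (let i = 0; i &amp;lt; width * height; i++) {
    const o = i * 4;
    gray[i] = 0.299 * data[o] + 0.587 * data[o + 1] + 0.114 * data[o + 2];
  }

  // Laplacian per pixel: 4 * center - top - bottom - left - right.
  const values: number[] = [];
  for (let y = 1; y &amp;lt; height - 1; y++) {
    for (let x = 1; x &amp;lt; width - 1; x++) {
      const i = y * width + x;
      values.push(4 * gray[i] - gray[i - width] - gray[i + width] - gray[i - 1] - gray[i + 1]);
    }
  }

  // Variance of the Laplacian values is the sharpness score.
  let sum = 0;
  for (const v of values) sum += v;
  const mean = sum / values.length;
  let squared = 0;
  for (const v of values) squared += (v - mean) * (v - mean);
  return squared / values.length;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;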

&lt;h2&gt;
  
  
  Temporal Diversity: Not Just The Sharpest
&lt;/h2&gt;

&lt;p&gt;Selecting the 10 sharpest frames sounds right, but fails in practice. They often cluster in one time window when the camera happens to be stable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspuc02t6au094yg28qq7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspuc02t6au094yg28qq7.png" alt=" " width="800" height="1300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For disease diagnosis, temporal diversity matters. The user naturally shifts perspective while recording—showing the top of the leaf, then the underside, then the stem. A diverse frame selection captures this variation.&lt;/p&gt;

&lt;p&gt;The algorithm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Score all candidate frames by sharpness (Laplacian variance)&lt;/li&gt;
&lt;li&gt;Sort by sharpness, highest first&lt;/li&gt;
&lt;li&gt;Select the sharpest frame&lt;/li&gt;
&lt;li&gt;Calculate minimum time gap: video duration divided by (desired frames × 2)&lt;/li&gt;
&lt;li&gt;Skip any frame too close in time to already-selected frames&lt;/li&gt;
&lt;li&gt;Select the next-sharpest that passes the temporal filter&lt;/li&gt;
&lt;li&gt;Repeat until you have enough frames&lt;/li&gt;
&lt;li&gt;Sort selected frames chronologically for display&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a 30-second video, selecting 5 frames enforces at least 3-second gaps between selections. The result: frames are both sharp AND temporally diverse, capturing different moments and angles from the recording.&lt;/p&gt;
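
&lt;p&gt;In code, the selection loop is roughly the following, assuming each candidate frame carries its timestamp and sharpness score (the data shape here is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of the temporal-diversity filter: greedily take the sharpest frames,
// but skip any frame closer than minGap seconds to an already-selected one.
interface ScoredFrame {
  time: number;       // seconds into the video
  sharpness: number;  // Laplacian variance
}

function selectFrames(candidates: ScoredFrame[], duration: number, wanted: number): ScoredFrame[] {
  const minGap = duration / (wanted * 2); // e.g. 30s / (5 * 2) = 3s
  const bySharpness = candidates.slice().sort(function (a, b) {
    return b.sharpness - a.sharpness; // sharpest first
  });

  const selected: ScoredFrame[] = [];
  for (const frame of bySharpness) {
    if (selected.length === wanted) break;
    let tooClose = false;
    for (const s of selected) {
      if (Math.abs(s.time - frame.time) &amp;lt; minGap) tooClose = true;
    }
    if (!tooClose) selected.push(frame);
  }

  // Sort chronologically for display.
  return selected.sort(function (a, b) {
    return a.time - b.time;
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;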

&lt;h2&gt;
  
  
  The Complete Video Pipeline
&lt;/h2&gt;

&lt;p&gt;When a user uploads a video:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k5mmbaj3cotx92z3hi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k5mmbaj3cotx92z3hi1.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load metadata&lt;/strong&gt;: Get duration and dimensions without loading the full video into memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculate sample points&lt;/strong&gt;: Divide duration by max frames to get interval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seek and capture&lt;/strong&gt;: For each sample point, seek the video element and draw the current frame to the canvas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score sharpness&lt;/strong&gt;: Compute Laplacian variance for each captured frame&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select best frames&lt;/strong&gt;: Apply the temporal diversity filter to choose final frames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compress frames&lt;/strong&gt;: Each selected frame goes through image compression (1024px, 80% JPEG)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return for analysis&lt;/strong&gt;: Final frames ready for Claude Vision&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole process typically takes 3-5 seconds for a 30-second video on a mid-range phone. Users see a progress indicator showing frames being extracted, then their selected frames are displayed for confirmation before analysis.&lt;/p&gt;
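
&lt;p&gt;Step 3, "seek and capture", is the only genuinely fiddly browser part. A trimmed sketch, assuming the clip is already loaded into a hidden video element:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of "seek and capture": jump the video element to a timestamp and
// draw the current frame onto a canvas so it can be scored and compressed.
function captureFrameAt(video: HTMLVideoElement, canvas: HTMLCanvasElement, time: number) {
  return new Promise(function (resolve, reject) {
    function onSeeked() {
      video.removeEventListener("seeked", onSeeked);
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext("2d");
      if (!ctx) return reject(new Error("Canvas 2D context unavailable"));
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      resolve(ctx.getImageData(0, 0, canvas.width, canvas.height));
    }
    video.addEventListener("seeked", onSeeked);
    video.currentTime = time; // fires "seeked" once the frame is decoded
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;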

&lt;h2&gt;
  
  
  Handling Multiple Media Items
&lt;/h2&gt;

&lt;p&gt;Users can upload multiple images or videos in a single session. The system needs to know: same plant from different angles, or different plants entirely?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7flnywdn8ehsqkkqhndz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7flnywdn8ehsqkkqhndz.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For images, the UI asks directly. "Different Plants" produces N separate diagnoses. "Same Plant" produces one comprehensive diagnosis considering all evidence.&lt;/p&gt;

&lt;p&gt;For video input, extracted frames are treated as "the same plant" by default. The user was recording continuously, so frames presumably show the same subject. This makes video the fastest path to a comprehensive diagnosis—point and record, get analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Error Handling: Fail Visibly
&lt;/h2&gt;

&lt;p&gt;Real-world media processing fails constantly. Files are corrupted. Formats are unsupported. Memory runs out on old phones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoh3ytav2ozcq4pipnnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkoh3ytav2ozcq4pipnnl.png" alt=" " width="800" height="970"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unsupported format&lt;/strong&gt;: Check MIME type before processing. Provide clear error listing supported formats (JPEG, PNG, GIF, WebP for images; MP4, WebM, QuickTime for video).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Corrupt file&lt;/strong&gt;: Wrap processing in try-catch. If the image fails to load or the video fails to seek, provide specific feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing failure&lt;/strong&gt;: If compression fails, retry with lower quality settings. If that fails, skip the problematic file but continue with others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory pressure&lt;/strong&gt;: Process one item at a time rather than parallelizing. Release canvas references after each operation. Revoke blob URLs immediately after use. This is slower but prevents crashes on memory-constrained devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unusable output&lt;/strong&gt;: If blur detection determines all extracted frames are below the usability threshold, warn the user and suggest re-recording with a steadier hand or better lighting.&lt;/p&gt;

&lt;p&gt;The principle: fail visibly, never silently. A user who sees "Video too blurry, try recording again in better light" can fix the problem. A user whose analysis silently uses garbage frames loses trust in the whole system.&lt;/p&gt;
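
&lt;p&gt;As a small illustration of that principle, the format check is the simplest case: validate up front and return a message the user can act on (the helper below is a sketch, not the exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustration of "fail visibly": check the MIME type before processing and
// surface a specific, actionable error instead of failing silently.
const SUPPORTED_IMAGE_TYPES = ["image/jpeg", "image/png", "image/gif", "image/webp"];
const SUPPORTED_VIDEO_TYPES = ["video/mp4", "video/webm", "video/quicktime"];

function validateMediaFile(file: File): string | null {
  const supported = SUPPORTED_IMAGE_TYPES.concat(SUPPORTED_VIDEO_TYPES);
  if (!supported.includes(file.type)) {
    return "Unsupported file type '" + file.type + "'. Please upload JPEG, PNG, GIF, or WebP images, or MP4, WebM, or QuickTime videos.";
  }
  return null; // null means the file looks usable
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;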

&lt;h2&gt;
  
  
  Performance: Making It Work on Old Phones
&lt;/h2&gt;

&lt;p&gt;The target device isn't the latest iPhone. It's a three-year-old Android phone with 2GB of RAM running Chrome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Canvas reuse&lt;/strong&gt;: Creating a new canvas element for each operation is expensive. The pipeline reuses canvas elements across operations, clearing and resizing as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downscale first&lt;/strong&gt;: Blur detection doesn't need full resolution. A 4000x3000 image downscaled to 1024x768 gives equally valid sharpness scores at 1/12th the processing cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Progressive loading&lt;/strong&gt;: For videos, metadata loads first (duration, dimensions), then frames extract one at a time with progress feedback. Users see activity immediately rather than waiting for complete processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicit cleanup&lt;/strong&gt;: Large media can exhaust mobile browser memory. The pipeline explicitly nulls references after use. Video object URLs are revoked immediately after frame extraction. This prevents memory from accumulating across multiple uploads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Test with garbage input.&lt;/strong&gt; Development used well-lit, centered photos taken by someone who knows what they're doing. Production receives shaky videos recorded while walking, photos with fingers partially covering the lens, and screenshots of screenshots. Robustness only emerged after I deliberately tried to break the system with the worst possible input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client-side processing is more feasible than expected.&lt;/strong&gt; My initial assumption was that "real" image processing needed server resources. Modern browsers have Canvas APIs that handle common operations efficiently. Even blur detection—fundamentally a computer vision algorithm—runs fast enough on phone CPUs when you're smart about resolution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users prefer control over automation.&lt;/strong&gt; Early versions automatically selected "best" frames without showing users what was selected. Users didn't trust it. Showing extracted frames and letting users confirm or re-record built confidence in the system. The extra step is worth the trust it creates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fail visibly.&lt;/strong&gt; Silent failures are the worst. A processing error that shows "Something went wrong, try again" is infinitely better than one that silently produces garbage output. Users can recover from visible failures; invisible ones erode trust permanently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Invisibility Goal
&lt;/h2&gt;

&lt;p&gt;The media pipeline is invisible when it works. Users upload a shaky video, see some frames appear, confirm their selection, and get a diagnosis. They don't think about Laplacian variance or temporal diversity or JPEG compression ratios.&lt;/p&gt;

&lt;p&gt;That invisibility is the goal. The complexity exists so that users don't have to understand it. They just need to point their phone at a sick plant and get help.&lt;/p&gt;

&lt;p&gt;Every technical decision—client-side processing, blur scoring, temporal filtering, compression ratios—serves that goal. Not because the techniques are elegant, but because they make the experience work for farmers with old phones, slow networks, and unsteady hands.&lt;/p&gt;

&lt;p&gt;Technical complexity that serves users is engineering. Technical complexity that serves itself is self-indulgence. The media pipeline exists to turn chaos into clarity, invisibly.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;This completes the Shamba-MedCare technical series.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The full system represents months of iteration across architecture, prompt engineering, accessibility, and media processing. The code is open source. The problems are documented. The opportunity—bringing agricultural AI to farmers who need it most—is massive.&lt;/p&gt;

&lt;p&gt;If you're working on similar problems, I'd love to hear from you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nicanor/shamba-dawa" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/p&gt;




</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>imageprocessing</category>
      <category>canvas</category>
    </item>
    <item>
      <title>The Architecture Behind a Stateless AI Application</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Mon, 01 Dec 2025 23:39:47 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/the-architecture-behind-a-stateless-ai-application-pk3</link>
      <guid>https://forem.com/nicanor_korir/the-architecture-behind-a-stateless-ai-application-pk3</guid>
      <description>&lt;p&gt;This project has really been awesome to work on. I made an architectural decision early in Shamba-MedCare that felt risky at the time: no backend database. There was no need for user data at this moment, and getting the user response was the most important.&lt;/p&gt;

&lt;p&gt;Every tutorial, every architecture guide, every "best practice" document assumes you'll store user data on a server: user accounts, session management, and data persistence all living in PostgreSQL, MongoDB, or DynamoDB.&lt;/p&gt;

&lt;p&gt;But I kept asking myself: why? What user data does this application actually need to persist across devices? The answer was... nothing. And that realization shaped everything that followed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Layer Split
&lt;/h2&gt;

&lt;p&gt;Here's how the system actually works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio86mn2caiod0j3y46k0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio86mn2caiod0j3y46k0.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt; handles all user interaction, data persistence, and UI state. It compresses images before upload, manages the multi-step wizard flow, stores history locally, and renders results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt; does exactly one thing: transform image data into diagnosis data. It receives a request, builds a prompt, calls the LLM, parses the response, and returns structured JSON. No state. No sessions. No database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Layer&lt;/strong&gt; is Claude Vision. It receives images with carefully crafted prompts and returns detailed diagnostic information.&lt;/p&gt;

&lt;p&gt;Each layer has one job. Mixing responsibilities, like having the backend store history or the frontend call the LLM directly, would create complexity without benefit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Interaction
&lt;/h2&gt;

&lt;p&gt;Simplicity is the goal; here is how it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iwjtkfohyo84i6l2fr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iwjtkfohyo84i6l2fr8.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;History and settings never leave the user's device. The API key passes through my server, but is never stored.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Multi-Step Wizard: Why State Machines
&lt;/h2&gt;

&lt;p&gt;The scan flow has five potential steps: plant part selection, crop type selection, media upload, analysis mode (for multiple images), and context entry. Implementing this as a traditional form with step numbers would be a nightmare.&lt;/p&gt;

&lt;p&gt;Here's the problem with step numbers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u43q84mexyqqdivwmvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u43q84mexyqqdivwmvg.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the user uploads a single image, step 4 is context entry. If they upload multiple images, step 4 is mode selection and step 5 is context entry. The step numbers become meaningless because they depend on runtime conditions.&lt;/p&gt;

&lt;p&gt;The solution is a state machine. Each state has a meaningful name: "part", "crop", "media", "mode", "context", "analyzing". The UI doesn't care about step numbers. It renders whatever state it's in.&lt;/p&gt;

&lt;p&gt;The progress indicator ("Step 3 of 5" vs "Step 3 of 4") is computed dynamically based on whether mode selection will appear. Users see accurate progress without the code caring about arbitrary step numbers.&lt;/p&gt;
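
&lt;p&gt;A stripped-down sketch of that state machine, using the state names above (the transition table is simplified):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Stripped-down sketch of the wizard state machine. States have meaningful
// names; the next state depends on runtime data (how many images were added),
// not on a fixed step number.
type WizardState = "part" | "crop" | "media" | "mode" | "context" | "analyzing";

function nextState(current: WizardState, imageCount: number): WizardState {
  switch (current) {
    case "part":    return "crop";
    case "crop":    return "media";
    // Mode selection only exists when more than one image was uploaded.
    case "media":   return imageCount &amp;gt; 1 ? "mode" : "context";
    case "mode":    return "context";
    case "context": return "analyzing";
    default:        return current;
  }
}

// The progress label is computed, not hard-coded: "Step 3 of 5" vs "Step 3 of 4".
function totalSteps(imageCount: number): number {
  return imageCount &amp;gt; 1 ? 5 : 4;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;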

&lt;h2&gt;
  
  
  Storage Architecture: Three Tiers
&lt;/h2&gt;

&lt;p&gt;Different data have different lifetimes. I implemented three distinct tiers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgb9vqj3ygkpsq3dyukh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgb9vqj3ygkpsq3dyukh.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session storage&lt;/strong&gt; holds consent flags. When you close the browser, consent expires. Next session, you make an active choice again. For health-related applications, I think users should consciously opt in each time rather than relying on consent from months ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local storage&lt;/strong&gt; holds everything that should persist: scan history, accessibility settings (font size, voice preferences), and the API key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedded deep cache&lt;/strong&gt; is a design choice that trips people up. Each history item doesn't just store a reference to results, it stores the complete diagnosis. All 25+ fields. Treatments, prevention tips, the full thing.&lt;/p&gt;

&lt;p&gt;This bloats storage but enables true offline access. A farmer can reference last week's treatment recommendations without any network connection. That's critical for rural users.&lt;/p&gt;

&lt;p&gt;The math works out: each scan is about 20-30KB with a thumbnail. At 50 scans maximum, that's roughly 1.5MB, well under the 5MB browser quota. Older scans rotate out automatically.&lt;/p&gt;
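
&lt;p&gt;A small sketch of the history tier, using the numbers above (the storage key and field names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of the local-storage history tier: each entry embeds the complete
// diagnosis for offline access, and the list is capped at 50 entries
// (about 20-30KB each, well under the 5MB browser quota).
const HISTORY_KEY = "scan_history"; // illustrative key name
const MAX_HISTORY_ITEMS = 50;

interface HistoryItem {
  id: string;
  timestamp: number;
  thumbnail: string; // small data URL
  diagnosis: object; // the full 25+ field diagnosis, embedded
}

function saveScan(item: HistoryItem): void {
  const raw = localStorage.getItem(HISTORY_KEY);
  const history: HistoryItem[] = raw ? JSON.parse(raw) : [];
  history.unshift(item);             // newest first
  history.splice(MAX_HISTORY_ITEMS); // older scans rotate out automatically
  localStorage.setItem(HISTORY_KEY, JSON.stringify(history));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;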

&lt;h2&gt;
  
  
  The Single Endpoint Philosophy
&lt;/h2&gt;

&lt;p&gt;The backend has one API endpoint: POST to &lt;code&gt;/api/v1/analyze&lt;/code&gt;. That's it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwreaiufhnoheb49kafxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwreaiufhnoheb49kafxh.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why not multiple endpoints? I considered having &lt;code&gt;/analyze/single&lt;/code&gt; for one plant, &lt;code&gt;/analyze/batch&lt;/code&gt; for multiple plants, and &lt;code&gt;/analyze/video&lt;/code&gt; for video input. But here's the thing: they all do the same underlying operation. They all send images to LLM and return structured results.&lt;/p&gt;

&lt;p&gt;The only difference is in the prompt construction and response handling. A &lt;code&gt;mode&lt;/code&gt; parameter handles that cleanly. Multiple endpoints would mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client-side routing logic&lt;/li&gt;
&lt;li&gt;Duplicate validation code&lt;/li&gt;
&lt;li&gt;Versioning complexity when the format changes&lt;/li&gt;
&lt;li&gt;Documentation for three endpoints instead of one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One endpoint with a mode parameter is simpler to understand, test, and maintain.&lt;/p&gt;
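
&lt;p&gt;As a sketch, the entire client integration collapses into one helper. The field names below are assumptions for illustration, not the exact request schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function analyze(
  images: string[],                                   // base64-encoded images
  mode: "single" | "same_plant" | "different_plants", // the one switch the backend needs
  apiKey: string
) {
  const res = await fetch("/api/v1/analyze", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ images, mode, api_key: apiKey }),
  });
  if (!res.ok) throw new Error(`Analyze failed with status ${res.status}`);
  return res.json();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;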

&lt;h2&gt;
  
  
  User-Provided API Key
&lt;/h2&gt;

&lt;p&gt;This is, of course, temporary. While the app is still in testing and development, asking users to bring their own key prevents overuse of our API credits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkan9heo46az26be490c2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkan9heo46az26be490c2.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost transparency&lt;/strong&gt;: Users see exactly what they're paying. No hidden markup. No surprise bills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No key management&lt;/strong&gt;: I don't need a database to store keys, rotation logic, or access controls. This reduces operational complexity considerably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: No shared rate limits. Each user has their own Anthropic quota.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust&lt;/strong&gt;: Users control their own credentials. I literally cannot run up their bill unexpectedly.&lt;/p&gt;

&lt;p&gt;The downside is friction. Users must create an Anthropic account and generate an API key before using the app. For sophisticated users (the current audience), this is fine. For mass-market adoption, I'd need to revisit this decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Batch vs Single Mode: The Same Plant Problem
&lt;/h2&gt;

&lt;p&gt;When users upload multiple images, the system needs to know: are these different plants (analyze separately) or the same plant from different angles (analyze together)?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsricafnn08tksvdmtx46.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsricafnn08tksvdmtx46.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't something AI can reliably infer. A tomato leaf photographed from above and below might look completely different. Are they the same plant? Only the user knows.&lt;/p&gt;

&lt;p&gt;So the UI asks directly. "Same plant, different angles" sends all images in one request; LLM sees everything together and produces one diagnosis that synthesizes evidence across views. "Different plants" sends each image separately: three images means three independent diagnoses.&lt;/p&gt;

&lt;p&gt;For video input, extracted frames default to "same plant" mode. The user was recording continuously, so frames presumably show the same subject.&lt;/p&gt;
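
&lt;p&gt;In client terms, reusing the analyze helper sketched above (mode names still illustrative), the difference is just how requests are fanned out:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function runAnalysis(images: string[], samePlant: boolean, apiKey: string) {
  if (samePlant) {
    // One request, one synthesized diagnosis across all views.
    return [await analyze(images, "same_plant", apiKey)];
  }
  // One request per image: three images means three independent diagnoses.
  return Promise.all(images.map((img) =&gt; analyze([img], "single", apiKey)));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;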

&lt;h2&gt;
  
  
  Response Parsing: Handling Imperfection
&lt;/h2&gt;

&lt;p&gt;Here's something I learned the hard way: LLM sometimes wraps JSON responses in markdown code blocks. Even when the prompt explicitly requests raw JSON.&lt;/p&gt;

&lt;p&gt;The prompt says: "Return ONLY valid JSON, no markdown formatting."&lt;/p&gt;

&lt;p&gt;LLM occasionally returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```json
{"health_score": 45, "disease": "Early Blight"...}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend strips markdown code block markers if present. It handles both raw JSON and markdown-wrapped JSON identically.&lt;/p&gt;

&lt;p&gt;The broader principle: &lt;strong&gt;prompts aim for perfection; parsing assumes imperfection&lt;/strong&gt;. Every field has a fallback. Missing health score defaults to 0. Missing severity defaults to "moderate". A partial response is better than a crashed request.&lt;/p&gt;
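
&lt;p&gt;Here's a minimal sketch of that defensive parsing in TypeScript (the real backend may differ; the defaults mirror the ones described above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function parseDiagnosis(raw: string) {
  // Strip ```json ... ``` fences if the model added them despite instructions.
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();

  let data: any = {};
  try {
    data = JSON.parse(cleaned);
  } catch {
    // Fall through: a partial result is better than a crashed request.
  }

  return {
    health_score: typeof data.health_score === "number" ? data.health_score : 0,
    severity: data.severity ?? "moderate",
    disease: data.disease ?? "Unknown",
    treatments: Array.isArray(data.treatments) ? data.treatments : [],
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;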

&lt;h2&gt;
  
  
  The Trade-off Table
&lt;/h2&gt;

&lt;p&gt;I believe architecture is really just trade-off documentation. Here's the honest accounting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;What We Gave Up&lt;/th&gt;
&lt;th&gt;What We Gained&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No backend database&lt;/td&gt;
&lt;td&gt;Cross-device sync, unlimited history&lt;/td&gt;
&lt;td&gt;Privacy by design, simpler operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User-provided API keys&lt;/td&gt;
&lt;td&gt;Frictionless onboarding&lt;/td&gt;
&lt;td&gt;Cost transparency, no key management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optional plant/crop selection&lt;/td&gt;
&lt;td&gt;Guaranteed input accuracy&lt;/td&gt;
&lt;td&gt;Accessibility, faster expert workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full results cached in history&lt;/td&gt;
&lt;td&gt;Smaller storage footprint&lt;/td&gt;
&lt;td&gt;True offline access to treatments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single API endpoint&lt;/td&gt;
&lt;td&gt;Clear operation separation&lt;/td&gt;
&lt;td&gt;Simpler integration, less client logic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these decisions is universally correct. They're correct for this application's specific constraints: privacy-conscious users, offline usage requirements, cost sensitivity, and a small development team.&lt;/p&gt;

&lt;p&gt;Different constraints would lead to different choices. A hospital app would need a database. An enterprise tool would need centralized key management. A children's app would need user accounts.&lt;/p&gt;

&lt;p&gt;Architecture isn't about finding the "right" pattern. It's about understanding your constraints and making explicit choices that fit them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;The system that emerged is simple in a specific way. Not simple as in easy to build, but simple in that each piece does one thing.&lt;/p&gt;

&lt;p&gt;Frontend handles users. Backend handles transformation. LLM handles intelligence. Storage tiers handle different lifetimes. The scan wizard handles a multi-step flow. Each piece is testable in isolation and replaceable without affecting others.&lt;/p&gt;

&lt;p&gt;That's the goal of architecture: not clever abstractions, but clear separations. Not perfect patterns, but honest trade-offs.&lt;/p&gt;

&lt;p&gt;When I look at this system, I can explain every decision. That's what makes it maintainable, not the code, but the clarity of intent behind it.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/nicanor/shamba-dawa" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Building Shamba-MedCare AI app for Real Users</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Mon, 01 Dec 2025 22:57:52 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/building-shamba-medcare-ai-app-for-real-users-dak</link>
      <guid>https://forem.com/nicanor_korir/building-shamba-medcare-ai-app-for-real-users-dak</guid>
      <description>&lt;p&gt;From the research, farmers are interested in the results.&lt;/p&gt;

&lt;p&gt;I spent about two days perfecting the health score animation. A smooth circular progress bar that fills with color, green for healthy, red for critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Users I'm Actually Building For
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth about my target users:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzrnfxb273jrl29yw8e3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzrnfxb273jrl29yw8e3.png" alt=" " width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't edge-case accessibility. This IS the use case. A 55-year-old farmer with reading glasses she can't find, soil under her fingernails, standing in bright sunlight with one bar of signal, that's who needs this app most.&lt;/p&gt;

&lt;p&gt;So I built the accessibility system around her, not around developers reviewing my code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Voice Button That Changed Everything
&lt;/h2&gt;

&lt;p&gt;The single most impactful feature I built was embarrassingly simple: a button that reads the diagnosis/results aloud.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnynky6rbzwo1vtti9if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnynky6rbzwo1vtti9if.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used the Web Speech API, which is built into every modern browser. The implementation took maybe an hour. But here's what I learned: the voice doesn't just help users who can't read. It helps everyone.&lt;/p&gt;

&lt;p&gt;My mom tested it. She CAN read perfectly well. But she said hearing "Your tomato has early blight. This is a fungal disease. Severity is moderate" felt more trustworthy than reading the same words. Like getting advice from a person instead of a screen.&lt;/p&gt;

&lt;p&gt;The voice script matters too. I don't just dump the JSON response into speech. I wrote it conversationally:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Your tomato leaf has a health score of 45 out of 100. I detected Early Blight, which is a fungal disease. The severity is moderate. For treatment, you can use neem oil spray, mix two tablespoons with one liter of water, and spray every seven days."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Pauses between sections. Specific measurements. No jargon. The AI generates this because I asked it to in the prompt, "provide practical, actionable treatment steps that farmers can follow."&lt;/p&gt;
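
&lt;p&gt;A minimal sketch of that button's handler using the Web Speech API; the script string is the conversational text built from the diagnosis, not the raw JSON:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function speakDiagnosis(script: string): void {
  const utterance = new SpeechSynthesisUtterance(script);
  utterance.rate = 0.9;             // slightly slower than default for clarity
  window.speechSynthesis.cancel();  // stop anything that's already playing
  window.speechSynthesis.speak(utterance);
}

speakDiagnosis(
  "Your tomato leaf has a health score of 45 out of 100. " +
  "I detected Early Blight, which is a fungal disease. The severity is moderate."
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;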

&lt;h2&gt;
  
  
  Font Scaling Without Breaking Everything
&lt;/h2&gt;

&lt;p&gt;The accessibility settings panel lets users choose font sizes: Normal, Large, or Extra Large. Sounds trivial. The implementation taught me something about CSS architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwqy361bnm7t7ygznpi3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwqy361bnm7t7ygznpi3.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I changed the font size on the document root element. Every component using &lt;code&gt;rem&lt;/code&gt; units automatically scales. No prop drilling. No context providers. One line of JavaScript, and the entire app responds.&lt;/p&gt;

&lt;p&gt;The trick is building components with &lt;code&gt;rem&lt;/code&gt; from day one. If you've hardcoded pixel values everywhere, retrofitting accessibility becomes a rewrite. I got lucky, Tailwind's default classes use &lt;code&gt;rem&lt;/code&gt;, so most of my UI scaled correctly without changes.&lt;/p&gt;

&lt;p&gt;The settings persist in localStorage. A farmer who needs large fonts shouldn't re-enable them every session. That would be the kind of "technically accessible" that's practically useless.&lt;/p&gt;
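
&lt;p&gt;The whole mechanism fits in a few lines; the specific pixel values here are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const FONT_SIZES = { normal: "16px", large: "20px", xlarge: "24px" } as const;

function applyFontSize(size: keyof typeof FONT_SIZES): void {
  // The one line that scales the app: every rem-based component follows the root.
  document.documentElement.style.fontSize = FONT_SIZES[size];
  // Persist so the farmer never has to re-enable it.
  localStorage.setItem("fontSize", size);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;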

&lt;h2&gt;
  
  
  Touch Targets for Rough Hands
&lt;/h2&gt;

&lt;p&gt;Apple says 44px minimum for touch targets. I went bigger.&lt;/p&gt;

&lt;p&gt;Here's why: I watched my uncle try to use a banking app. His fingers are thick from decades of farm work. He kept hitting the wrong buttons. Not because he's clumsy, but because the app was designed by someone with smooth developer hands typing on a MacBook.&lt;/p&gt;

&lt;p&gt;My button sizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard buttons: 48px minimum height&lt;/li&gt;
&lt;li&gt;Primary actions (like "Analyze"): 56px minimum&lt;/li&gt;
&lt;li&gt;Bottom navigation: 64px minimum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bottom navigation placement matters too. Thumbs naturally rest at the bottom of the phone. Putting the main navigation there means one-handed use actually works. Web convention says nav goes at the top. Mobile ergonomics says that's wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Color That Works Without Reading
&lt;/h2&gt;

&lt;p&gt;The health score uses a five-color severity system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnf9h020l9yu5gwt6gyl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnf9h020l9yu5gwt6gyl.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A user can glance at the color and know how worried to be. No reading required. The traffic light metaphor is universal, green means go, red means stop.&lt;/p&gt;

&lt;p&gt;I chose these specific colors for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Colorblind-safe progression&lt;/strong&gt;: The luminance (brightness) decreases from green to red, so even without color perception, the severity reads correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sunlight visibility&lt;/strong&gt;: High saturation colors remain distinguishable in bright outdoor light. Pastels wash out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cultural familiarity&lt;/strong&gt;: Traffic lights exist everywhere. The metaphor translates.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Progressive Disclosure
&lt;/h2&gt;

&lt;p&gt;A full diagnosis from Claude contains 25+ fields. Disease name, scientific name, confidence score, symptoms, causes, spread risk, urgency, treatments with ingredients and preparation steps, prevention tips, regional availability notes...&lt;/p&gt;

&lt;p&gt;Showing all of that at once would overwhelm anyone, let alone someone who struggles with text-heavy interfaces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2g3b1uuafvh29u41560.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2g3b1uuafvh29u41560.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Settings That Actually Exist
&lt;/h2&gt;

&lt;p&gt;I built an accessibility settings panel with five toggles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Font Size&lt;/strong&gt;: Normal / Large / Extra Large&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Contrast&lt;/strong&gt;: On / Off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Motion&lt;/strong&gt;: On / Off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Mode&lt;/strong&gt;: Off / On Request / Always On&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple Mode&lt;/strong&gt;: On / Off&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings persist in localStorage and apply instantly. The high contrast and reduced motion modes add CSS classes to the document root, the same pattern as font scaling.&lt;/p&gt;

&lt;p&gt;"Simple Mode" hides secondary information and removes visual flourishes. For users who find options themselves overwhelming, fewer choices mean clearer choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Built vs. the future
&lt;/h2&gt;

&lt;p&gt;I call it "the future" since some of these features can still be added.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's working:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice read-aloud for results (Web Speech API)&lt;/li&gt;
&lt;li&gt;Font size scaling (root-level CSS)&lt;/li&gt;
&lt;li&gt;Large touch targets (Tailwind minimums)&lt;/li&gt;
&lt;li&gt;Color-coded severity (5-level system)&lt;/li&gt;
&lt;li&gt;Progressive disclosure (collapsible sections)&lt;/li&gt;
&lt;li&gt;Settings persistence (localStorage)&lt;/li&gt;
&lt;li&gt;Bottom navigation (thumb-friendly placement)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's partially done:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High contrast mode (setting exists, CSS needs work)&lt;/li&gt;
&lt;li&gt;Reduced motion (setting exists, animations still run)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's not built yet:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice INPUT (speech-to-text for context entry)&lt;/li&gt;
&lt;li&gt;Multi-language support (Swahili, German, etc.)&lt;/li&gt;
&lt;li&gt;Offline mode for the actual analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://github.com/nicanor/shamba-dawa" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Shamba-MedCare Prompt Engineering</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Mon, 01 Dec 2025 21:30:44 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/shamba-medcare-prompt-engineering-5f9n</link>
      <guid>https://forem.com/nicanor_korir/shamba-medcare-prompt-engineering-5f9n</guid>
      <description>&lt;p&gt;Some background context:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am building a simple plant disease diagnosis solution using AI, inspired by my farming background and advancements in intelligent technological tools&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can check out the &lt;a href="http://shamba-medcare.vercel.app/" rel="noopener noreferrer"&gt;Shamba-MedCare App here&lt;/a&gt;. Sorry, while it's in testing you'll have to use your own API keys until the public launch is available. The keys are stored in the browser's local storage, so they stay private.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdn4gszzoyu0qn7jngevv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdn4gszzoyu0qn7jngevv.png" alt=" " width="800" height="1104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For context here, whenever you read LLM (Large Language Model), I mostly mean Claude. I like to use LLM since it's generic, and this solution can be fitted to any LLM.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I played around with several prompts in order to nail the best results. Here's how my prompt engineering journey with Shamba-MedCare evolved:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jao6jelvpcoku09ueqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jao6jelvpcoku09ueqr.png" alt=" " width="800" height="624"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My first prompt to LLM Vision was embarrassingly naive:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"What disease does this plant have?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The response was a 2,000-word essay about plant pathology in general. Helpful for a textbook. Useless for a farmer with a dying tomato plant. Getting AI to return &lt;strong&gt;structured, actionable, budget-aware diagnoses&lt;/strong&gt; took iteration. Here's what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4snwsw6mlb9e49xjw60g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4snwsw6mlb9e49xjw60g.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two prompts matter: the &lt;strong&gt;system prompt&lt;/strong&gt; (who the LLM, e.g. Claude, pretends to be) and the &lt;strong&gt;analysis prompt&lt;/strong&gt; (what to do with this specific image).&lt;/p&gt;

&lt;h2&gt;
  
  
  System Prompt: Creating "Shamba"
&lt;/h2&gt;

&lt;p&gt;Prompts work better with a persona. I created the &lt;strong&gt;Shamba persona&lt;/strong&gt;, an agricultural pathologist who:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Shamba, an expert agricultural pathologist. You analyze
plant images to identify diseases, pests, and nutrient deficiencies.

Your expertise includes:
- 50+ crop types worldwide
- Fungal, bacterial, viral, and physiological disorders
- Traditional and modern treatment methods
- Practical advice for resource-limited farmers

Guidelines:
1. Always include at least one FREE/traditional treatment
2. Describe WHERE symptoms appear (for visual mapping)
3. Be honest about uncertainty—use confidence scores
4. Recommend professional help for severe cases
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key line: &lt;strong&gt;"Always include at least one FREE/traditional treatment."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without that explicit instruction, the LLM defaulted to commercial products. Helpful for a suburban gardener. Useless for a farmer who can't afford a $15 fungicide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure #1: The JSON Nightmare
&lt;/h2&gt;

&lt;p&gt;My first structured attempt asked LLM to return JSON, which it did pretty well. Wrapped in markdown code fences. With helpful commentary before and after.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Here's my analysis:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
json&lt;br&gt;
{ "disease": "Early Blight" }&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
This is a common fungal disease...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My parser choked. The fix was explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Return ONLY a valid JSON object. No markdown, no commentary,
no text before or after. Start with { and end with }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still failed 10% of the time. So I added backend parsing that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Strips markdown fences if present&lt;/li&gt;
&lt;li&gt;Extracts JSON from surrounding text&lt;/li&gt;
&lt;li&gt;Validates against the expected schema&lt;/li&gt;
&lt;/ol&gt;
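
&lt;p&gt;Sketched in TypeScript for illustration (the actual backend may look different), those three steps are roughly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function extractDiagnosis(raw: string) {
  // 1. Strip markdown fences if present.
  const text = raw.replace(/```(?:json)?/g, "").trim();

  // 2. Extract the JSON object from any surrounding commentary.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end === -1) throw new Error("No JSON object found in response");
  const data = JSON.parse(text.slice(start, end + 1));

  // 3. Validate against the expected schema (a few required fields shown here).
  for (const field of ["disease", "health_score", "treatments"]) {
    if (!(field in data)) throw new Error(`Missing required field: ${field}`);
  }
  return data;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;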

&lt;h2&gt;
  
  
  Failure #2: Location Descriptions
&lt;/h2&gt;

&lt;p&gt;For the visual heatmap feature, I needed LLM to describe WHERE damage appeared. My prompt asked for "affected regions."&lt;/p&gt;

&lt;p&gt;LLM returned: "The affected area is significant." That was not helpful; I needed something closer to exact coordinates, so I tried out several variations. This one was close to perfect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Describe affected regions with:
- Location(helpful for heatmaps): top-left, center, lower-right, edges, margins
- Coverage: percentage of area affected (e.g., "35%")
- Spread direction: "Moving from lower leaves upward."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now LLM returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"affected_regions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lower-left"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"severe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Dark brown lesions with concentric rings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"coverage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"center"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"moderate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"coverage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's enough to generate a heatmap overlay for now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqc89g3pyhnmm7bs75cj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqc89g3pyhnmm7bs75cj.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure #3: Treatment Cost Blindness
&lt;/h2&gt;

&lt;p&gt;Early on, treatments came out randomly ordered. Sometimes the $50 systemic fungicide appeared first. Sometimes, the free wood ash remedy.&lt;/p&gt;

&lt;p&gt;The problem: LLM has no inherent understanding of budget constraints. I had to structure it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Provide treatments in EXACTLY this order:
1. FREE TIER: Traditional/home remedies ($0)
2. LOW COST: Basic solutions ($1-5)
3. MEDIUM COST: Commercial organic ($5-20)
4. HIGH COST: Synthetic/professional ($20+)

Each tier must have at least one option if applicable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response schema enforced this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"treatments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Wood ash paste"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cost_tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ingredients"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Wood ash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Water"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"application"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Apply directly to affected areas"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"availability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Common from cooking fires"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Neem oil spray"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cost_tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"estimated_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$1-3"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Plant Part-Specific Prompt Strategy
&lt;/h2&gt;

&lt;p&gt;Different plant parts reveal different problems, and I needed the right prompt to surface the right diagnosis with the best remedies. My prompt adapts:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys5hvnecnd073rv1j4t4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys5hvnecnd073rv1j4t4.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For leaves:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Examine: color patterns, spot shapes, curling, holes, coating
Common issues: fungal spots, viral mosaic, nutrient chlorosis, pest damage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For roots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Examine: color (white=healthy, brown/black=rot), texture, galls, structure
Common issues: root rot, nematode damage, waterlogging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This focus improves accuracy dramatically. Asking LLM to look for "anything wrong" produces vague results. Asking it to specifically check for concentric ring patterns in leaf spots? Now we're diagnosing Early Blight.&lt;/p&gt;
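
&lt;p&gt;Assembling the prompt from part-specific guidance can be as simple as this TypeScript sketch (the stem entry and the helper name are made up for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const PART_GUIDANCE: { [part: string]: string } = {
  leaf: "Examine: color patterns, spot shapes, curling, holes, coating.",
  root: "Examine: color (white=healthy, brown/black=rot), texture, galls, structure.",
  stem: "Examine: lesions, cankers, bore holes, discoloration.",
};

function buildAnalysisPrompt(plantPart: string, cropType: string, context: string): string {
  return [
    `Analyze this ${plantPart} image from a ${cropType} plant.`,
    PART_GUIDANCE[plantPart] ?? "Examine the visible plant part for abnormalities.",
    `User's context: ${context}`,
    "Return ONLY a valid JSON object following the response schema.",
  ].join("\n");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;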

&lt;h2&gt;
  
  
  The Final Prompt Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[SYSTEM PROMPT]
You are Shamba, an agricultural pathologist...

[ANALYSIS PROMPT]
Analyze this {plant_part} image from a {crop_type} plant.

User's context: {additional_context}

Provide:
1. Image validation (correct plant part? good quality?)
2. Health score (0-100)
3. Disease identification with confidence (0.0-1.0)
4. Affected region locations for visual mapping
5. Treatments by cost tier (FREE mandatory)
6. Prevention tips

Return as JSON following this schema:
{response_schema}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with the output format first.&lt;/strong&gt; I designed prompts around what I wanted LLM to do. I should have designed around what the farmer needed to see.&lt;/p&gt;

&lt;p&gt;The heatmap feature was an afterthought. If I'd planned for it from day one, the location description format would have been baked in, not retrofitted. It's actually a useful feature for farmers: picture an affected plant, and the heatmap highlights the heavily affected areas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test with bad photos early.&lt;/strong&gt; My development photos were well-lit, centered, single-issue plants. Real farmer photos are blurry, shadowy, and show three problems at once. The robustness I needed only emerged after testing with garbage inputs.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/nicanor/shamba-dawa" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>promptengineering</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why I Built Shamba-MedCare (And What I Learned About Solving Real Problems)</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Mon, 01 Dec 2025 20:50:41 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/why-i-built-shamba-medcare-and-what-i-learned-about-solving-real-problems-425g</link>
      <guid>https://forem.com/nicanor_korir/why-i-built-shamba-medcare-and-what-i-learned-about-solving-real-problems-425g</guid>
      <description>&lt;p&gt;I grew up around farms in the Kenya highlands region. Of course, I am a farm boy 😂, and I watched farmers lose entire harvests because they couldn't identify a disease until it was too late. By the time they reached an expert, the damage was done.&lt;/p&gt;

&lt;p&gt;Most plant disease apps scan leaves and miss the essential parts of the plant, e.g., the branch, roots, and entire leaf area. This is what I think of the current solutions (with the limited research I've done, of course):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe31y59ibf2ddys5qrx07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe31y59ibf2ddys5qrx07.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Root rot starts underground, stem borers tunnel through stalks, and bark cankers spread silently. By the time symptoms reach the leaves, the farmer is already losing the war.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Shamba-MedCare&lt;/strong&gt;, from "Shamba" (farm) + "Dawa" (medicine) in Swahili, a simple solution focused on helping farmers, scientists, and others. Check it out here: &lt;a href="http://shamba-medcare.vercel.app/" rel="noopener noreferrer"&gt;Shamba-MedCare App&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfsa0ldm7gvz0pqe3ud7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfsa0ldm7gvz0pqe3ud7.png" alt="Shmaba MedCare" width="800" height="1063"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Approaches I Considered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Train a Custom CNN
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.kaggle.com/datasets/emmarex/plantdisease" rel="noopener noreferrer"&gt;PlantVillage dataset&lt;/a&gt; has 50,000+ labeled images, and MobileNetV3-small can hit 99.5% accuracy at just 1MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch?&lt;/strong&gt; The images must have perfect lighting and clean backgrounds. My accuracy tanked the moment I tested with real field photos—muddy roots, partial shadows, multiple issues on one plant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo3cruzvkn7ue2zeou7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo3cruzvkn7ue2zeou7l.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Use a Pre-built API (Plantix, PlantVillage Nuru)
&lt;/h3&gt;

&lt;p&gt;These are some of the existing solutions, and they work for different use cases. They give a classification with a confidence score, but classification alone doesn't save crops.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Multimodal LLM (Claude Vision)
&lt;/h3&gt;

&lt;p&gt;This is where things got interesting.&lt;/p&gt;

&lt;p&gt;Claude doesn't just classify, it &lt;strong&gt;reasons&lt;/strong&gt;. I can ask it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Analyze this tomato leaf. The farmer says spots appeared 2 weeks ago after heavy rain. They can only afford traditional remedies. What's wrong and what should they do?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And it actually incorporates that context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkt2fdjxokli2yr52seqo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkt2fdjxokli2yr52seqo.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-off I Made
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Custom CNN&lt;/th&gt;
&lt;th&gt;Claude Vision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Works Offline&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contextual Explanations&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Novel Disease Handling&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-Request Cost&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;~$0.01-0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I chose Claude, not because it's perfect, but because &lt;strong&gt;a detailed explanation that is helpful to the farmer and saves a crop is worth more than a fast classification that misses context&lt;/strong&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The core flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl01i77hxvoyx4m0j4nu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl01i77hxvoyx4m0j4nu.png" alt=" " width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every diagnosis includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Health score (0-100)&lt;/li&gt;
&lt;li&gt;Disease identification with confidence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual heatmap&lt;/strong&gt; showing WHERE damage is&lt;/li&gt;
&lt;li&gt;Treatment tiers: FREE → Low → Medium → High cost&lt;/li&gt;
&lt;li&gt;Traditional remedies that farmers already trust&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Inclusion
&lt;/h2&gt;

&lt;p&gt;I almost built this generically, for all farmers. Then I remembered: a 55-year-old farmer in rural Kenya with basic literacy is not the same as a 25-year-old agronomist with a smartphone addiction.&lt;/p&gt;

&lt;p&gt;So I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice mode&lt;/strong&gt;: Results read aloud in clear speech while the user is scrolling through&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Huge touch targets&lt;/strong&gt;: 44px minimum, because field work means rough hands and busy farmers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottom navigation&lt;/strong&gt;: One-handed operation while holding a plant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Icon-first design&lt;/strong&gt;: Pictures over text, and pictures with text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tech that farmers can't use is just tech for my portfolio.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Next in this series&lt;/strong&gt;: How I structured the prompts to get consistent, budget-aware diagnoses (and damn, the 3 times it completely failed)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nicque-cpu/shamba-dawa" rel="noopener noreferrer"&gt;Source code on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Optical Flow: How Robots (and maybe your Phone) See Motion</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Fri, 14 Nov 2025 14:39:13 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/optical-flow-how-robots-and-maybe-your-phone-see-motion-19ca</link>
      <guid>https://forem.com/nicanor_korir/optical-flow-how-robots-and-maybe-your-phone-see-motion-19ca</guid>
      <description>&lt;p&gt;Okay, so here's a weird question: how do you know something is moving?&lt;/p&gt;

&lt;p&gt;Like, right now, if I threw a ball at you, you'd catch it or try to. Not because you're doing complex calculations. You just &lt;em&gt;see&lt;/em&gt; it moving. Your brain processes the motion instantly, and your hands know where to be.&lt;/p&gt;

&lt;p&gt;But how? What's actually happening when you perceive motion?&lt;/p&gt;

&lt;p&gt;That's &lt;strong&gt;optical flow&lt;/strong&gt;. And honestly? Understanding optical flow changed how I think about vision in general. Let me explain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Coffee Cup Experiment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine you're sitting at a table with a coffee cup in front of you. The cup isn't moving. You're not moving. Everything is still.&lt;/p&gt;

&lt;p&gt;Now, I walk past you. As I walk, from your perspective, the background behind me seems to shift. The wall behind me appears to move in the opposite direction I'm walking. The floor seems to slide past.&lt;/p&gt;

&lt;p&gt;But here's the thing, nothing is actually moving except me. The wall isn't really moving. The floor isn't sliding. Your brain knows this because it's processing &lt;em&gt;relative motion&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What you're actually seeing is, the &lt;em&gt;pixels in your visual field are changing position over time&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (you, a camera, a robot, etc) and the scene.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optical flow is basically asking, "Which pixels are moving, and in which direction?"&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is Optical Flow So Important?
&lt;/h3&gt;

&lt;p&gt;Here's where it gets practical. Imagine a robot navigating through a hallway. How does it know it's moving forward? &lt;/p&gt;

&lt;p&gt;One way: it has odometers on its wheels, or it uses GPS, or it has a motion sensor. But what if those sensors break? Or what if it's in an environment where GPS doesn't work?&lt;/p&gt;

&lt;p&gt;Another way: the robot looks at what it sees and figures out, "Hey, everything in my visual field is moving away from the center. That means I'm moving forward." This is optical flow.&lt;/p&gt;

&lt;p&gt;If a robot is trying to catch a moving object, it needs to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the object moving, or am I moving?&lt;/li&gt;
&lt;li&gt;In which direction is it moving?&lt;/li&gt;
&lt;li&gt;How fast?&lt;/li&gt;
&lt;li&gt;Will it hit something?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this can be extracted from optical flow.&lt;/p&gt;

&lt;p&gt;Similarly, when your phone stabilizes video, it's using optical flow to detect camera shake and compensate for it. When a drone hovers in place without GPS, it's using optical flow to stay put.&lt;/p&gt;

&lt;h3&gt;
  
  
  But: What's Actually Happening?
&lt;/h3&gt;

&lt;p&gt;Let's go back to basics, you're looking at a video, let's say it's a video of a person walking across a room.&lt;/p&gt;

&lt;p&gt;Frame 1: You see the person at position X.&lt;br&gt;
Frame 2: You see the person at position X+5 pixels to the right.&lt;br&gt;
Frame 3: You see the person at position X+10 pixels to the right.&lt;/p&gt;

&lt;p&gt;Optical flow is literally: "The person moved 5 pixels to the right between frame 1 and 2, and another 5 pixels between frame 2 and 3."&lt;/p&gt;

&lt;p&gt;But it's not just about &lt;em&gt;where&lt;/em&gt; things moved. It's about the &lt;em&gt;pattern&lt;/em&gt; of motion across the entire image.&lt;/p&gt;

&lt;p&gt;Think of it like this: imagine you're looking at a piece of paper with arrows drawn on it. Each arrow points in a direction, and its length shows how far something moved. &lt;/p&gt;

&lt;p&gt;In a video of a person walking toward the camera:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The edges of the image show motion outward (things moving away)&lt;/li&gt;
&lt;li&gt;The center shows less motion&lt;/li&gt;
&lt;li&gt;The person's limbs show rapid motion (arms swinging)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you visualize all these arrows together, you get a &lt;em&gt;motion field&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Okay, But How Do You Actually Calculate It?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Basic Principle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A pixel's brightness (or color) doesn't change much between consecutive frames, unless something moves.&lt;/p&gt;

&lt;p&gt;So if you see a pixel that was bright white in frame 1, and it's also bright white in frame 2, but a few pixels to the right, you can infer: "That pixel content moved to the right."&lt;/p&gt;

&lt;p&gt;This is called the &lt;strong&gt;brightness constancy assumption&lt;/strong&gt;: the intensity of a pixel remains constant as it moves.&lt;/p&gt;

&lt;p&gt;In math terms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I(x, y, t) = I(x + dx, y + dy, t + dt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This just means: "The brightness at position (x, y) at time t equals the brightness at the new position after movement."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Lucas-Kanade Method (One Popular Approach)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are many ways to calculate optical flow, one of the most famous is Lucas-Kanade. Here is how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Look at a small window of pixels (like a 3x3 or 5x5 grid)&lt;/li&gt;
&lt;li&gt;Find the best &lt;em&gt;motion vector&lt;/em&gt; (how far it moved, in which direction) that explains the change between frames&lt;/li&gt;
&lt;li&gt;Repeat for every pixel in the image&lt;/li&gt;
&lt;li&gt;You get a motion field, every pixel has an associated motion vector&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's like saying: "For this window, the best explanation for the change I see is that everything shifted 3 pixels to the right and 1 pixel down."&lt;/p&gt;
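
&lt;p&gt;To make the "best motion vector for a window" idea concrete, here's a toy TypeScript sketch using brute-force block matching. It's deliberately simpler than Lucas-Kanade proper (which solves for the vector from image gradients), but the spirit is the same. It assumes grayscale frames stored as flat arrays and a window that stays inside the image.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// prev and next are grayscale frames as flat arrays (row-major), width pixels wide.
// Returns the shift (dx, dy) that best explains the change around pixel (cx, cy).
function matchWindow(
  prev: Float32Array, next: Float32Array, width: number,
  cx: number, cy: number, win = 5, search = 7
): { dx: number; dy: number } {
  const half = Math.floor(win / 2);
  let best = { dx: 0, dy: 0 };
  let bestCost = Infinity;
  for (let dy = -search; dy &lt;= search; dy++) {
    for (let dx = -search; dx &lt;= search; dx++) {
      let cost = 0;
      for (let y = -half; y &lt;= half; y++) {
        for (let x = -half; x &lt;= half; x++) {
          const a = prev[(cy + y) * width + (cx + x)];
          const b = next[(cy + y + dy) * width + (cx + x + dx)];
          cost += (a - b) * (a - b); // sum of squared differences
        }
      }
      if (cost &lt; bestCost) { bestCost = cost; best = { dx, dy }; }
    }
  }
  return best; // "everything in this window shifted by (dx, dy)"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;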

&lt;p&gt;&lt;strong&gt;The Dense vs. Sparse Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sparse Optical Flow&lt;/strong&gt;: Track only a few key points (like corners or features). You end up with arrows pointing from frame 1 to frame 2 for a few hundred points.&lt;/p&gt;

&lt;p&gt;Advantage: Fast, works even with significant motion.&lt;br&gt;
Disadvantage: Doesn't tell you about the entire scene, just key points.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dense Optical Flow&lt;/strong&gt;: Calculate motion for &lt;em&gt;every&lt;/em&gt; pixel, every single pixel gets a motion vector.&lt;/p&gt;

&lt;p&gt;Advantage: Complete picture of motion.&lt;br&gt;
Disadvantage: Computationally expensive, can fail with large motion or occlusions.&lt;/p&gt;

&lt;p&gt;For a robot navigating a hallway? Sparse is usually enough. You just need to know the general motion pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Real Example: Following a Ball&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's say you're building a robot that needs to track a tennis ball.&lt;/p&gt;

&lt;p&gt;Frame 1: The ball is at position (100, 150) in the image.&lt;br&gt;
Frame 2: The ball is at position (115, 148) in the image.&lt;/p&gt;

&lt;p&gt;Optical flow detected: The ball moved 15 pixels right, 2 pixels up.&lt;/p&gt;

&lt;p&gt;Frame 3: The ball is at position (130, 145).&lt;/p&gt;

&lt;p&gt;Optical flow detected: The ball moved 15 pixels right, 3 pixels up.&lt;/p&gt;

&lt;p&gt;Now the robot can predict: "The ball is moving consistently to the right and slightly upward. At this rate, in the next frame it will be around (145, 142)."&lt;/p&gt;

&lt;p&gt;Extrapolate further, and the robot can predict where the ball will be and position itself to catch it. Optical flow is the vision equivalent of prediction.&lt;/p&gt;
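
&lt;p&gt;The prediction step itself is just constant-velocity extrapolation. A tiny sketch (image coordinates, so "up" means y decreases):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function predictNext(pos: { x: number; y: number }, flow: { dx: number; dy: number }) {
  return { x: pos.x + flow.dx, y: pos.y + flow.dy };
}

// From the frames above: at (130, 145), moving 15 px right and 3 px up per frame.
predictNext({ x: 130, y: 145 }, { dx: 15, dy: -3 }); // { x: 145, y: 142 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;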

&lt;p&gt;&lt;strong&gt;The Challenges: When Optical Flow Fails&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 1: Occlusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine someone walks behind a tree. From the camera's perspective, the person disappears. Optical flow can't track what it can't see. The motion vectors suddenly stop.&lt;/p&gt;

&lt;p&gt;Robots have to be smart about this: "The person disappeared, but based on the last known motion vector, I predict they'll emerge here."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 2: Lighting Changes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Remember the brightness constancy assumption? It breaks if the lighting changes. If a cloud passes overhead and the entire scene gets darker, optical flow gets confused.&lt;/p&gt;

&lt;p&gt;It might think things moved when really just the lighting changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 3: Large Motion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If something moves really fast between frames, optical flow struggles. It expects motion to be small and smooth, think of fast action footage. Optical flow can't always keep up with rapid motion. This is why video codecs that use optical flow sometimes struggle with fast cuts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 4: Texture-less Regions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're looking at a blank wall, there are no features to track. Optical flow can't tell if the wall moved or not because there's nothing distinctive to latch onto.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 5: Reflections and Transparency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mirrors, windows, water, these break optical flow because the brightness doesn't correlate with actual motion.&lt;/p&gt;




&lt;h3&gt;
  
  
  Uses of Optical Flow
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Autonomous Vehicles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Self-driving cars use optical flow to understand their motion relative to the scene. "The lane markings are flowing backward, which means I'm moving forward." It's also used to detect obstacles e.g. "That region isn't flowing like the background—something is there."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Video Compression&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Netflix streams a video to you, it doesn't send every pixel every frame. It uses optical flow to predict motion: "In the next frame, these pixels will probably be here based on the motion I detected." Then it only sends the changes.&lt;/p&gt;

&lt;p&gt;This saves massive amounts of bandwidth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Video Stabilization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your phone camera detects motion between frames using optical flow. If it detects motion that seems like camera shake (small, jittery motion), it digitally shifts the image to compensate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Robotics Navigation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mobile robots use optical flow to navigate when other sensors fail. "I can see the environment is flowing past me, so I know I'm moving forward. If the flow pattern changes, something is blocking me."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Action Recognition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building a system that understands "What is happening in this video?", optical flow helps. Running looks different from walking looks different from falling, and these differences show up in the motion patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Frame Interpolation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ever seen a slow-motion video created from a regular video? Sometimes it uses optical flow to predict intermediate frames. "Between frame 1 and frame 3, based on the motion I see, frame 2 probably looked like this."&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OpenCV documentation on optical flow: &lt;a href="https://docs.opencv.org/master/d4/dee/tutorial_optical_flow.html" rel="noopener noreferrer"&gt;https://docs.opencv.org/master/d4/dee/tutorial_optical_flow.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Textbook (accessible): "Digital Image Processing" by Gonzalez &amp;amp; Woods covers the image-processing fundamentals behind optical flow&lt;/li&gt;
&lt;li&gt;YouTube channel &lt;a href="https://www.youtube.com/user/sentdex" rel="noopener noreferrer"&gt;Sentdex&lt;/a&gt; has OpenCV tutorials including optical flow&lt;/li&gt;
&lt;li&gt;RAFT paper (modern deep learning approach): &lt;a href="https://arxiv.org/abs/2003.12039" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2003.12039&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The reason why optical flow matters in robotics is that it's one of the fundamental ways a robot can understand the world without relying on explicit sensors. A robot with just a camera can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know it's moving&lt;/li&gt;
&lt;li&gt;Detect obstacles&lt;/li&gt;
&lt;li&gt;Track objects&lt;/li&gt;
&lt;li&gt;Understand its environment&lt;/li&gt;
&lt;/ul&gt;
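
&lt;p&gt;If you want to see those flow vectors for yourself, here's a minimal Python sketch using OpenCV's dense (Farneback) optical flow, the same family of algorithms covered in the OpenCV tutorial linked above. The two image filenames are placeholders for consecutive video frames:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import cv2

# Two consecutive frames, loaded in grayscale (placeholder filenames)
prev_frame = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
curr_frame = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow: one (dx, dy) motion vector per pixel
flow = cv2.calcOpticalFlowFarneback(
    prev_frame, curr_frame, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Convert the vectors to magnitude (how fast) and angle (which direction)
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# A crude "am I moving?" check: the average motion across the whole frame
print("average motion per pixel:", mag.mean())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;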

</description>
      <category>ai</category>
      <category>robotics</category>
      <category>computervision</category>
      <category>nicanorkorir</category>
    </item>
    <item>
      <title>CNNs: from a beginner's point of view</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Wed, 12 Nov 2025 18:09:12 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/cnns-from-a-beginners-point-of-view-7ek</link>
      <guid>https://forem.com/nicanor_korir/cnns-from-a-beginners-point-of-view-7ek</guid>
<description>&lt;p&gt;I've learnt this topic about 20 times now; some explanations are a bit confusing, and of course, I know some of the core ideas. In this article, I am going to break down CNNs to make the basics, and maybe some of the advanced parts, easy to understand.&lt;/p&gt;




&lt;p&gt;Okay, from your perspective, how do you recognize your friend's face in a crowded room?&lt;/p&gt;

&lt;p&gt;Like, genuinely, what's happening in your brain? You're not calculating pixel values or comparing feature vectors. You just &lt;em&gt;see&lt;/em&gt; them and instantly think, "Oh, that's Sarah."&lt;/p&gt;

&lt;p&gt;Your brain is doing something incredibly sophisticated without you realizing it. And that's exactly what CNNs (&lt;strong&gt;Convolutional Neural Networks&lt;/strong&gt;) are trying to do. They're trying to teach computers to see and understand images the way your brain does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the Problem We're Trying to Solve with CNNs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before CNNs, people tried to use regular neural networks (fully connected networks) to process images. Here's how it worked: take an image, flatten it into a long list of numbers (every pixel becomes a number), and feed that into a neural network.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sorry, this will rush things, but stay with me here&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An image that's 224x224 pixels has about 50,000 pixels. If you have an RGB image (3 color channels), that's roughly 150,000 numbers. If your first hidden layer has 1000 neurons, you now have about 150 million weights to learn just in the first layer.&lt;/p&gt;

&lt;p&gt;That's massive. Your network becomes incredibly expensive to train, slow to run, and prone to overfitting (memorizing instead of learning).&lt;/p&gt;

&lt;p&gt;But here's the thing, and this is important: &lt;strong&gt;images have structure&lt;/strong&gt;. Pixels next to each other are related. An eye is an eye, whether it's in the top-left or bottom-right of your image. Your brain doesn't relearn what an eye looks like every time it's in a different position.&lt;/p&gt;

&lt;p&gt;So the question becomes: &lt;strong&gt;How do we build a neural network that understands this spatial structure and reuses knowledge across the image?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's where convolutions come in.&lt;/p&gt;

&lt;p&gt;In mathematics, &lt;a href="https://mathworld.wolfram.com/Convolution.html" rel="noopener noreferrer"&gt;&lt;strong&gt;convolution&lt;/strong&gt;&lt;/a&gt; is an operation that combines two functions to produce a third function, showing how one modifies or overlaps with the other as it shifts across it. In &lt;strong&gt;CNNs&lt;/strong&gt;, this idea is used to slide a filter across an image to detect features such as edges and patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Core Idea&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine you're looking at a painting. Instead of analyzing every millimeter of it at once, you use a small window to look at it piece by piece. You slide that window across the painting, examining each region.&lt;/p&gt;

&lt;p&gt;You might notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In this region, there are strong diagonal lines (could be an arm)&lt;/li&gt;
&lt;li&gt;In that region, there's a curved edge (could be a face)&lt;/li&gt;
&lt;li&gt;Over there, there's a specific color pattern (could be hair)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, imagine you're looking for specific patterns, edges, corners, colors, and shapes. As you slide your window across the image, you're asking: "Does this pattern appear here? How strongly?"&lt;/p&gt;

&lt;p&gt;That's convolution.&lt;/p&gt;

&lt;p&gt;In math terms, you have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;An image&lt;/strong&gt; (the painting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A filter/kernel&lt;/strong&gt; (your small window, usually 3x3 or 5x5)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A convolution operation&lt;/strong&gt; (sliding the filter across the image and computing a value for each position)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;strong&gt;filter&lt;/strong&gt; is like a feature detector; different filters detect different features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One filter might detect horizontal edges&lt;/li&gt;
&lt;li&gt;Another detects vertical edges&lt;/li&gt;
&lt;li&gt;Another detects corners&lt;/li&gt;
&lt;li&gt;Another detects specific textures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the magic: &lt;strong&gt;the network learns what these filters should be.&lt;/strong&gt; You don't hard-code "detect an edge." The network figures out, "To recognize images well, I should learn these specific filter patterns."&lt;/p&gt;

&lt;p&gt;Let's say you have a 5x5 image (tiny, for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a 3x3 filter (kernel):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1  0 -1
2  0 -2
1  0 -1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(This is actually a real filter, &lt;a href="https://de.wikipedia.org/wiki/Sobel-Operator" rel="noopener noreferrer"&gt;&lt;strong&gt;the Sobel filter&lt;/strong&gt;&lt;/a&gt;, that detects vertical edges)&lt;/p&gt;

&lt;p&gt;Convolution works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Place the filter on the top-left of the image&lt;/li&gt;
&lt;li&gt;Multiply each element of the filter by the corresponding image element&lt;/li&gt;
&lt;li&gt;Sum all those products&lt;/li&gt;
&lt;li&gt;That sum is the output for that position&lt;/li&gt;
&lt;li&gt;Slide the filter one position to the right, repeat&lt;/li&gt;
&lt;li&gt;When you reach the end of a row, move down and start from the left&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After sliding through the entire image, you get a new, slightly smaller image. That new image highlights where the filter's pattern appears strongly in the original image.&lt;/p&gt;

&lt;p&gt;Do this with multiple filters, and you get numerous feature maps. Each one shows where different patterns appear in the image.&lt;/p&gt;
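
&lt;p&gt;To make that concrete, here's a small Python/NumPy sketch that slides the 3x3 filter from the example above across the 5x5 example image, exactly following the steps listed. (Strictly speaking, deep learning frameworks do this slide-multiply-sum without flipping the kernel, which mathematicians call cross-correlation; the distinction doesn't matter here because the filters are learned anyway.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# The 5x5 example image and the 3x3 vertical-edge filter from above
image = np.array([[1, 2, 3, 4, 5],
                  [2, 3, 4, 5, 6],
                  [3, 4, 5, 6, 7],
                  [4, 5, 6, 7, 8],
                  [5, 6, 7, 8, 9]], dtype=float)

kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=float)

# Slide the filter over every 3x3 patch: multiply, sum, store
out_h = image.shape[0] - kernel.shape[0] + 1   # 3
out_w = image.shape[1] - kernel.shape[1] + 1   # 3
output = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        output[i, j] = np.sum(patch * kernel)

print(output)   # every value is -8.0: the example image brightens uniformly left to right
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;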

&lt;p&gt;&lt;strong&gt;Why This Is So Powerful&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because you're sliding the same filter across the image, you're using the same weights everywhere. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fewer parameters&lt;/strong&gt;: Instead of 150 million weights, maybe you have 9 (for a 3x3 filter) × the number of filters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weight sharing&lt;/strong&gt;: The network learns that certain patterns are important, and it looks for them everywhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Translation invariance&lt;/strong&gt;: An edge detector works whether the edge is in the top-left or bottom-right&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your network becomes smaller, faster, and smarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's where it gets interesting. You don't just do one convolution, you stack them.&lt;/p&gt;

&lt;p&gt;After the first convolution, you get feature maps that detect simple patterns (edges, corners). Then you apply another convolution to those feature maps. Now you're detecting patterns &lt;em&gt;of patterns&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Maybe the second layer detects "edges arranged in a circular pattern" (detecting circles). The third layer might detect "circles with specific textures" (detecting eyes or wheels).&lt;/p&gt;

&lt;p&gt;By the time you're 10 layers deep, you're detecting high-level features: "This looks like a face," "This looks like a car," "This looks like a dog."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrtgre6ng9e7ew70155t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrtgre6ng9e7ew70155t.png" alt=" " width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the hierarchy of features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: Edges and corners
Layer 2: Simple shapes (circles, lines arranged together)
Layer 3: Textures and patterns
Layer 4: Parts of objects (wheels, fur, eyes)
Layer 5+: Whole objects (cars, animals, faces)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mirrors how your brain works. You see edges first, then recognize that those edges form a nose, then recognize that a nose is part of a face.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0gfe9uxolfbgq8sgmjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0gfe9uxolfbgq8sgmjr.png" alt=" " width="285" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pooling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Pooling_layer" rel="noopener noreferrer"&gt;Pooling &lt;/a&gt; in CNNs reduces the size of feature maps by summarizing small regions (like taking the maximum value in a 2×2 area), so the network keeps the most important information while becoming more efficient. It helps make feature detection more stable, even if an object shifts slightly in the image. The most common method is &lt;strong&gt;max pooling&lt;/strong&gt;, where you take the maximum value in that region.&lt;/p&gt;

&lt;p&gt;Why? Because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It reduces the spatial size (fewer numbers to process)&lt;/li&gt;
&lt;li&gt;It makes the network more robust to small shifts (if a feature moves slightly, max pooling will still find it)&lt;/li&gt;
&lt;li&gt;It emphasizes the strongest features (the maximum value is usually the most important)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtynwvgi3bzygfnrgxir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtynwvgi3bzygfnrgxir.png" alt=" " width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;
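
&lt;p&gt;Here's a tiny Python/NumPy sketch of 2x2 max pooling on a made-up 4x4 feature map, just to show how little is going on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# A made-up 4x4 feature map
feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 2],
                        [7, 2, 9, 5],
                        [3, 1, 4, 8]], dtype=float)

# 2x2 max pooling with stride 2: keep the largest value in each 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(pooled)
# [[6. 2.]
#  [7. 9.]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;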

&lt;p&gt;&lt;strong&gt;An Actual CNN Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me show you what a simple CNN looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input Image (224x224x3)
    ↓
Convolution (32 filters, 3x3) → Output: 224x224x32
ReLU activation
Max Pooling (2x2) → Output: 112x112x32
    ↓
Convolution (64 filters, 3x3) → Output: 112x112x64
ReLU activation
Max Pooling (2x2) → Output: 56x56x64
    ↓
Convolution (128 filters, 3x3) → Output: 56x56x128
ReLU activation
Max Pooling (2x2) → Output: 28x28x128
    ↓
Flatten → 28*28*128 = 100,352 values
    ↓
Fully Connected Layer (256 neurons)
ReLU activation
    ↓
Fully Connected Layer (10 neurons) → Output: probabilities for 10 classes
    ↓
Softmax → Final prediction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer is making the data smaller but richer. By the end, instead of 224x224 pixels, you have 10 numbers representing "how confident am I that this is a [cat/dog/bird/etc]?"&lt;/p&gt;
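
&lt;p&gt;If it helps to see that stack as code, here's a minimal sketch in PyTorch (my choice of framework; the article isn't tied to any). The layer shapes follow the diagram above, and the final softmax is left to the loss function, as is common in PyTorch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),    # 224x224x32
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 112x112x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),   # 112x112x64
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 56x56x64
            nn.Conv2d(64, 128, kernel_size=3, padding=1),  # 56x56x128
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 28x28x128
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # 28*28*128 = 100,352 values
            nn.Linear(28 * 28 * 128, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),                   # raw scores for 10 classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
dummy = torch.randn(1, 3, 224, 224)   # one fake RGB image
print(model(dummy).shape)             # torch.Size([1, 10])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;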

&lt;p&gt;&lt;strong&gt;Okay, But How Do You Actually Train This?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The process is similar to regular neural networks, but the convolutions make it special:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Forward pass&lt;/strong&gt;: Image goes through the layers, producing a prediction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loss calculation&lt;/strong&gt;: Compare prediction to ground truth. "I said dog, it was actually a cat. That's wrong."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backpropagation&lt;/strong&gt;: Calculate gradients through all the layers, including the convolutional layers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update filters&lt;/strong&gt;: Adjust the filter weights so they become better at detecting useful features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat&lt;/strong&gt;: Do this thousands of times until the network gets better&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The network automatically learns what filters to use. You don't tell it "detect edges." It figures it out because detecting edges helps it recognize objects better.&lt;/p&gt;
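
&lt;p&gt;In code, that training loop is only a handful of lines. Here's a hedged PyTorch sketch that reuses the SimpleCNN from above; the tiny random dataset is a stand-in for real labelled images:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn as nn

# A tiny fake dataset so the loop runs end to end (replace with real images/labels)
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
train_loader = [(images, labels)]

model = SimpleCNN()                        # the model sketched above
criterion = nn.CrossEntropyLoss()          # how wrong was the prediction?
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                     # repeat: in practice, thousands of steps
    for batch_images, batch_labels in train_loader:
        outputs = model(batch_images)              # 1. forward pass
        loss = criterion(outputs, batch_labels)    # 2. loss calculation
        optimizer.zero_grad()
        loss.backward()                            # 3. backpropagation through every layer
        optimizer.step()                           # 4. update the filter weights
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;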




&lt;p&gt;&lt;strong&gt;Real-World Applications&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Medical Imaging&lt;/strong&gt;: Detecting tumors in X-rays, CT scans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous Vehicles&lt;/strong&gt;: Detecting pedestrians, traffic signs, and lane markings. CNNs can process camera feeds in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social Media&lt;/strong&gt;: Instagram uses CNNs for content recommendation, Facebook for face detection, and TikTok for understanding video content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Satellite Imagery&lt;/strong&gt;: Detecting changes in landscapes, tracking deforestation, and counting crops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality Control&lt;/strong&gt;: Manufacturing plants use CNNs to detect defects in products at superhuman speeds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E-commerce&lt;/strong&gt;: Product recognition, visual search (take a photo of something, find similar items online)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;The Limitations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I don't want to oversell this, CNNs have real limitations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. They Need Lots of Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike humans, who can learn from a few examples, CNNs need thousands. Transfer learning helps, but it's still data-hungry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. They're Brittle&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A CNN trained to recognize a dog might be completely fooled by a tiny, carefully crafted perturbation of the image. Humans see it as obviously still a dog.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. They Don't Understand Context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A CNN might recognize all the objects in an image perfectly, but miss the relationship between them. It sees "cat," "couch," but doesn't understand "cat sitting on couch."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. They're Black Boxes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can visualize what they learned, but explaining &lt;em&gt;why&lt;/em&gt; a specific prediction was made is hard. This matters for medical or legal applications where you need explainability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. They're Computationally Expensive&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running inference requires significant resources, especially for complex models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For me, I'll create a practical example. I am also doing product recognition and categorization for a warehouse using different tools and technologies. As for you, tell me in the comments or on social media what you're working on, and we can chat about it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>robotics</category>
      <category>machinelearning</category>
      <category>softwareengineering</category>
    </item>
    <item>
<title>Robot Imitation: A Gentle Intro</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Wed, 12 Nov 2025 06:47:20 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/robot-immitation-a-gentle-intro-30hc</link>
      <guid>https://forem.com/nicanor_korir/robot-immitation-a-gentle-intro-30hc</guid>
      <description>&lt;p&gt;You know that feeling when you're trying to learn something new, and the best way is just to watch someone do it first? That's kind of what robot imitation is about.&lt;/p&gt;

&lt;p&gt;Robot imitation, or "learning from demonstration," is when a robot watches a human (or another robot, or even a video) perform a task, and then tries to reproduce that same task. The robot is basically saying, "I saw what you did, now I'm going to do it too."&lt;/p&gt;

&lt;p&gt;But here's where it gets interesting, the robot isn't just recording your movements like a video playback. It's learning the &lt;em&gt;underlying pattern&lt;/em&gt; of what you're doing. It's figuring out, "Oh, I see, when the human's hand moves here, the gripper opens. When it moves there, the gripper closes. When there's resistance, the force increases."&lt;/p&gt;

&lt;p&gt;The robot is extracting the &lt;em&gt;meaning&lt;/em&gt; behind your actions, not just copying pixel-for-pixel movements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Does This Matter? Why Not Just Program Everything?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you wanted a robot to pick up a coffee mug, you could:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Option A: Program it manually&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calculate the exact coordinates where the mug is&lt;/li&gt;
&lt;li&gt;Program the exact angle to approach it&lt;/li&gt;
&lt;li&gt;Set the exact force to grip it without breaking it&lt;/li&gt;
&lt;li&gt;Account for different mug sizes, weights, and handle positions&lt;/li&gt;
&lt;li&gt;Do this for every single object the robot might encounter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This takes forever, and the moment something changes, a slightly different mug, a mug in a slightly different position, the whole thing breaks.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Option B: Show the robot how to do it&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grab a mug for yourself a few times&lt;/li&gt;
&lt;li&gt;Let the robot watch and learn&lt;/li&gt;
&lt;li&gt;The robot figures out the pattern&lt;/li&gt;
&lt;li&gt;Now it can grab mugs it's never seen before&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Option B is way more efficient, right? This is why robot imitation is becoming so important.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Basic Idea&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you show a robot how to do something, you're teaching it several layers of information. Let me break this down:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Perception: What does the robot see?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The robot needs to understand what's in front of it. This usually involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Computer vision (cameras looking at the scene)&lt;/li&gt;
&lt;li&gt;Identifying objects ("That's a mug")&lt;/li&gt;
&lt;li&gt;Understanding spatial relationships ("The mug is to the left of the plate")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is actually one of the hardest parts. Humans do this instantly. Robots? They need to be trained to recognize what they're looking at.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Action: What does the robot do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once it understands the scene, what moves does it make?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hand/gripper positioning&lt;/li&gt;
&lt;li&gt;Force applied&lt;/li&gt;
&lt;li&gt;Speed of movement&lt;/li&gt;
&lt;li&gt;Timing of actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The robot records: "When I see object X at position Y, I move my arm like this, with this amount of force."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The Logic: Why does the robot do it that way?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the tricky part. The robot needs to understand not just &lt;em&gt;what&lt;/em&gt; you did, but &lt;em&gt;why&lt;/em&gt; you did it that way.&lt;/p&gt;

&lt;p&gt;For example, if you're picking up a mug:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You grab from the handle (not the hot part of the mug)&lt;/li&gt;
&lt;li&gt;You move slowly (not jerky movements)&lt;/li&gt;
&lt;li&gt;You apply enough force to hold it, but not crush it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good imitation learning system figures out these principles and applies them to new situations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Does the Robot Actually Learn This?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Okay, so the robot is watching you. But how does it translate what it sees into something it can do?&lt;/p&gt;

&lt;p&gt;There are a few main approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 1: Behavioural Cloning (The Simplest Way)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is basically supervised learning. Here's how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A human demonstrates a task multiple times (let's say, picking up different objects)&lt;/li&gt;
&lt;li&gt;The robot records: what it sees (camera input) and what the human does (hand movements, gripper position, force)&lt;/li&gt;
&lt;li&gt;This becomes training data: "When you see this image, the action is this"&lt;/li&gt;
&lt;li&gt;We train a neural network: "Learn the pattern between images and actions"&lt;/li&gt;
&lt;li&gt;Now the robot can predict: "I see this, so I should do that"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's like learning to drive by watching tons of videos of good drivers. You see what they do in different situations, and your brain learns the pattern.&lt;/p&gt;

&lt;p&gt;The limitation? The robot learns to copy &lt;em&gt;exactly&lt;/em&gt; what it saw. If something is slightly different, a different angle, a different object, it might fail.&lt;/p&gt;
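
&lt;p&gt;To show how literal the "supervised learning" framing is, here's a minimal behavioural cloning sketch in Python with PyTorch (my choice of framework, not something any particular robot prescribes). The observations and actions are random placeholders standing in for recorded camera features and gripper commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn as nn

# Placeholder demonstration data: 500 recorded (observation, action) pairs.
# In a real system, observations come from cameras and joint encoders, and
# actions are the human's recorded arm/gripper commands.
observations = torch.randn(500, 32)   # 32-dim observation features (made up)
actions = torch.randn(500, 7)         # 7-dim action, e.g. a 7-joint arm command (made up)

# A small network mapping "what the robot sees" to "what it should do"
policy = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 7),
)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()   # how far is my action from the demonstrated action?

for step in range(1000):
    predicted = policy(observations)      # what would I do in these situations?
    loss = loss_fn(predicted, actions)    # compare against what the human did
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At execution time: feed in a new observation, get a predicted action
new_observation = torch.randn(1, 32)
print(policy(new_observation))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;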

&lt;p&gt;&lt;strong&gt;Approach 2: Learning the Underlying Policy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of just copying, the robot tries to learn the &lt;em&gt;rules&lt;/em&gt; of what's happening.&lt;/p&gt;

&lt;p&gt;Think of it like learning a recipe, not just watching someone cook once. You're trying to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the goal?&lt;/li&gt;
&lt;li&gt;What are the important steps?&lt;/li&gt;
&lt;li&gt;What can vary, and what can't?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The robot learns to generalize. It doesn't just copy, it adapts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 3: Inverse Reinforcement Learning (The Sneaky Approach)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one's wild. Instead of the robot learning "do this," it learns "what is the human trying to optimize for?"&lt;/p&gt;

&lt;p&gt;Here's the idea: when a human does a task, they're implicitly optimizing for something. When you pick up a mug carefully, you're optimizing for "don't break the mug and don't spill coffee." The robot tries to figure out what you're optimizing for, then uses that as a reward signal.&lt;/p&gt;

&lt;p&gt;The robot is essentially asking: "What's the hidden objective here?"&lt;/p&gt;

&lt;p&gt;This is more advanced, but it's powerful because the robot learns the &lt;em&gt;intent&lt;/em&gt;, not just the movements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-World Example: Teaching a Robot to Cook&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me make this concrete with something you might actually do.&lt;/p&gt;

&lt;p&gt;Imagine we want to teach a robot to make scrambled eggs. Here's how imitation learning would work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Demonstration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A chef (or you) makes scrambled eggs in front of the robot. The robot's cameras record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where the ingredients are&lt;/li&gt;
&lt;li&gt;How the chef moves&lt;/li&gt;
&lt;li&gt;What the chef is looking at&lt;/li&gt;
&lt;li&gt;The timing of actions (when to stir, when to stop)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The robot also records data from sensors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heat level of the pan&lt;/li&gt;
&lt;li&gt;How long things cook&lt;/li&gt;
&lt;li&gt;The texture of the eggs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Feature Extraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system figures out what matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Okay, the chef stirred when the edges started to solidify"&lt;/li&gt;
&lt;li&gt;"The chef removed it from the heat when it looked creamy"&lt;/li&gt;
&lt;li&gt;"The chef tasted it to check doneness"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the meaningful patterns for the robot, of course.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Learning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The robot creates a model: "When I see eggs with these characteristics, I should stir. When they look like this, I should stop."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The robot makes scrambled eggs on its own. It might not be &lt;em&gt;exactly&lt;/em&gt; like the chef made it (maybe slightly different timing), but it captures the essence of what makes good scrambled eggs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Adaptation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the first batch isn't perfect, the system can learn from the mistake. "Oh, I stirred too late, next time I'll stir earlier." This is where imitation learning becomes even more powerful, it's not just one-shot learning, it improves over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenges: Why Robot Imitation Is Still Hard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm not going to pretend this is easy. Some of the problems include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1: The Distribution Shift&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The robot learns from demonstrations, but the real world is messier. What if the mug is in a slightly different position? What if the lighting is different? What if the object is a different size?&lt;/p&gt;

&lt;p&gt;When the robot encounters something &lt;em&gt;different&lt;/em&gt; from what it trained on, it often fails. This is called "distribution shift": the robot is good at things that look like the training data, but bad at things that don't.&lt;/p&gt;

&lt;p&gt;This is a huge research problem right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2: The Human-Robot Gap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Humans have bodies that are very different from robot bodies. A human has 206 bones and hundreds of muscles and joints that work together with incredible skill. Most robots have maybe 6-7 degrees of freedom (ways to move).&lt;/p&gt;

&lt;p&gt;When a human shows you how to pick something up, they're using their whole body: balance, finger flexibility, and tactile feedback. Translating that to a robot is non-trivial; this is the biggest challenge right now, although there have been a few breakthroughs.&lt;/p&gt;

&lt;p&gt;One way researchers handle this is by mapping human movements to robot movements: "When the human's hand moves like this, the robot's gripper moves like this." But it's imperfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3: The Reward Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How do you know if the robot did the task "correctly"? For some tasks, it's obvious (did the egg get cooked?). For others, it's fuzzy (did you fold the laundry neatly enough?).&lt;/p&gt;

&lt;p&gt;Defining what success looks like is harder than it sounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4: Data Quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Garbage in, garbage out is the norm in robotics. If your demonstrations are bad, your robot will learn bad behavior. If you show the robot ten different ways to do something without explaining why you did it differently, it gets confused.&lt;/p&gt;

&lt;p&gt;Getting good demonstration data is actually a real bottleneck in robot imitation learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where Is Robot Imitation Being Used Right Now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Industrial Robots&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Companies are using imitation learning to train robots for assembly tasks. Instead of programming every detail, they show the robot the task, and it learns. This dramatically cuts down setup time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Robotic Manipulation (Grasping and Picking)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's active research on robots that can pick objects they've never seen before by learning from human demonstrations. This is used in warehouses and manufacturing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Robotic Surgery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Surgeons perform procedures, and the system records their movements. This data helps train surgical robots to assist or even automate certain tasks. Obviously, this requires extreme precision and validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Autonomous Vehicles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Self-driving cars learn by watching human drivers. The car observes: "In this situation, the human turned the wheel like this, at this speed." Over millions of miles of data, the car learns driving patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Robot Learning from Videos&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Researchers are now training robots using YouTube videos and internet-scale data. The robot is learning from millions of human demonstrations. This is cutting-edge stuff, but it's happening.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources to Learn More&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to dive deeper (and you should):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Papers&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2401.08381" rel="noopener noreferrer"&gt;Robot Immitation from Human Action&lt;/a&gt;, &lt;a href="https://www.researchgate.net/publication/385858967_Imitation_Learning_for_Robotics_Progress_Challenges_and_Applications_in_Manipulation_and_Teleoperation" rel="noopener noreferrer"&gt;Imitation Learning for Robotics: Progress, Challenges, and Applications in Manipulation and Teleoperation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Books&lt;/strong&gt;: "Robotics, Vision and Control" by Peter Corke, &lt;strong&gt;Imitation Learning for Robots: Building a Strong Foundation&lt;/strong&gt; by Von Jacob&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>robotics</category>
      <category>cv</category>
      <category>ai</category>
      <category>nicanorkorir</category>
    </item>
    <item>
      <title>Getting started with Robotics</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Tue, 11 Nov 2025 06:24:31 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/getting-started-with-robotics-3f0g</link>
      <guid>https://forem.com/nicanor_korir/getting-started-with-robotics-3f0g</guid>
      <description>&lt;p&gt;You've probably seen a robot doing something cool. Maybe it's one of those automatic vacuum cleaners that somehow knows when you've spilt coffee on the floor and, boom, five minutes later, it's sparkling clean. Or maybe you've seen those robodogs on the internet and thought, "That's insane, I want to control that." Well, guess what? You can.&lt;/p&gt;

&lt;p&gt;Here's the truth, though: robotics looks intimidating from the outside, but it's actually both hard and easy at the same time. Hard if you try to understand &lt;em&gt;everything&lt;/em&gt; at once. Easy if you focus on individual pieces and build from there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtyku1bguso3ahfptyqc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtyku1bguso3ahfptyqc.png" alt=" " width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to Robotics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is a Robot?
&lt;/h3&gt;

&lt;p&gt;A robot is basically a machine that can sense its &lt;strong&gt;environment&lt;/strong&gt;, make &lt;strong&gt;decisions&lt;/strong&gt;, and take &lt;strong&gt;action&lt;/strong&gt;. That's it. Sounds simple, right? But the magic happens in how you combine these three things.&lt;/p&gt;

&lt;p&gt;Think about that vacuum cleaner again. It has sensors (cameras, bump sensors) that tell it "hey, there's dirt here" or "I hit a wall." It has a controller (the robot's brain) that processes this information and decides what to do. And it has actuators (motors) that actually do the work, spinning brushes, and moving wheels. All three working together make one smart robot.&lt;/p&gt;
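
&lt;p&gt;That sense-decide-act cycle is literally the skeleton of most robot control code. Here's a toy Python sketch, with made-up sensor and motor functions standing in for real hardware drivers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random
import time

def read_bump_sensor():
    # Stand-in for real hardware: pretend we bump into something 20% of the time
    return random.random() &amp;lt; 0.2

def drive_forward():
    print("driving forward")

def turn_left():
    print("bumped something, turning left")

# The sense, decide, act loop
for _ in range(10):
    bumped = read_bump_sensor()   # 1. sense the environment
    if bumped:                    # 2. decide what to do
        turn_left()               # 3. act
    else:
        drive_forward()
    time.sleep(0.1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;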

&lt;h3&gt;
  
  
  How Robots Impact the World Today
&lt;/h3&gt;

&lt;p&gt;Honestly? Robots are everywhere now. In factories, they're assembling everything from cars to phones. In hospitals, they're assisting with surgery. Amazon warehouses are packed with robots moving packages. And in your home, you've got that vacuum, maybe a robot lawn mower, perhaps a smart speaker that's technically a robot too.&lt;/p&gt;

&lt;p&gt;The impact isn't just about replacing human jobs (though that conversation is real). It's also about doing things humans can't do efficiently, repetitive tasks, dangerous environments, and precision work at scale. Robots are the reason manufacturing is faster, safer surgeries happen, and, honestly, why you can get packages delivered so quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Different Branches of Robotics
&lt;/h3&gt;

&lt;p&gt;Robotics isn't one thing. It's actually several fields working together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Industrial Robotics&lt;/strong&gt;: Factory robots, assembly lines, manufacturing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile Robotics&lt;/strong&gt;: Robots that move around (vacuum cleaners, delivery robots, drones)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manipulation&lt;/strong&gt;: Robotic arms and hands (think surgical robots or factory arms)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Humanoids&lt;/strong&gt;: Robots that look and act like humans (still mostly experimental)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous Vehicles&lt;/strong&gt;: Self-driving cars and similar tech&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swarm Robotics&lt;/strong&gt;: Multiple robots coordinating together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to master all of these. Pick one that excites you and start there; the basics are the same across all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Robotics Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Sensors &amp;amp; Actuators: How Robots Sense and Move
&lt;/h3&gt;

&lt;p&gt;Every robot needs to know what's happening around it. That's where &lt;strong&gt;sensors&lt;/strong&gt; come in. They're like the robot's eyes, ears, and skin.&lt;/p&gt;

&lt;p&gt;Common sensors include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cameras&lt;/strong&gt;: Let the robot "see"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiDAR&lt;/strong&gt;: Measures distance using light (great for navigation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IMU (Inertial Measurement Unit)&lt;/strong&gt;: Detects motion and orientation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ultrasonic Sensors&lt;/strong&gt;: Measure distance using sound&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bump Sensors&lt;/strong&gt;: Simple "did I hit something?" switches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the robot knows what's happening, it needs to &lt;em&gt;do&lt;/em&gt; something about it, and voila, &lt;strong&gt;actuators&lt;/strong&gt;. These are the motors and mechanisms that make the robot move or manipulate things.&lt;/p&gt;

&lt;p&gt;Types of actuators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DC Motors&lt;/strong&gt;: Simple, common, good for wheels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Servo Motors&lt;/strong&gt;: Precise positioning, great for robotic arms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stepper Motors&lt;/strong&gt;: Very precise, often used in 3D printers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear Actuators&lt;/strong&gt;: Push or pull in a straight line&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Sensors tell the robot what's happening, actuators make it happen.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Controllers: The Robot's Brain
&lt;/h3&gt;

&lt;p&gt;The controller is the &lt;strong&gt;decision-maker&lt;/strong&gt;. It's the microcontroller or computer that reads sensor data and decides what the actuators should do.&lt;/p&gt;

&lt;p&gt;Common controllers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Arduino&lt;/strong&gt;: Great for beginners, affordable, tons of tutorials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raspberry Pi&lt;/strong&gt;: More powerful, can run full operating systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Controllers&lt;/strong&gt;: For complex industrial robots&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized Chips&lt;/strong&gt;: NVIDIA Jetson for AI-heavy tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've been playing around with Arduino, and I'll create a few articles on Arduino and Raspberry Pi in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Power Systems
&lt;/h3&gt;

&lt;p&gt;Robots need power; without power, nothing much will happen.&lt;/p&gt;

&lt;p&gt;A few basics (always follow the manufacturer's instructions that come with your kit):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voltage and Current&lt;/strong&gt;: Don't mix them up. Voltage is pressure, current is flow. Too much of either can fry your robot or hurt you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batteries&lt;/strong&gt;: Usually 9V, 12V, or LiPo batteries. Match the voltage to your robot's needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fuses&lt;/strong&gt;: These are your safety net. They blow if something goes wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heat Dissipation&lt;/strong&gt;: Motors and controllers generate heat. Ventilation matters.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hardware vs. Software in Robotics
&lt;/h3&gt;

&lt;p&gt;Here's the thing: robotics is 50/50 hardware and software. You can have amazing code, but if your motors are wired wrong, nothing happens. You can have perfect hardware, but without good control software, your robot is just a paperweight.&lt;/p&gt;

&lt;p&gt;Both matter equally. This is why hands-on learning is crucial: you can't just read about robotics, you have to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Robotics Skills for Beginners
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Programming Basics
&lt;/h3&gt;

&lt;p&gt;You'll need to code, and Python is the best language to start with: it's readable, forgiving, and widely used in robotics.&lt;/p&gt;

&lt;p&gt;You don't need to be a master programmer. Basic concepts are enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variables and data types&lt;/li&gt;
&lt;li&gt;Loops and conditionals&lt;/li&gt;
&lt;li&gt;Functions&lt;/li&gt;
&lt;li&gt;Working with libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spend a week or two learning Python fundamentals. Then move to robotics-specific libraries. Below, I'll introduce the robotics software that will give you the momentum to kickstart your robotics journey.&lt;/p&gt;

&lt;h3&gt;
  
  
  Electronics Fundamentals (Voltage, Current, Motors)
&lt;/h3&gt;

&lt;p&gt;You don't need a degree in electrical engineering. Just understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voltage&lt;/strong&gt;: Think of it as electrical pressure (measured in volts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Current&lt;/strong&gt;: How much electricity flows (measured in amps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resistance&lt;/strong&gt;: Opposition to flow (measured in ohms)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ohm's Law&lt;/strong&gt;: V = I × R (this is important)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Practical skills:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read a circuit diagram&lt;/li&gt;
&lt;li&gt;Use a multimeter to check voltage&lt;/li&gt;
&lt;li&gt;Solder wires together&lt;/li&gt;
&lt;li&gt;Connect motors to controllers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;YouTube has tons of beginner electronics tutorials. Watch a few before you touch anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Mechanics (Gears, Joints, Motion)
&lt;/h3&gt;

&lt;p&gt;Physics matters: understanding how gears work, how joints move, and how force transfers makes you a better roboticist.&lt;/p&gt;

&lt;p&gt;Basic concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gears&lt;/strong&gt;: Transfer power and change speed/torque&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Joints&lt;/strong&gt;: Allow movement in specific directions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Friction&lt;/strong&gt;: Affects movement and efficiency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Torque&lt;/strong&gt;: Rotational force (important for motors)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction to AI in Robotics
&lt;/h2&gt;

&lt;p&gt;AI is becoming central to robotics. Your robot needs to make decisions based on sensor input. That's where AI comes in.&lt;/p&gt;

&lt;p&gt;For beginners:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with simple logic (if this, then that)&lt;/li&gt;
&lt;li&gt;Move to basic machine learning (object detection with pre-trained models)&lt;/li&gt;
&lt;li&gt;Eventually explore reinforcement learning (robot learning through trial and error)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need to implement cutting-edge AI. Use existing libraries like TensorFlow or PyTorch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simulation &amp;amp; Virtual Robotics
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why Use Simulators?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Simulations are cheap, fast, and forgiving. You can crash a simulated robot a thousand times without spending a dime. You can test algorithms in minutes instead of hours. And you can focus on the software without worrying about hardware limitations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Popular Simulators:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gazebo&lt;/strong&gt;: Open-source, free, industry-standard for ROS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webots&lt;/strong&gt;: Beginner-friendly, good documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyBullet&lt;/strong&gt;: Physics engine, great for reinforcement learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA Isaac Sim&lt;/strong&gt;: Cutting-edge, free, powerful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unity ML-Agents&lt;/strong&gt;: Game engine + AI training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MujoCo&lt;/strong&gt;: Physics-based, research-oriented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CoppeliaSim&lt;/strong&gt;: Versatile, good for learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For beginners, I'd recommend &lt;strong&gt;PyBullet&lt;/strong&gt;, &lt;strong&gt;Webots&lt;/strong&gt;, or &lt;strong&gt;Gazebo + ROS&lt;/strong&gt;. They have gentle learning curves and tons of tutorials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build Your First Virtual Robot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pick a simulator and follow its beginner tutorial. I don't want to add a complete guide here; I'll link to a step-by-step article on creating your first virtual robot later. You'll learn the basic workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a robot model&lt;/li&gt;
&lt;li&gt;Add sensors and actuators&lt;/li&gt;
&lt;li&gt;Write control code&lt;/li&gt;
&lt;li&gt;Run and observe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It'll feel like the real thing, but without the crashes. I've done these a ton of times with different platforms; they are always fun to play with.&lt;/p&gt;
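
&lt;p&gt;Just to show how little code it takes to get moving, here's a minimal PyBullet sketch (assuming pip install pybullet). It loads a ground plane plus one of the sample robots that ships with the library, then steps the physics for about two seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
import pybullet as p
import pybullet_data

p.connect(p.GUI)   # opens the simulator window (use p.DIRECT for headless runs)
p.setAdditionalSearchPath(pybullet_data.getDataPath())   # where the sample URDFs live
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")                # the ground
robot = p.loadURDF("r2d2.urdf", [0, 0, 0.5])    # a sample robot, dropped half a metre up

for _ in range(480):          # default step is 1/240 s, so 480 steps is ~2 seconds
    p.stepSimulation()
    time.sleep(1.0 / 240.0)   # slow down to roughly real time for the GUI

p.disconnect()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;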

&lt;h3&gt;
  
  
  Robotics Software
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What is ROS/ROS2?
&lt;/h4&gt;

&lt;p&gt;ROS (Robot Operating System) is like the "operating system" for robots. It's a framework that makes it easier to write robot software.&lt;/p&gt;

&lt;p&gt;ROS handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Communication between different robot components&lt;/li&gt;
&lt;li&gt;Managing sensors and actuators&lt;/li&gt;
&lt;li&gt;Running multiple programs simultaneously&lt;/li&gt;
&lt;li&gt;Lots of pre-built tools and libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is ROS necessary? Not for your first robot. But it's industry-standard, and learning it early pays off.&lt;/p&gt;

&lt;p&gt;ROS2 is the newer version, cleaner, and more modern. If you're starting fresh, go with ROS2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working with URDF Models (Robot Representation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;URDF (Unified Robot Description Format) is basically XML that describes your robot. It tells the system: "Here's my robot, it has these joints, these links, these sensors."&lt;/p&gt;

&lt;p&gt;You write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;robot&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"my_robot"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"body"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"wheel_left"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;joint&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"left_wheel_joint"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"revolute"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="c"&gt;&amp;lt;!-- joint details --&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/joint&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/robot&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This might look intimidating, but it's just describing geometry and connections. Tools can visualize it for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Robot Control Program&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A simple obstacle-avoidance example in Python using ROS 1's rospy API (ROS2's rclpy follows the same publish/subscribe pattern):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rospy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sensor_msgs.msg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LaserScan&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;geometry_msgs.msg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Twist&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# msg contains laser scan data
&lt;/span&gt;    &lt;span class="c1"&gt;# If something is close, stop; otherwise, move forward
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ranges&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;move_forward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;rospy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;obstacle_avoider&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rospy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Subscriber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/scan&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LaserScan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rospy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This listens to a laser scanner and avoids obstacles. Simple, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  Robotics in the Real World
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Robotics in Healthcare, Space, Manufacturing, Entertainment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare&lt;/strong&gt;: Surgical robots (Da Vinci), rehabilitation robots, delivery bots in hospitals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Space&lt;/strong&gt;: Mars rovers, satellite deployment robots, exploration drones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manufacturing&lt;/strong&gt;: Assembly lines, welding robots, material handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entertainment&lt;/strong&gt;: Robodogs, humanoid entertainers, theme park attractions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each domain has unique challenges. Healthcare robots need to be incredibly precise and safe. Space robots need to operate autonomously with limited communication. Manufacturing robots need to work 24/7 without breaking down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ethical and Safety Considerations
&lt;/h2&gt;

&lt;p&gt;As robots become more powerful and autonomous, we need to think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safety&lt;/strong&gt;: What happens if a robot malfunctions?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bias&lt;/strong&gt;: If a robot uses AI, does that AI have biases?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy&lt;/strong&gt;: How much decision-making should we give to robots?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Displacement&lt;/strong&gt;: What about workers whose jobs are replaced?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt;&lt;br&gt;
I am not going to recommend any courses right now, since I am using various tools, lectures, and books; maybe in the future. There are tons of materials out there if you need them.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Kits:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LEGO Mindstorms: Great for learning&lt;/li&gt;
&lt;li&gt;Arduino Starter Kits: Affordable, beginner-friendly&lt;/li&gt;
&lt;li&gt;ROSbot: Pre-built mobile robot, good for ROS learning&lt;/li&gt;
&lt;li&gt;Donkey Car: Open-source autonomous car project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Communities:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ROS Discourse: Official ROS community&lt;/li&gt;
&lt;li&gt;Reddit: r/robotics is helpful&lt;/li&gt;
&lt;li&gt;GitHub: Browse robotics projects, contribute&lt;/li&gt;
&lt;li&gt;Local meetups: Find robotics groups in your city&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Competitions and Open-Source Projects&lt;/strong&gt;&lt;br&gt;
Some of these are really interesting to follow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RoboCup&lt;/strong&gt;: International robotics competition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FIRST Robotics&lt;/strong&gt;: High school competition (they have adult divisions too)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparkfun AVC&lt;/strong&gt;: Autonomous vehicle competition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source&lt;/strong&gt;: Contribute to projects like Donkey Car, OpenDog, etc.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Robotics is a wide field, and we can't cover it all in one article. But this guide should get you started. Pick a project, grab some components, and build something. That's how you really learn.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>robotics</category>
      <category>programming</category>
    </item>
    <item>
      <title>Learning AI in the world of fast-moving AI</title>
      <dc:creator>Nicanor Korir</dc:creator>
      <pubDate>Mon, 10 Nov 2025 04:35:57 +0000</pubDate>
      <link>https://forem.com/nicanor_korir/learning-ai-in-the-world-of-fast-moving-ai-2gpm</link>
      <guid>https://forem.com/nicanor_korir/learning-ai-in-the-world-of-fast-moving-ai-2gpm</guid>
<description>&lt;p&gt;This is gonna be a tough one. I mean, writing without using ChatGPT, Claude, etc., to generate this blog post for me. I'll create an article on how I am using AI in my studies and work.&lt;/p&gt;

&lt;p&gt;I'm currently pursuing a Master's in Artificial Intelligence with a Robotics specialisation, which I genuinely love, but let me tell you: learning in the age of fast-moving AI is both exhilarating and disorienting at the same time.&lt;/p&gt;

&lt;h4&gt;
  
  
  What does "fast-moving AI" even mean?
&lt;/h4&gt;

&lt;p&gt;Now?&lt;br&gt;
We have tools&lt;br&gt;
We have automation&lt;br&gt;
We have AI helping us build AI&lt;/p&gt;

&lt;p&gt;Here's the thing: AI has felt practically real for about the last two years now, and it's getting crazier every single day. AI isn't some distant-future thing anymore; it's embedded in almost every industry. Discoveries drop constantly. Everything is accelerating. It's like trying to read a book while someone keeps flipping the pages faster and faster.&lt;/p&gt;

&lt;p&gt;For me, I'm studying both the present and the past simultaneously, and they're moving in different directions. The past stuff is the foundational theory, which is crucial because it's what gives you actual understanding. You need to know why things work, not just that they work. But the current stuff? That's where it gets wild. You're learning cutting-edge applications while also reverse-engineering them: What's the origin? How did we get here? What does this mean for what comes next?&lt;/p&gt;

&lt;h4&gt;
  
  
  The generational shift in AI education
&lt;/h4&gt;

&lt;p&gt;If I compare my experience to someone who did a Master's in AI ten years ago, the difference is huge. Back then, you had to build almost everything from scratch. You did the research, prepared your own datasets, and wrote your own implementations. Speed wasn't even part of the equation; research took time.&lt;/p&gt;

&lt;p&gt;Now? We have foundational models, pre-trained systems, automated pipelines, and off-the-shelf tools that would've taken years to develop a decade ago. But here's the double-edged sword: we still need to understand how to build these things manually, so I still do the old-school model building, just with today's pre-built tools, which feels like a luxury. The difference is that we also have the option to leverage automation. So the game has shifted: it's not about reinventing the wheel anymore; it's about knowing when to build wheels, when to use existing ones, and how to integrate them intelligently.&lt;/p&gt;

&lt;p&gt;I'm planning to write deeper dives into exactly what I'm learning and how it's different from the "old way".&lt;/p&gt;

&lt;h3&gt;
  
  
  Robotics is its own beast
&lt;/h3&gt;

&lt;p&gt;Now, robotics is a different animal entirely. Unlike NLP, where pre-trained models and massive datasets have democratised the field, robotics hasn't had that same tooling revolution. You can't just download a pre-trained robot. You have to do the practicals: actually build, test, and iterate. And robotics isn't just one field; it contains multiple fields working together. I'll link to a Getting Started with Robotics article here.&lt;/p&gt;

&lt;p&gt;The foundation matters even more in robotics because you're combining computer vision, control systems, mechanics, and AI reasoning all at once. The good thing right now is that there are excellent simulation environments that help you understand the practical side.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping up without burning out
&lt;/h3&gt;

&lt;p&gt;There's this underlying anxiety in AI right now: if you stop learning for a week, you've missed something important. But here's what I've learned: keeping up is actually manageable if, and this is crucial, you have a strong foundation. Every breakthrough, every new model, every new technique: they're all built on the same fundamentals of computer science, mathematics, and physics. The basics haven't changed. The applications have exploded, but the principles are solid.&lt;/p&gt;

&lt;p&gt;These are just high-level thoughts on how I personally learn AI while immersed in the AI rush. An article breaking down how I am using AI for learning and work into low-level details would be a good follow-up.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>mlhgrad</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
