<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sam Hartley</title>
    <description>The latest articles on Forem by Sam Hartley (@samhartley_dev).</description>
    <link>https://forem.com/samhartley_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3811539%2Fdd554e30-699d-42a3-a82a-77673790a186.png</url>
      <title>Forem: Sam Hartley</title>
      <link>https://forem.com/samhartley_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/samhartley_dev"/>
    <language>en</language>
    <item>
      <title>3 Months Running Everything Locally — What Broke, What Worked, What I'd Do Differently</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:03:48 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/3-months-running-everything-locally-what-broke-what-worked-what-id-do-differently-3e1b</link>
      <guid>https://forem.com/samhartley_dev/3-months-running-everything-locally-what-broke-what-worked-what-id-do-differently-3e1b</guid>
      <description>&lt;p&gt;It's been about three months since I made the switch. No more ChatGPT Plus. No more Claude subscription. No more Copilot. Everything I use day-to-day now runs on hardware sitting in my apartment — a Mac mini M4 and a PC with a mix of consumer GPUs.&lt;/p&gt;

&lt;p&gt;I wrote the enthusiastic "I ditched OpenAI" post back in early March. This is the honest follow-up, because a lot of people asked me to come back after the honeymoon phase.&lt;/p&gt;

&lt;p&gt;Some of it worked better than I expected. Some of it was genuinely annoying. Here's the unfiltered version.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup (short version)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mac mini M4&lt;/strong&gt;, 16 GB RAM — runs the orchestrator, small models, all the glue code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PC with an RTX 3060 12GB&lt;/strong&gt; (plus spare 3070s and 3080s in a drawer) — runs the heavier models via Ollama&lt;/li&gt;
&lt;li&gt;Everything talks over my LAN. Nothing leaves the house unless I tell it to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use this stuff for coding help, writing drafts, summarizing articles, transcribing voice notes, and a handful of personal automations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Coding help for "normal" tasks
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;qwen3-coder:30b&lt;/code&gt; on the 3060 handles maybe 80% of what I used to ask GPT-4 for. Refactoring a function, explaining a gnarly regex, writing a quick shell script, sketching a React component. It's fast enough that I don't miss the cloud.&lt;/p&gt;

&lt;p&gt;The latency is actually &lt;em&gt;better&lt;/em&gt; than with cloud APIs because there's no round trip to a US data center. I type a prompt and have tokens streaming back in under a second.&lt;/p&gt;
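
&lt;p&gt;The whole "cloud replacement" boils down to one HTTP call. A minimal sketch — the LAN IP is just how my network happens to be laid out, and the JSON quoting here is naive, so keep prompts free of double quotes (real code should escape properly):&lt;/p&gt;

```shell
# ask: fire a prompt at the Ollama box on the LAN and print the raw JSON reply.
OLLAMA_HOST="${OLLAMA_HOST:-http://192.168.1.20:11434}"

payload() {
  # naive quoting: fine for plain prompts, breaks on embedded double quotes
  printf '{"model":"qwen3-coder:30b","prompt":"%s","stream":false}' "$1"
}

ask() {
  # /api/generate is Ollama's one-shot completion endpoint
  curl -s "$OLLAMA_HOST/api/generate" -d "$(payload "$1")"
}
```

&lt;p&gt;Wrap that in a two-line CLI and it covers most of the day-to-day asks.&lt;/p&gt;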

&lt;h3&gt;
  
  
  2. Voice notes and transcription
&lt;/h3&gt;

&lt;p&gt;I didn't expect this to be the killer app, but it is. I talk to my Mac mini while I'm cooking, and Whisper on-device dumps the text into a daily markdown file. Zero cost, zero privacy worries, zero "oh I forgot to renew my API key."&lt;/p&gt;

&lt;p&gt;This alone probably saved me more time than the coding stuff.&lt;/p&gt;
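
&lt;p&gt;The shape of the voice-note pipeline is roughly this. A sketch, assuming the open-source &lt;code&gt;whisper&lt;/code&gt; CLI is installed; the paths and model size are illustrative, not my exact setup:&lt;/p&gt;

```shell
#!/bin/bash
# voicenote.sh: transcribe a recording and append it to today's note
NOTES_DIR="${NOTES_DIR:-$HOME/notes}"

note_file() { echo "$NOTES_DIR/$(date +%F).md"; }

transcribe() {
  # whisper drops audio-basename.txt into --output_dir
  whisper "$1" --model base --output_format txt --output_dir /tmp > /dev/null
  txt="/tmp/$(basename "${1%.*}").txt"
  { echo; echo "## $(date +%H:%M)"; cat "$txt"; } >> "$(note_file)"
}
```

&lt;p&gt;One daily markdown file, timestamped entries, nothing leaves the machine.&lt;/p&gt;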

&lt;h3&gt;
  
  
  3. Batch jobs that used to rack up API bills
&lt;/h3&gt;

&lt;p&gt;Summarizing 200 PDFs. Tagging a folder of screenshots. Generating alt text for a blog's image archive. These were the things that made me nervous to click "run" on OpenAI. Now I just let them chew overnight on the PC. Electricity is cheaper than tokens.&lt;/p&gt;
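
&lt;p&gt;The overnight batch job is just a dumb loop. Something like this — a sketch that assumes &lt;code&gt;pdftotext&lt;/code&gt; (from poppler) and &lt;code&gt;ollama&lt;/code&gt; are on PATH, and that piped stdin gets appended to the prompt by &lt;code&gt;ollama run&lt;/code&gt;; paths and model are my setup:&lt;/p&gt;

```shell
#!/bin/bash
# summarize_pdfs.sh: the overnight batch loop, roughly
outpath() { echo "./summaries/$(basename "${1%.pdf}").md"; }

summarize_all() {
  mkdir -p ./summaries
  for pdf in ./pdfs/*.pdf; do
    out="$(outpath "$pdf")"
    # resume-friendly: a rerun skips anything already summarized
    if [ -f "$out" ]; then continue; fi
    # cap the input so a giant PDF does not blow the context window
    pdftotext "$pdf" - | head -c 20000 | \
      ollama run qwen3-coder:30b "Summarize this document in 5 bullet points:" > "$out"
  done
}
# summarize_all   # kick it off before bed
```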

&lt;h2&gt;
  
  
  What broke or annoyed me
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The "it's almost right" gap on hard stuff
&lt;/h3&gt;

&lt;p&gt;For anything genuinely hard — like debugging a weird async bug in a codebase the model hasn't seen, or reasoning about a tricky algorithm — the gap between local 30B models and frontier cloud models is still real. It's not huge, but it's there.&lt;/p&gt;

&lt;p&gt;I caught myself a few times thinking "I bet GPT-5 would get this in one shot" while I was on my fifth prompt with a local model. That's the honest truth.&lt;/p&gt;

&lt;p&gt;My workaround: I keep a very small pay-as-you-go budget for the 2-3 times a month I actually need a frontier model. Probably $5/month total. Way cheaper than any subscription.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Context windows
&lt;/h3&gt;

&lt;p&gt;Local models with huge contexts exist, but they get &lt;em&gt;slow&lt;/em&gt;. Pasting a 50-file codebase into an 8B model and waiting is painful. I ended up writing a little script that does smart file selection instead of just dumping everything — basically a poor man's RAG. Worked better than I expected, but it was a weekend of yak-shaving I wasn't planning on.&lt;/p&gt;
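
&lt;p&gt;The "poor man's RAG" idea, stripped to its bones: rank files by how often the query terms appear and only paste the top hits into the prompt. This is a sketch of the shape, not the actual script:&lt;/p&gt;

```shell
# pickfiles: rank files under a directory by total occurrences of the query
# terms, print the top-N paths. A crude stand-in for embedding-based retrieval.
pickfiles() {
  local query="$1" dir="$2" topn="${3:-10}" f term hits score
  grep -rli -- "${query%% *}" "$dir" | while read -r f; do
    score=0
    for term in $query; do
      hits=$(grep -ci -- "$term" "$f" || true)
      score=$((score + hits))
    done
    echo "$score $f"
  done | sort -rn | head -n "$topn" | cut -d' ' -f2-
}
```

&lt;p&gt;Pipe the output into whatever builds your prompt. Embeddings would rank better, but this keeps another model out of memory.&lt;/p&gt;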

&lt;h3&gt;
  
  
  3. Model sprawl
&lt;/h3&gt;

&lt;p&gt;I have like 14 models pulled right now. &lt;code&gt;qwen3-coder&lt;/code&gt; for code. &lt;code&gt;deepseek-r1&lt;/code&gt; for reasoning. A vision model for screenshots. Whisper for audio. A small embedding model. A translator. Each one made sense at the time. Now my Ollama directory is 180 GB and I can't remember what half of them are for.&lt;/p&gt;

&lt;p&gt;I need to do a spring cleaning. I keep putting it off.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The "is it plugged in" problem
&lt;/h3&gt;

&lt;p&gt;My PC is in another room. Sometimes my wife moves stuff and unplugs the switch. Sometimes Windows decides to reboot for updates at 3am. Sometimes the LAN cable gets bumped.&lt;/p&gt;

&lt;p&gt;Cloud APIs just... work. Local stuff requires you to be your own SRE. I have health checks now. I have a Telegram bot that pings me when Ollama stops responding. This is not the kind of "home lab tinkering" I signed up for, but here we are.&lt;/p&gt;
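
&lt;p&gt;The health check itself is tiny. A sketch — token and chat ID are placeholders, and using Ollama's cheap &lt;code&gt;/api/tags&lt;/code&gt; endpoint as the liveness probe is my assumption:&lt;/p&gt;

```shell
#!/bin/bash
# ollama_watch.sh: cron this every 5 minutes
OLLAMA_URL="${OLLAMA_URL:-http://192.168.1.20:11434}"
BOT_TOKEN="your_bot_token_here"
CHAT_ID="your_chat_id_here"

alert() {
  # best-effort: a failed send should not crash the watcher itself
  curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    -d chat_id="${CHAT_ID}" -d text="$1" > /dev/null || true
}

check() {
  # -f: treat HTTP errors as failure, -m 5: give up after 5 seconds
  if curl -sf -m 5 "$1/api/tags" > /dev/null; then
    echo up
  else
    echo down
  fi
}

if [ "$(check "$OLLAMA_URL")" = "down" ]; then
  alert "⚠️ Ollama stopped responding at $OLLAMA_URL"
fi
```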

&lt;h2&gt;
  
  
  What I'd do differently if I started today
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Skip the "run everything locally or nothing" purity thing.&lt;/strong&gt; I wasted a few weeks trying to make local models do things they're just not good at yet. The sweet spot is a hybrid setup: local for the 90% of boring stuff, a small cloud budget for the 10% that genuinely needs a bigger brain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buy one strong GPU instead of three mediocre ones.&lt;/strong&gt; I have a drawer full of 3070s and 3080s I thought I'd use for "multi-GPU inference." In practice, a single 12GB card running a good 30B model handles almost everything I need, and juggling multiple cards adds complexity I don't want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Put the models behind one API, not five.&lt;/strong&gt; Ollama, llama.cpp, a Whisper server, a vision endpoint, embeddings... I should have stood up a single gateway that routes to the right backend. Instead I have five different base URLs in five different config files. It's fine, but it's ugly.&lt;/p&gt;
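
&lt;p&gt;Even a dumb routing table would have beaten five config files. A sketch of the idea; hostnames and ports here are made up for illustration:&lt;/p&gt;

```shell
# backend_for: one front door, many backends. Every client asks one place,
# and this decides which box answers.
backend_for() {
  case "$1" in
    code)   echo "http://pc.local:11434" ;;   # Ollama, qwen3-coder
    reason) echo "http://pc.local:11434" ;;   # Ollama, deepseek-r1
    stt)    echo "http://mac.local:9000" ;;   # Whisper server
    embed)  echo "http://mac.local:8080" ;;   # embedding model
    *)      echo "no backend for task: $1"; return 1 ;;
  esac
}

# every script then calls one thing, e.g.:
# curl "$(backend_for code)/v1/chat/completions" ...
```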

&lt;p&gt;&lt;strong&gt;Write down what each model is for.&lt;/strong&gt; Past-me assumed future-me would remember. Future-me does not remember.&lt;/p&gt;

&lt;h2&gt;
  
  
  Would I go back?
&lt;/h2&gt;

&lt;p&gt;No. But I'd stop telling people "local AI is ready, ditch the subscriptions" like it's some binary choice. It's not.&lt;/p&gt;

&lt;p&gt;If you're a developer who likes tinkering, has a decent GPU already, and wants to cut a $20-40/month subscription — yeah, it's great. You'll learn a lot, you'll own your stack, and you won't feel weird about pasting private code into someone else's API.&lt;/p&gt;

&lt;p&gt;If you just want the best possible model for your work and don't want to babysit anything — stay on the cloud. There's no shame in it.&lt;/p&gt;

&lt;p&gt;I'm in the first camp, but I was wrong to pretend the second camp was being lazy. They were being reasonable.&lt;/p&gt;




&lt;p&gt;Anyone else running a mixed setup like this? I'm curious what other people's "local for X, cloud for Y" split looks like. Drop it in the comments — I'm especially interested in how you're handling the context-window problem without building your own RAG from scratch.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>homelab</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Let AI Coding Agents Build My Side Projects for a Month — Here's My Honest Take</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Sun, 05 Apr 2026 08:04:00 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/i-let-ai-coding-agents-build-my-side-projects-for-a-month-heres-my-honest-take-52l3</link>
      <guid>https://forem.com/samhartley_dev/i-let-ai-coding-agents-build-my-side-projects-for-a-month-heres-my-honest-take-52l3</guid>
      <description>&lt;p&gt;Last month I ran an experiment: instead of writing code myself, I delegated as much as possible to AI coding agents. Not just autocomplete — full autonomous agents that read files, run commands, and ship features.&lt;/p&gt;

&lt;p&gt;I've been running a home lab (Mac Mini M4 + a Windows PC with GPUs + an Ubuntu box) for a while now, and I already had &lt;a href="https://dev.to/samhartley/how-i-automated-my-entire-dev-workflow-with-ai-agents-running-247-on-a-mac-mini-1gdi"&gt;my dev workflow automated with AI agents&lt;/a&gt;. But this time I pushed further: what if the agents didn't just &lt;em&gt;help&lt;/em&gt; me code, but actually &lt;em&gt;wrote&lt;/em&gt; the code?&lt;/p&gt;

&lt;p&gt;Here's what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I used a mix of tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; (CLI) — my go-to for complex, multi-file tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex&lt;/strong&gt; (OpenAI) — good for one-shot generation with clear specs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local models via Ollama&lt;/strong&gt; — for quick iterations without burning API credits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow: I'd describe what I wanted in plain English, point the agent at the right directory, and let it work. Sometimes I'd review the output. Sometimes I'd just run the tests and ship it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Worked
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Boilerplate and scaffolding — 10/10
&lt;/h3&gt;

&lt;p&gt;This is where agents shine. "Set up a FastAPI project with SQLite, async endpoints, and Pydantic models for a subscriber management system." Done in 90 seconds. Would've taken me 20 minutes of copy-pasting from old projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Refactoring — 8/10
&lt;/h3&gt;

&lt;p&gt;"Refactor this 400-line script into separate modules with proper error handling." The agents were surprisingly good at this. They understood the intent, split things logically, and even added type hints I'd been too lazy to write.&lt;/p&gt;

&lt;p&gt;The two points I'm docking: they sometimes over-abstract. I'd ask for "cleaner code" and get an enterprise-grade factory pattern for a 50-line script. You still need taste.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Writing tests — 9/10
&lt;/h3&gt;

&lt;p&gt;This was the biggest win I didn't expect. I hate writing tests. The agents &lt;em&gt;love&lt;/em&gt; writing tests. "Write pytest tests for this module, cover edge cases" → comprehensive test suite in under a minute.&lt;/p&gt;

&lt;p&gt;The catch: you need to actually &lt;em&gt;read&lt;/em&gt; the tests. I caught a few that were testing the wrong thing — they'd pass, but they weren't testing what mattered. Still, it's a massive time saver.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Debugging — 7/10
&lt;/h3&gt;

&lt;p&gt;Hit or miss. For straightforward bugs ("this function returns None when it should return a list"), agents crush it. For subtle timing issues or race conditions? They'd suggest fixes that looked right but didn't address the root cause.&lt;/p&gt;

&lt;p&gt;My rule now: agents for the first pass, then I debug the debugger's output.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Documentation — 9/10
&lt;/h3&gt;

&lt;p&gt;README files, docstrings, API docs. Agents are better at this than I am, honestly. They're more thorough, more consistent, and they don't get lazy halfway through.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Didn't Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Complex architecture decisions
&lt;/h3&gt;

&lt;p&gt;"Should I use Redis or just in-memory caching for this?" The agent will give you a perfectly reasonable answer either way. That's the problem — it doesn't &lt;em&gt;know&lt;/em&gt; your constraints the way you do. How many users? What's your memory budget? Are you running on a Raspberry Pi or a 128GB server?&lt;/p&gt;

&lt;p&gt;I stopped asking agents for architecture advice. I make the decision, then let them implement it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-service orchestration
&lt;/h3&gt;

&lt;p&gt;When I needed to coordinate between my Telegram bot, a background worker, and a database — and they all needed to agree on a shared state model — the agent would nail each piece individually but miss the integration points. I'd end up with three perfectly written services that didn't quite talk to each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Anything involving my specific hardware
&lt;/h3&gt;

&lt;p&gt;"Configure this for my RTX 3060 with 12GB VRAM running on Windows with WSL2." The agents would give me generic CUDA setup instructions instead of what actually works on my rig. Local knowledge is still human knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Over the month, across 4 side projects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before (manual)&lt;/th&gt;
&lt;th&gt;After (agent-assisted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to MVP&lt;/td&gt;
&lt;td&gt;~2 weeks&lt;/td&gt;
&lt;td&gt;~4 days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines written by me&lt;/td&gt;
&lt;td&gt;~80%&lt;/td&gt;
&lt;td&gt;~30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bugs in first deploy&lt;/td&gt;
&lt;td&gt;~8-12&lt;/td&gt;
&lt;td&gt;~5-8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time spent reviewing agent code&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;~2h/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overall velocity&lt;/td&gt;
&lt;td&gt;1x&lt;/td&gt;
&lt;td&gt;~2.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 2.5x multiplier is real but misleading. I'm faster at &lt;em&gt;producing code&lt;/em&gt;, but I spend more time &lt;em&gt;reviewing&lt;/em&gt; code. The net gain is still significant — maybe 1.8x when you account for review time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Changed About My Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. I write specs, not code.&lt;/strong&gt;&lt;br&gt;
My job shifted from "programmer" to "technical product manager." I write clear descriptions of what I want, define the interfaces, and let the agent fill in the implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. I review harder.&lt;/strong&gt;&lt;br&gt;
When I wrote the code myself, I'd eyeball it and move on. When an agent writes it, I actually read every line. Paradoxically, this has made me a better reviewer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. I prototype faster, throw away more.&lt;/strong&gt;&lt;br&gt;
Since generating a prototype takes minutes instead of hours, I build 2-3 approaches and pick the best one. This was a luxury I couldn't afford before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. I still write the hard parts.&lt;/strong&gt;&lt;br&gt;
State machines, complex business logic, performance-critical paths — I write these myself. Not because the agents can't, but because I need to &lt;em&gt;understand&lt;/em&gt; them deeply to maintain them later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;Here's what nobody in the "AI will replace developers" discourse talks about: &lt;strong&gt;you need to be a good developer to use AI coding agents well.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every time the agent produced something subtly wrong, I caught it because I knew what right looked like. Every time it over-engineered a solution, I could simplify it because I understood the problem. Every time it picked the wrong library or pattern, I could redirect it because I had opinions forged by years of mistakes.&lt;/p&gt;

&lt;p&gt;Junior devs using these tools will ship faster. They'll also ship more bugs, more over-engineering, and more "it works but nobody can maintain it" code. The tools amplify whatever skill level you bring.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Recommendation
&lt;/h2&gt;

&lt;p&gt;If you're not using AI coding agents yet, start with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tests first.&lt;/strong&gt; Let agents write your test suites. Low risk, high reward.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boilerplate second.&lt;/strong&gt; Project setup, CRUD endpoints, config files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refactoring third.&lt;/strong&gt; Point it at your worst file and say "clean this up."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex features last.&lt;/strong&gt; Only after you trust the tool and know its limits.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And always, always review the output. The agent is your fastest junior developer. It's also your most confident one — and confidence without experience is how bugs get shipped.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write about building things with AI, self-hosting, and turning side projects into income. If you're into that, I post a new article every few days.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Running my own AI setup locally? &lt;a href="https://dev.to/samhartley/i-ditched-openai-and-run-ai-locally-for-free-heres-how-57o6"&gt;Here's how I do it for $0/month.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Expanded My GPU Rental Fleet to 6 Cards — Here's What Happened to My Earnings</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Sat, 04 Apr 2026 15:44:11 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/i-expanded-my-gpu-rental-fleet-to-6-cards-heres-what-happened-to-my-earnings-10d5</link>
      <guid>https://forem.com/samhartley_dev/i-expanded-my-gpu-rental-fleet-to-6-cards-heres-what-happened-to-my-earnings-10d5</guid>
      <description>&lt;p&gt;A few weeks ago I wrote about renting out my single RTX 3060 on Vast.ai for passive income. The experiment worked better than I expected, so I did what any reasonable person would do: I went and dug out the five other GPUs sitting in my storage room.&lt;/p&gt;

&lt;p&gt;This is the honest follow-up. What actually happened when I went from 1 GPU to 6.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Backstory
&lt;/h2&gt;

&lt;p&gt;I had a bunch of GPUs from an older setup — two RTX 3070s, one RTX 3080, and two more RTX 3060s. They were collecting dust. The PC they came from got upgraded, the cards went into cardboard boxes, the boxes went under a shelf.&lt;/p&gt;

&lt;p&gt;Total VRAM across all six: around 62GB. Combined retail value when new: probably $3,000+. Monthly income while sitting in boxes: $0.&lt;/p&gt;

&lt;p&gt;The math wasn't complicated.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Expansion Actually Took
&lt;/h2&gt;

&lt;p&gt;Here's what I underestimated: it's not just "plug cards in, profit."&lt;/p&gt;

&lt;h3&gt;
  
  
  The hardware side
&lt;/h3&gt;

&lt;p&gt;You can't just stack 6 GPUs into a regular PC case. I had to think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PCIe slots and bandwidth.&lt;/strong&gt; A standard ATX board has maybe 2-3 real x16 slots. For 6 cards, you're looking at risers, which means a mining-style open frame or a server chassis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power.&lt;/strong&gt; Each card pulls 150-250W under load. Six cards = potentially 1,200-1,500W just in GPU power. Plus CPU, drives, RAM. My existing 850W PSU was not going to cut it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cooling.&lt;/strong&gt; Cards in a tight case thermal-throttle each other. Open frame was the answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ended up using an open-air mining frame I found used for cheap, two PSUs daisy-chained (a sketchy-but-common approach in the mining world), and PCIe risers.&lt;/p&gt;

&lt;p&gt;Setup time: about a full weekend.&lt;/p&gt;

&lt;h3&gt;
  
  
  The software side
&lt;/h3&gt;

&lt;p&gt;Getting all six cards recognized wasn't plug-and-play either. I run Windows on the main PC (easier driver support for NVIDIA), and Vast.ai has a Windows daemon that mostly works — except when it doesn't.&lt;/p&gt;

&lt;p&gt;A few issues I hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two risers were flaky and caused cards to drop off&lt;/li&gt;
&lt;li&gt;One 3070 had a driver conflict until I did a clean DDU reinstall&lt;/li&gt;
&lt;li&gt;Vast.ai's host dashboard showed 5 GPUs after setup; it took me an hour to figure out why the sixth wasn't being detected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total debugging time before everything was stable: another weekend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Earnings Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Cards&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Weekly Earnings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Before (1 card)&lt;/td&gt;
&lt;td&gt;RTX 3060&lt;/td&gt;
&lt;td&gt;12GB&lt;/td&gt;
&lt;td&gt;~$12-18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After (6 cards)&lt;/td&gt;
&lt;td&gt;3060 × 3, 3070 × 2, 3080 × 1&lt;/td&gt;
&lt;td&gt;62GB&lt;/td&gt;
&lt;td&gt;~$65-95&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Not exactly linear scaling. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demand is unpredictable.&lt;/strong&gt; Sometimes 4 of my 6 cards are rented simultaneously. Sometimes 1. The RTX 3080 gets picked up more often than the 3060s — higher VRAM matters for LLM inference jobs that need room to load bigger models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not all hours are equal.&lt;/strong&gt; Utilization spikes during US business hours and sags outside them. I'm in a timezone (Turkey) where "overnight for me" overlaps with "peak US working hours," so the cards do most of their earning while I sleep, which actually helps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing matters more than I thought.&lt;/strong&gt; I dropped my per-card price slightly and saw utilization go up noticeably. A few cents per hour makes a real difference when renters are comparing a dozen similar options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Monthly Run Rate
&lt;/h2&gt;

&lt;p&gt;Across all six cards, I'm averaging around &lt;strong&gt;$280-340/month&lt;/strong&gt; before electricity.&lt;/p&gt;

&lt;p&gt;Power costs are real. Six GPUs under load is serious wattage. My electricity bill went up — I haven't calculated the exact delta yet because my bill is shared (I'm not the only one using power in my building), but I'd estimate $40-60/month in additional costs.&lt;/p&gt;
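
&lt;p&gt;The $40-60 guess is easy to sanity-check with napkin math. Every number below is an assumption, not a metered figure:&lt;/p&gt;

```shell
# rough monthly electricity delta; integer math, close enough for a napkin
watts=700      # assumed average draw across six cards at partial utilization
hours=720      # hours in a month
rate_cents=10  # assumed cost per kWh in cents; plug in your own tariff
kwh=$(( watts * hours / 1000 ))
cost=$(( kwh * rate_cents / 100 ))
echo "${kwh} kWh, roughly \$${cost}/month"   # prints: 504 kWh, roughly $50/month
```

&lt;p&gt;At a higher tariff or heavier utilization it climbs fast, which is why it's worth metering for real.&lt;/p&gt;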

&lt;p&gt;&lt;strong&gt;Net: roughly $220-280/month in real passive income.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Is that life-changing? No. Is it meaningful for money that was doing nothing? Absolutely.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Start with a proper open-frame rig, not a cobbled-together case.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The mining frame was cheap but took time to source. If I were doing this again I'd budget for it from day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Get a proper high-wattage PSU setup.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Running two PSUs linked together works but it's inelegant. A server PSU with the right adapter is cleaner and safer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Test each card individually before combining them.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I wasted time troubleshooting "which card is the problem" when I could've confirmed each one worked before building the full rig.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Set minimum job duration.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Short jobs (under an hour) rack up overhead — container spin-up time, handshaking — without much earnings. I set a minimum of 2 hours and earnings-per-hour improved.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unexpected Part
&lt;/h2&gt;

&lt;p&gt;I expected this to be a boring passive income setup. It mostly is. But I've learned a surprising amount about how the AI inference market actually works by watching what gets rented and when.&lt;/p&gt;

&lt;p&gt;Most renters are running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tuning jobs (need sustained GPU hours)&lt;/li&gt;
&lt;li&gt;LLM inference (need VRAM more than raw compute)&lt;/li&gt;
&lt;li&gt;Image generation (FLUX, Stable Diffusion variants)&lt;/li&gt;
&lt;li&gt;Dev environments (people testing stuff without committing to a cloud contract)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Watching the demand patterns is actually interesting data about what the AI dev community is building right now. The 3080 almost always goes first — 10GB VRAM hits a sweet spot for smaller Llama and Mistral models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is It Worth It?
&lt;/h2&gt;

&lt;p&gt;Depends on your situation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yes, if:&lt;/strong&gt; You already have the GPUs and they're sitting idle. The marginal cost of setting this up is mostly your time, and the monthly return is real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maybe, if:&lt;/strong&gt; You'd have to buy the GPUs. At current used-market prices, payback period is 6-12 months depending on utilization. That's not terrible but it's not obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No, if:&lt;/strong&gt; You're renting out your daily-driver GPU. The rental platform can grab your card at inconvenient times. Keep at least one card reserved for your own use.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm looking at adding the Ubuntu server I already have running as a CPU-only Vast.ai host for smaller workloads. Less money per unit, but zero additional hardware cost.&lt;/p&gt;

&lt;p&gt;Also thinking about whether it makes sense to eventually get into the dedicated hosting side rather than the rental marketplace — more stable income, more setup required. Still researching.&lt;/p&gt;

&lt;p&gt;For now, 6 cards, ~$250/month net, and a weekend's worth of setup. I'll take it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Questions about the rig setup or Vast.ai specifics? Drop them in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;a href="http://www.fiverr.com/s/XLyg" rel="noopener noreferrer"&gt;Check out my automation work on Fiverr&lt;/a&gt;&lt;br&gt;&lt;br&gt;
→ &lt;a href="https://t.me/celebibot_en" rel="noopener noreferrer"&gt;Follow along on Telegram&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>passiveincome</category>
      <category>selfhosted</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Use Telegram as My DevOps Dashboard — No Web UI, No VPN, Just Works</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Mon, 23 Mar 2026 08:02:32 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/i-use-telegram-as-my-devops-dashboard-no-web-ui-no-vpn-just-works-10bn</link>
      <guid>https://forem.com/samhartley_dev/i-use-telegram-as-my-devops-dashboard-no-web-ui-no-vpn-just-works-10bn</guid>
      <description>&lt;p&gt;I have a bunch of things running 24/7 on a Mac Mini. GPU rental jobs, a Garmin watch face updater, a Fiverr inbox monitor, a funding rate tracker, a few cron jobs. &lt;/p&gt;

&lt;p&gt;For a while I ran a Grafana dashboard to keep an eye on them. It looked impressive. I never opened it.&lt;/p&gt;

&lt;p&gt;What I actually do is check my phone. So I built the monitoring layer there.&lt;/p&gt;

&lt;p&gt;Here's the setup: a lightweight Telegram bot that serves as my entire DevOps interface. Status checks, alerts, and even simple commands — all from the Telegram app I already have open.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not a Proper Dashboard?
&lt;/h2&gt;

&lt;p&gt;Honest answer: dashboards are for teams. If you're a solo dev with a few projects, a fancy web UI creates more overhead than it solves.&lt;/p&gt;

&lt;p&gt;Problems I had with Grafana:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPN required to reach it from outside my home network&lt;/li&gt;
&lt;li&gt;Needs to stay running (another thing to maintain)&lt;/li&gt;
&lt;li&gt;I never actually opened the browser tab&lt;/li&gt;
&lt;li&gt;It didn't &lt;em&gt;push&lt;/em&gt; me information — I had to pull it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Telegram flips this: it pushes alerts to me. I glance at my phone, see what's happening, and move on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Services (cron jobs, Python scripts, shell scripts)
  ↓
Central alert script: notify.sh
  ↓
Telegram Bot API → my phone
  ↓ (optional)
Command bot → runs queries on server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Outbound alerts&lt;/strong&gt; — services send me messages when things happen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inbound commands&lt;/strong&gt; — I can ask the bot questions from my phone&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Part 1: Dead Simple Alert Script
&lt;/h2&gt;

&lt;p&gt;Every service on my server can call this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# notify.sh — send a Telegram message from any script&lt;/span&gt;
&lt;span class="c"&gt;# Usage: ./notify.sh "Your GPU job finished"&lt;/span&gt;

&lt;span class="nv"&gt;BOT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_bot_token_here"&lt;/span&gt;
&lt;span class="nv"&gt;CHAT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_chat_id_here"&lt;/span&gt;
&lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://api.telegram.org/bot&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BOT_TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/sendMessage"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nv"&gt;chat_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHAT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nv"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nv"&gt;parse_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"HTML"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Any script can now send me a message in one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./notify.sh &lt;span class="s2"&gt;"✅ GPU rental job completed — earned &lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;.40"&lt;/span&gt;
./notify.sh &lt;span class="s2"&gt;"⚠️ Funding rate dropped below threshold on LYN_USDT"&lt;/span&gt;
./notify.sh &lt;span class="s2"&gt;"📬 New Fiverr inquiry from user987"&lt;/span&gt;
./notify.sh &lt;span class="s2"&gt;"❌ Garmin watch face API returned 503"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I spent maybe 20 minutes on this. It replaced a monitoring stack I spent days configuring.&lt;/p&gt;
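&lt;p&gt;One prerequisite I glossed over: you need your numeric chat ID before &lt;code&gt;notify.sh&lt;/code&gt; can deliver anything. The Bot API's &lt;code&gt;getUpdates&lt;/code&gt; method returns it once you've messaged your bot. A minimal sketch (the token is a placeholder; stdlib only):&lt;/p&gt;

```python
import json
import urllib.request

TOKEN = "your_bot_token_here"  # placeholder, same as in notify.sh

def extract_chat_id(updates):
    """Pull the first chat ID out of a getUpdates response, if any."""
    for update in updates.get("result", []):
        message = update.get("message")
        if message:
            return message["chat"]["id"]
    return None

def fetch_chat_id(token=TOKEN):
    """Call getUpdates and return the chat ID (send your bot a message first)."""
    url = f"https://api.telegram.org/bot{token}/getUpdates"
    with urllib.request.urlopen(url) as resp:
        return extract_chat_id(json.load(resp))
```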

&lt;h2&gt;
  
  
  Real Examples from My Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GPU rental monitor:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Runs every 30 min&lt;/span&gt;
&lt;span class="nv"&gt;earnings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;./check_gpu_earnings.sh&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$earnings&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
  ./notify.sh &lt;span class="s2"&gt;"💰 GPU earned: &lt;/span&gt;&lt;span class="nv"&gt;$earnings&lt;/span&gt;&lt;span class="s2"&gt; today"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Funding rate watcher:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Python script, runs every 15 min via cron
&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_funding_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LYN_USDT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# negative = people paying longs
&lt;/span&gt;    &lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔥 LYN funding rate: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;% — worth checking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
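&lt;p&gt;That snippet leans on a &lt;code&gt;notify()&lt;/code&gt; helper, the Python twin of &lt;code&gt;notify.sh&lt;/code&gt;. A minimal sketch, assuming the same token and chat ID placeholders (stdlib only, and URL encoding means emoji and spaces survive intact):&lt;/p&gt;

```python
import urllib.parse
import urllib.request

BOT_TOKEN = "your_bot_token_here"  # same placeholders as notify.sh
CHAT_ID = "your_chat_id_here"

def build_payload(message):
    """Form-encode the sendMessage parameters (handles emoji, spaces, etc.)."""
    return urllib.parse.urlencode({"chat_id": CHAT_ID, "text": message}).encode()

def notify(message):
    """POST the message to the Telegram sendMessage endpoint."""
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    urllib.request.urlopen(urllib.request.Request(url, data=build_payload(message)))
```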



&lt;p&gt;&lt;strong&gt;Daily summary (9 AM cron):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"📊 Daily Summary — &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;

GPU Jobs: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_gpu_count&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; completed
Funding Earned: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_funding_total&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;
Fiverr Inquiries: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_fiverr_count&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;
Watch Face Updates: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_garmin_count&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;

Server uptime: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uptime&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

./notify.sh &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$msg&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wake up, check my phone, and immediately know if anything needs attention. No browser, no VPN, no dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: The Command Interface
&lt;/h2&gt;

&lt;p&gt;Outbound alerts are great. But sometimes I want to query the server from my phone.&lt;/p&gt;

&lt;p&gt;I wrote a simple Python bot that listens for commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;telebot&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="n"&gt;BOT_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;ALLOWED_USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123456789&lt;/span&gt;  &lt;span class="c1"&gt;# your Telegram user ID
&lt;/span&gt;
&lt;span class="n"&gt;bot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;telebot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TeleBot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BOT_TOKEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;COMMANDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uptime &amp;amp;&amp;amp; free -h &amp;amp;&amp;amp; df -h /&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./check_gpu_status.sh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/funding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3 check_funding_rates.py --summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/services&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ps aux | grep -E &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(python|node|ollama)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; | grep -v grep&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@bot.message_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;commands&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COMMANDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;ALLOWED_USER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Not authorized.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;cmd_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# handle /status@botname format
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cmd_text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;COMMANDS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;COMMANDS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;cmd_text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
            &lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parse_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;polling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;none_stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now from Telegram I can type &lt;code&gt;/status&lt;/code&gt; and get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt; 11:23:15 up 14 days, 3:41,  1 user
Mem:   16Gi   8.2Gi   7.8Gi
/dev/sda1        245G   82G  163G  34%
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or &lt;code&gt;/funding&lt;/code&gt; and get the current rate snapshot. &lt;/p&gt;

&lt;p&gt;The key detail is the &lt;code&gt;ALLOWED_USER&lt;/code&gt; check: only my Telegram ID can run commands, and everyone else gets "Not authorized." Keep the token secret, but remember that anyone who finds your bot's username can message it, so you always need to validate the sender.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping It Running
&lt;/h2&gt;

&lt;p&gt;The command bot needs to stay alive. I use a simple systemd service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Telegram DevOps Bot&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network.target&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/bin/python3 /home/user/telegram-bot/bot.py&lt;/span&gt;
&lt;span class="py"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;always&lt;/span&gt;
&lt;span class="py"&gt;RestartSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;systemctl enable telegram-bot &amp;amp;&amp;amp; systemctl start telegram-bot&lt;/code&gt; — and it survives reboots.&lt;/p&gt;

&lt;p&gt;On macOS (my setup) I use a launchd plist, same concept.&lt;/p&gt;
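&lt;p&gt;For reference, here's roughly what that plist looks like (the label and paths are illustrative; adjust to your own layout). Save it under &lt;code&gt;~/Library/LaunchAgents/&lt;/code&gt; and load it with &lt;code&gt;launchctl load&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&amp;gt;
&amp;lt;plist version="1.0"&amp;gt;
&amp;lt;dict&amp;gt;
  &amp;lt;key&amp;gt;Label&amp;lt;/key&amp;gt;
  &amp;lt;string&amp;gt;com.user.telegram-bot&amp;lt;/string&amp;gt;
  &amp;lt;key&amp;gt;ProgramArguments&amp;lt;/key&amp;gt;
  &amp;lt;array&amp;gt;
    &amp;lt;string&amp;gt;/usr/bin/python3&amp;lt;/string&amp;gt;
    &amp;lt;string&amp;gt;/Users/user/telegram-bot/bot.py&amp;lt;/string&amp;gt;
  &amp;lt;/array&amp;gt;
  &amp;lt;key&amp;gt;RunAtLoad&amp;lt;/key&amp;gt;
  &amp;lt;true/&amp;gt;
  &amp;lt;key&amp;gt;KeepAlive&amp;lt;/key&amp;gt;
  &amp;lt;true/&amp;gt;
&amp;lt;/dict&amp;gt;
&amp;lt;/plist&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;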

&lt;h2&gt;
  
  
  What I Actually Get Alerts For
&lt;/h2&gt;

&lt;p&gt;Not everything. Alert fatigue is real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alert on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Job completions (GPU task done, funding cycle closed)&lt;/li&gt;
&lt;li&gt;❌ Errors that need action&lt;/li&gt;
&lt;li&gt;📬 New customer inquiries (Fiverr inbox)&lt;/li&gt;
&lt;li&gt;⚠️ Thresholds crossed (rate drops, disk usage, memory spikes)&lt;/li&gt;
&lt;li&gt;📊 Daily summaries (once a day, morning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Silence:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Routine successful runs (no news is good news)&lt;/li&gt;
&lt;li&gt;Health checks that pass&lt;/li&gt;
&lt;li&gt;Regular cron completions with no anomalies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is: every message I receive from the bot is something I actually care about. If I'm ignoring 80% of notifications, I'm alerting on the wrong things.&lt;/p&gt;
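&lt;p&gt;Most of that filtering reduces to one pattern: compare a metric to a threshold and stay silent otherwise. A minimal sketch (the 90% disk threshold is my choice, not a magic number):&lt;/p&gt;

```python
import shutil
import subprocess

DISK_ALERT_PCT = 90  # alert threshold; tune to taste

def should_alert(used_pct, threshold=DISK_ALERT_PCT):
    """True only when the metric crosses the threshold; silence is the default."""
    return used_pct >= threshold

def check_disk(path="/"):
    """Send a Telegram alert via notify.sh only if disk usage is high."""
    usage = shutil.disk_usage(path)
    used_pct = 100 * usage.used / usage.total
    if should_alert(used_pct):
        subprocess.run(["./notify.sh", f"⚠️ Disk at {used_pct:.0f}% on {path}"])
```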

&lt;h2&gt;
  
  
  The Full Cost
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Telegram Bot API: &lt;strong&gt;free&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;curl&lt;/code&gt; command: comes with your OS&lt;/li&gt;
&lt;li&gt;Python + telebot library: &lt;strong&gt;free&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Running this bot: negligible CPU, ~20MB RAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My entire monitoring setup costs $0/month and runs on the same Mac Mini as everything else. No SaaS, no cloud logging, no dashboard subscription.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Months Later
&lt;/h2&gt;

&lt;p&gt;I send about 15-20 alerts per day. Daily summary at 9 AM, event-driven messages the rest of the day. I check my phone, see green checkmarks and earnings summaries, and know the server is doing its job.&lt;/p&gt;

&lt;p&gt;The one time the GPU host went offline, I got a message within 5 minutes. Fixed it from my phone during lunch.&lt;/p&gt;

&lt;p&gt;That's the whole point: not more tooling, just the right interface for how I actually work.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Got a monitoring setup you like? Drop it in the comments — always curious what others are running.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;a href="http://www.fiverr.com/s/XLyg" rel="noopener noreferrer"&gt;I build these kinds of automation setups on Fiverr&lt;/a&gt;&lt;br&gt;
→ &lt;a href="https://t.me/celebibot_en" rel="noopener noreferrer"&gt;Follow CelebiBots on Telegram&lt;/a&gt;&lt;/p&gt;

</description>
      <category>telegram</category>
      <category>devops</category>
      <category>selfhosted</category>
      <category>automation</category>
    </item>
    <item>
      <title>I Rented Out My GPU for Passive Income — Here’s What Happened After My First Week</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Sat, 21 Mar 2026 08:02:08 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/i-rented-out-my-gpu-for-passive-income-heres-what-happened-after-my-first-week-2689</link>
      <guid>https://forem.com/samhartley_dev/i-rented-out-my-gpu-for-passive-income-heres-what-happened-after-my-first-week-2689</guid>
      <description>&lt;p&gt;I had an RTX 3060 sitting on a shelf.&lt;/p&gt;

&lt;p&gt;Not broken. Not old. Just... not doing anything. My Windows PC runs models when I need them, but most of the time it's idle. The fans spin, the power draw ticks along, and that 12GB of VRAM just sits there.&lt;/p&gt;

&lt;p&gt;A week ago I connected it to &lt;a href="https://vast.ai" rel="noopener noreferrer"&gt;Vast.ai&lt;/a&gt; — a GPU marketplace where people rent compute time. No code required. You install a daemon, set a price, and wait for someone to rent your machine.&lt;/p&gt;

&lt;p&gt;Here's what actually happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Didn't Just Mine Crypto
&lt;/h2&gt;

&lt;p&gt;First thing people ask: "Why not just mine?"&lt;/p&gt;

&lt;p&gt;Short answer: it's 2026, the margins are brutal, and I didn't want to deal with it. GPU compute rental is different — you're renting raw processing power, and the demand right now is AI inference and training. People building LLMs, running diffusion models, doing batch jobs.&lt;/p&gt;

&lt;p&gt;The upside: no mining pool setup, no daily coin price anxiety, no special software. Your machine runs Docker containers, gets paid per second of use, you get a payout.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup (Genuinely About 90 Minutes)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Created a Vast.ai account&lt;/li&gt;
&lt;li&gt;Installed the host daemon on Windows (it's a one-click installer)&lt;/li&gt;
&lt;li&gt;Set my RTX 3060 12GB at &lt;strong&gt;$0.15/hour&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Went to bed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. No configuration rabbit holes, no drivers to hunt down. The daemon manages everything — spinning up containers, cleaning up after renters, reporting uptime.&lt;/p&gt;

&lt;p&gt;I set the minimum rental duration to 1 hour so I wouldn't get hit with a dozen 5-minute jobs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Week Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Day&lt;/th&gt;
&lt;th&gt;Hours Rented&lt;/th&gt;
&lt;th&gt;Earnings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Day 1&lt;/td&gt;
&lt;td&gt;3.2h&lt;/td&gt;
&lt;td&gt;$0.48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day 2&lt;/td&gt;
&lt;td&gt;11.5h&lt;/td&gt;
&lt;td&gt;$1.73&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day 3&lt;/td&gt;
&lt;td&gt;0h&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day 4&lt;/td&gt;
&lt;td&gt;16.8h&lt;/td&gt;
&lt;td&gt;$2.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day 5&lt;/td&gt;
&lt;td&gt;9.1h&lt;/td&gt;
&lt;td&gt;$1.37&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day 6&lt;/td&gt;
&lt;td&gt;22.0h&lt;/td&gt;
&lt;td&gt;$3.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Day 7&lt;/td&gt;
&lt;td&gt;14.4h&lt;/td&gt;
&lt;td&gt;$2.16&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Week 1 total: ~$11.56&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Annualized naively? About $600/year. Which would be great except day 3 was $0 and utilization is inconsistent.&lt;/p&gt;

&lt;p&gt;A more realistic steady-state: &lt;strong&gt;$50–130/month&lt;/strong&gt; depending on demand.&lt;/p&gt;




&lt;h2&gt;
  
  
  What People Actually Rent It For
&lt;/h2&gt;

&lt;p&gt;Vast.ai shows you the jobs (anonymized). Mine has been used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running &lt;code&gt;vllm&lt;/code&gt; inference servers (Mistral, Qwen, LLaMA variants)&lt;/li&gt;
&lt;li&gt;Stable Diffusion batch jobs&lt;/li&gt;
&lt;li&gt;Some kind of PyTorch training run that lasted 8 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 3060 with 12GB VRAM is actually sweet for inference — fits most 7B–13B models at 4-bit quantization without breaking a sweat. It's not the fastest card, but it's affordable to rent, which means demand is there.&lt;/p&gt;
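&lt;p&gt;A rough rule of thumb for why 12GB is comfortable: at 4-bit quantization, weights take about half a byte per parameter, plus headroom for KV cache and activations. (The 1.2x overhead factor below is a ballpark assumption, not a measurement.)&lt;/p&gt;

```python
def vram_gb_4bit(params_billion, overhead=1.2):
    """Rough VRAM estimate for a 4-bit quantized model.

    0.5 bytes per parameter for weights, times a ballpark overhead
    factor for KV cache and activations. A heuristic, not a guarantee.
    """
    return params_billion * 0.5 * overhead

for size in (7, 13):
    print(f"{size}B at 4-bit: roughly {vram_gb_4bit(size):.1f} GB")
```

Both ends of the 7B to 13B range land well under the 3060's 12GB.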




&lt;h2&gt;
  
  
  The Honest Downsides
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You can't use your GPU while it's rented.&lt;/strong&gt; Sounds obvious, but the practical implication: if you need your machine for local inference and someone's rented it, tough luck. I started routing heavy tasks to my Mac Mini during rental periods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Electricity.&lt;/strong&gt; My RTX 3060 at load pulls about 150W. At Turkish electricity rates, that's roughly $8–15/month in power at typical utilization. So the net is lower than the gross numbers above.&lt;/p&gt;
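&lt;p&gt;That power figure is simple arithmetic; the per-kWh range below is my assumption for illustration:&lt;/p&gt;

```python
def monthly_power_cost(watts, price_per_kwh, utilization=1.0, hours=720):
    """Monthly electricity cost for a card drawing `watts` under load."""
    kwh = watts / 1000 * hours * utilization
    return kwh * price_per_kwh

# 150 W over ~720 h/month, at an assumed $0.08 to $0.14 per kWh
low = monthly_power_cost(150, 0.08)
high = monthly_power_cost(150, 0.14)
print(f"${low:.2f} to ${high:.2f} per month")
```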

&lt;p&gt;&lt;strong&gt;It's genuinely passive but not predictable.&lt;/strong&gt; Day 3 was $0. Day 6 was near-full utilization. There's no way to forecast demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Payouts have a minimum.&lt;/strong&gt; Vast.ai pays out once you hit a threshold. Nothing to worry about, just something to know going in.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm Going to Try Next
&lt;/h2&gt;

&lt;p&gt;The obvious play is adding more GPUs. I have a few more in storage — an RTX 3080 and some older 3060s. If I rack those up, the math gets interesting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Rate&lt;/th&gt;
&lt;th&gt;Monthly (50% util)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3060 (current)&lt;/td&gt;
&lt;td&gt;$0.15/h&lt;/td&gt;
&lt;td&gt;~$54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3080 10GB&lt;/td&gt;
&lt;td&gt;$0.20/h&lt;/td&gt;
&lt;td&gt;~$72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2x RTX 3070 8GB&lt;/td&gt;
&lt;td&gt;$0.16/h&lt;/td&gt;
&lt;td&gt;~$115&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's ~$240/month without doing anything after setup. At 70% utilization: ~$340.&lt;/p&gt;

&lt;p&gt;The real work is physical — pulling GPUs from storage, getting them into a rig, managing thermals. But the software side is almost zero maintenance.&lt;/p&gt;
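&lt;p&gt;The table above is just rate times hours times utilization. Sanity-checking the math:&lt;/p&gt;

```python
def monthly(rate_per_hour, utilization, hours=720):
    """Expected monthly gross for one listing at a given utilization."""
    return rate_per_hour * hours * utilization

fleet = {
    "RTX 3060": 0.15,
    "RTX 3080 10GB": 0.20,
    "2x RTX 3070 8GB": 0.32,  # two cards at $0.16/h each
}

at_50 = sum(monthly(rate, 0.5) for rate in fleet.values())
at_70 = sum(monthly(rate, 0.7) for rate in fleet.values())
print(f"50% util: ${at_50:.0f}/month, 70% util: ${at_70:.0f}/month")
```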




&lt;h2&gt;
  
  
  Should You Try This?
&lt;/h2&gt;

&lt;p&gt;If you have a spare GPU collecting dust: &lt;strong&gt;yes, probably.&lt;/strong&gt; The setup is low friction, the risk is near-zero (worst case, you uninstall the daemon and move on), and even modest earnings beat $0.&lt;/p&gt;

&lt;p&gt;If you're thinking about buying a GPU specifically for this: &lt;strong&gt;do the math carefully.&lt;/strong&gt; At current rates, an RTX 3060 costs ~$300–350 used. Payback period at $50/month is 6–7 months, which is fine — but don't expect to fund your retirement from a single card.&lt;/p&gt;

&lt;p&gt;The real value for me isn't the income (yet). It's that I now have a system running, I understand the demand patterns, and I know the path to scale looks viable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Platform: &lt;a href="https://vast.ai" rel="noopener noreferrer"&gt;Vast.ai&lt;/a&gt; (there's also RunPod if you want alternatives)&lt;/li&gt;
&lt;li&gt;Time to set up: ~90 minutes&lt;/li&gt;
&lt;li&gt;Technical skill required: Know how to install software on Windows&lt;/li&gt;
&lt;li&gt;Ongoing maintenance: Almost none&lt;/li&gt;
&lt;li&gt;Realistic earnings: $50–130/month per mid-tier GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to answer questions if you try this and run into something weird. The daemon is pretty solid but there's always an edge case or two.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write about running AI locally, automation side projects, and occasionally making money from hardware that would otherwise just collect dust. If any of this is useful, feel free to follow.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>passiveincome</category>
      <category>ai</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>Local LLMs vs Cloud APIs — A Real Cost Comparison (2026)</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Thu, 19 Mar 2026 08:03:20 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/local-llms-vs-cloud-apis-a-real-cost-comparison-2026-2igh</link>
      <guid>https://forem.com/samhartley_dev/local-llms-vs-cloud-apis-a-real-cost-comparison-2026-2igh</guid>
      <description>&lt;p&gt;"Just use ChatGPT" — sure, until your API bill hits $500/month.&lt;/p&gt;

&lt;p&gt;I've been running both local and cloud AI for over a year. Here are the real numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Test Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cloud:&lt;/strong&gt; OpenAI GPT-4o, Anthropic Claude Sonnet, Google Gemini Pro&lt;br&gt;
&lt;strong&gt;Local:&lt;/strong&gt; Ollama with Qwen 3.5 9B (Mac Mini M4) + Qwen 3 Coder 30B (RTX 3060)&lt;/p&gt;

&lt;p&gt;Workload: ~500 queries/day — code review, content generation, customer support, data analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monthly Cloud API Costs
&lt;/h2&gt;

&lt;p&gt;For 500 queries/day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI GPT-4o (200 queries): ~$90/month&lt;/li&gt;
&lt;li&gt;Anthropic Claude Sonnet (200 queries): ~$72/month&lt;/li&gt;
&lt;li&gt;Google Gemini Pro (100 queries): ~$25/month&lt;/li&gt;
&lt;li&gt;Total: ~$187/month&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Monthly Local Setup Costs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Mac Mini M4 (already owned): $0&lt;/li&gt;
&lt;li&gt;RTX 3060 12GB (used, eBay): $150 one-time&lt;/li&gt;
&lt;li&gt;Electricity 24/7: ~$12/month&lt;/li&gt;
&lt;li&gt;Total: ~$12/month ongoing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Break-even: less than 1 month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quality Comparison (What Surprised Me)
&lt;/h2&gt;

&lt;p&gt;For 80% of daily tasks, local models are good enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General chat: Qwen 3.5 9B is roughly GPT-4o quality (~90%)&lt;/li&gt;
&lt;li&gt;Code generation: Qwen 3 Coder 30B is close to Claude Sonnet (~85-90%)&lt;/li&gt;
&lt;li&gt;Simple Q&amp;amp;A and extraction: any 7B model matches cloud (~95%+)&lt;/li&gt;
&lt;li&gt;Complex multi-step reasoning: cloud still wins here&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Hybrid Approach I Use
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User query
  -&amp;gt; Simple? (Q&amp;amp;A, formatting, extraction)
       -&amp;gt; Local Qwen 3.5 9B  (free, instant)
  -&amp;gt; Code-heavy?
       -&amp;gt; Local Qwen 3 Coder 30B  (free, ~12s)
  -&amp;gt; Complex reasoning?
       -&amp;gt; Cloud Claude Sonnet  ($0.003-0.015 per query)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: cloud costs dropped from ~$187/month to ~$25/month.&lt;/p&gt;
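&lt;p&gt;That routing tree is only a few lines of code. A sketch (the keyword heuristics are purely illustrative; any classifier, even a tiny local model, could make the call):&lt;/p&gt;

```python
def route(query):
    """Pick a backend for a query. The heuristics here are illustrative only."""
    q = query.lower()
    code_hints = ("def ", "class ", "refactor", "bug", "stack trace", "```")
    hard_hints = ("step by step", "prove", "architecture review", "tradeoff")
    if any(h in q for h in hard_hints):
        return "cloud:claude-sonnet"    # complex reasoning: pay per query
    if any(h in q for h in code_hints):
        return "local:qwen3-coder-30b"  # code-heavy: free, ~12s
    return "local:qwen3.5-9b"           # default: free, instant
```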

&lt;h2&gt;
  
  
  Hidden Costs of Cloud
&lt;/h2&gt;

&lt;p&gt;Things people forget:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rate limits — hit the ceiling during a deadline? Too bad.&lt;/li&gt;
&lt;li&gt;Latency — 500-2000ms per request vs 100-500ms local&lt;/li&gt;
&lt;li&gt;Privacy — your code and data live on someone else's server&lt;/li&gt;
&lt;li&gt;Vendor lock-in — OpenAI changes pricing, you're stuck&lt;/li&gt;
&lt;li&gt;Downtime — their outage = your workflow stops&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Hidden Costs of Local
&lt;/h2&gt;

&lt;p&gt;Being fair:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Initial hardware — $150-500 for a GPU (pays off in under a month)&lt;/li&gt;
&lt;li&gt;Setup time — 30 minutes with Ollama these days&lt;/li&gt;
&lt;li&gt;Storage — models are 4-40GB each&lt;/li&gt;
&lt;li&gt;Power — $10-15/month for 24/7 operation&lt;/li&gt;
&lt;li&gt;No frontier models — you won't run GPT-4 locally yet&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Getting Started in 10 Minutes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull a model&lt;/span&gt;
ollama pull qwen3.5:9b

&lt;span class="c"&gt;# Start chatting&lt;/span&gt;
ollama run qwen3.5:9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total time: 10 minutes. Total cost: $0.&lt;/p&gt;




&lt;p&gt;Need help setting up a local AI server? I do this professionally.&lt;/p&gt;

&lt;p&gt;Follow along: &lt;a href="https://t.me/celebibot_en" rel="noopener noreferrer"&gt;Telegram @celebibot_en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sam Hartley — building AI things that actually work.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>selfhosted</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Automated My Entire Dev Workflow with AI Agents (Running 24/7 on a Mac Mini)</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Sun, 15 Mar 2026 08:01:19 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/how-i-automated-my-entire-dev-workflow-with-ai-agents-running-247-on-a-mac-mini-3pc1</link>
      <guid>https://forem.com/samhartley_dev/how-i-automated-my-entire-dev-workflow-with-ai-agents-running-247-on-a-mac-mini-3pc1</guid>
      <description>&lt;p&gt;I used to spend 3 hours a day on repetitive tasks. Now an AI agent handles them while I sleep.&lt;/p&gt;

&lt;p&gt;This isn't a concept post — this setup is running right now on a Mac Mini in my home office. Here's the full picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Gets Automated
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Before (Manual)&lt;/th&gt;
&lt;th&gt;After (AI Agent)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Email triage&lt;/td&gt;
&lt;td&gt;30 min/day&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code review&lt;/td&gt;
&lt;td&gt;45 min/day&lt;/td&gt;
&lt;td&gt;12 seconds per PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer inbox check&lt;/td&gt;
&lt;td&gt;20 min/day&lt;/td&gt;
&lt;td&gt;Hourly cron job&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calendar reminders&lt;/td&gt;
&lt;td&gt;Forget constantly&lt;/td&gt;
&lt;td&gt;Proactive alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weather check&lt;/td&gt;
&lt;td&gt;Open app manually&lt;/td&gt;
&lt;td&gt;Agent tells me before I leave&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market data&lt;/td&gt;
&lt;td&gt;Check 5 websites&lt;/td&gt;
&lt;td&gt;Live on my watch face&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total time saved: ~2.5 hours/day.&lt;/strong&gt; That's 75 hours a month I got back.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mac Mini M4 (always on)
├── AI Agent (orchestrator)
│   ├── Email checker (hourly)
│   ├── Calendar scanner (every 4 hours)
│   ├── Inbox monitor (Fiverr/orders, every hour)
│   └── Market data fetcher (every 15 min)
├── Ollama (local LLM — free inference)
│   ├── Qwen 3.5 9B (general tasks)
│   └── Qwen 3 Coder 30B (via network GPU)
├── Notification layer
│   └── Telegram bot → my phone
└── Background services
    ├── Garmin watch face data feed
    └── System health monitoring
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Total hardware cost:&lt;/strong&gt; $0 additional (Mac Mini was already there).&lt;br&gt;
&lt;strong&gt;Monthly API cost:&lt;/strong&gt; ~$25 (only complex reasoning hits the cloud).&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Design Principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Don't Automate Everything
&lt;/h3&gt;

&lt;p&gt;The 80/20 rule applies hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Automate: Checking, monitoring, formatting, reminders&lt;/li&gt;
&lt;li&gt;❌ Don't automate: Decisions that need judgment, creative work, human relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tried automating replies to clients once. Deleted it after day one. Some things need a human touch.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Alert, Don't Act
&lt;/h3&gt;

&lt;p&gt;My agent &lt;strong&gt;tells me&lt;/strong&gt; when something needs attention — it doesn't reply to customers, send emails, or make purchases.&lt;/p&gt;

&lt;p&gt;One wrong automated email can destroy a client relationship. AI is excellent at detection. It's not great at nuance.&lt;/p&gt;
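&lt;p&gt;The pattern is only a few lines. This is a minimal sketch, not my production code — the token, chat id, and helper names are placeholders — but it shows the shape: the agent composes a summary for a human channel and never replies to the customer itself:&lt;/p&gt;

```python
# "Alert, don't act": build a human-readable summary and push it to a
# notification channel. Token and chat id below are placeholders.
import json
import urllib.request

TELEGRAM_TOKEN = "123456:ABC-placeholder"
CHAT_ID = "987654321"

def build_alert(source, preview):
    # Summarize the event for a human; the agent never auto-replies.
    return f"[{source}] needs attention:\n{preview[:200]}"

def send_alert(text):
    # Telegram Bot API sendMessage — the only "action" the agent takes.
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = json.dumps({"chat_id": CHAT_ID, "text": text}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```

&lt;p&gt;Everything downstream of &lt;code&gt;build_alert&lt;/code&gt; is a human decision.&lt;/p&gt;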

&lt;h3&gt;
  
  
  3. Local First, Cloud Fallback
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;query_is_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Free, instant, private
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.5:9b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Rare — complex reasoning only
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95% of automation tasks are simple: "Is there a new message? What does it say? Is it urgent?"&lt;/p&gt;

&lt;p&gt;A 9B model handles that perfectly at zero cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Fail Silently, Alert Loudly
&lt;/h3&gt;

&lt;p&gt;If the weather API is down → no notification (who cares).&lt;br&gt;
If a paying customer messages → &lt;strong&gt;instant alert to my phone&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not all failures are equal. Treat them that way.&lt;/p&gt;
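&lt;p&gt;In code, this is just a severity split. A rough sketch (the task names and &lt;code&gt;notify&lt;/code&gt; callable are illustrative): non-critical failures go to the log, critical ones go to the phone:&lt;/p&gt;

```python
# "Fail silently, alert loudly": route a task failure by how much it matters.
import logging

logging.basicConfig(level=logging.INFO)

# Failures in these tasks are logged and forgotten.
SILENT = {"weather", "market_data"}

def route_failure(task, error, notify):
    if task in SILENT:
        logging.info("non-critical task %s failed: %s", task, error)
        return "logged"
    # Anything touching paying customers pages me immediately.
    notify(f"{task} failed: {error}")
    return "alerted"
```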

&lt;h2&gt;
  
  
  Real Example: The Inbox Monitor
&lt;/h2&gt;

&lt;p&gt;Every hour, the agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Opens the browser → inbox&lt;/li&gt;
&lt;li&gt;Scans for new messages&lt;/li&gt;
&lt;li&gt;Checks against known spam/scam patterns (regex + AI)&lt;/li&gt;
&lt;li&gt;If legitimate: sends me a Telegram notification with the preview&lt;/li&gt;
&lt;li&gt;If junk: logs it and moves on quietly&lt;/li&gt;
&lt;/ol&gt;
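&lt;p&gt;Step 3 is the interesting part: cheap regex patterns run first, and the local LLM only sees messages the patterns can't decide. The patterns below are illustrative stand-ins, not my actual rules:&lt;/p&gt;

```python
# Two-stage spam check: regex fast path, then one local-LLM call for
# the ambiguous rest. llm_verdict is any callable returning "ok" for
# legitimate messages (e.g. a thin wrapper around ollama.generate).
import re

SCAM_PATTERNS = [
    re.compile(r"(?i)western union|gift card|crypto wallet"),
    re.compile(r"(?i)urgent.{0,20}payment"),
]

def classify(message, llm_verdict):
    # Fast path: obvious junk never reaches the model.
    for pattern in SCAM_PATTERNS:
        if pattern.search(message):
            return "junk"
    # Ambiguous messages get one cheap local inference.
    return "legit" if llm_verdict(message) == "ok" else "junk"
```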

&lt;p&gt;&lt;strong&gt;Scam detection rate:&lt;/strong&gt; 100% so far (pattern matching + LLM analysis).&lt;br&gt;
&lt;strong&gt;False positives:&lt;/strong&gt; 0 (I still review every alert manually before responding).&lt;br&gt;
&lt;strong&gt;Time saved:&lt;/strong&gt; 20 min/day × 30 days = &lt;strong&gt;10 hours/month&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Stack (All Free or Open Source)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM inference&lt;/td&gt;
&lt;td&gt;Ollama + Qwen 3.5&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;Cron jobs + Python scripts&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notifications&lt;/td&gt;
&lt;td&gt;Telegram Bot API&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Email access&lt;/td&gt;
&lt;td&gt;Apple Mail + CLI bridge&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calendar&lt;/td&gt;
&lt;td&gt;EventKit (macOS)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser automation&lt;/td&gt;
&lt;td&gt;Safari + accessibility APIs&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud reasoning (10%)&lt;/td&gt;
&lt;td&gt;Anthropic API&lt;/td&gt;
&lt;td&gt;~$25/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Monthly total: ~$25.&lt;/strong&gt; Before this setup I was paying $150+ just in API costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Started (Without Going Crazy)
&lt;/h2&gt;

&lt;p&gt;I didn't build this in a weekend. Here's the honest timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Week 1:&lt;/strong&gt; Just the email checker. One script, one cron job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 2:&lt;/strong&gt; Added Telegram notifications. Suddenly I could see it working.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 3:&lt;/strong&gt; Calendar alerts. This was a game changer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Month 2:&lt;/strong&gt; Inbox monitoring, market data feed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Month 3:&lt;/strong&gt; Everything orchestrated together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lesson: &lt;strong&gt;start with one automation, get it reliable, then add the next.&lt;/strong&gt; Trying to build everything at once is how you end up with a half-working mess you don't trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons From Running This in Production
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Logging is everything.&lt;/strong&gt; When something fails at 3 AM, logs are all you have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron intervals matter.&lt;/strong&gt; Checking email every minute wastes resources. Every hour is fine for most things.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI doesn't replace thinking&lt;/strong&gt; — it replaces the mechanical parts of thinking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test on your actual workflow&lt;/strong&gt; before scaling. My setup works for me; yours will look different.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The value isn't speed — it's offloading mental overhead.&lt;/strong&gt; I stopped worrying about forgetting to check things.&lt;/li&gt;
&lt;/ol&gt;
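&lt;p&gt;On the logging point: each agent script gets its own timestamped log file, so a 3 AM failure leaves a trail. A minimal sketch — the logger name and path are placeholders:&lt;/p&gt;

```python
# Per-script file logging so failures are reconstructable later.
import logging

def get_logger(name, path):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger

log = get_logger("email_checker", "email_checker.log")
log.info("run started")
log.error("IMAP login failed")
```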

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm building a Telegram-based version of this that others can use — subscribe to signals, set their own alert rules, powered by the same local AI stack. If that sounds interesting, follow along.&lt;/p&gt;

&lt;p&gt;Have you automated parts of your workflow? What was the first thing you tackled? Drop it in the comments — I'm genuinely curious what others are automating.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you want to set something like this up for your own workflow or business, I do this as a service too: &lt;a href="http://www.fiverr.com/s/XLyg" rel="noopener noreferrer"&gt;Custom AI Automation Workflows&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Daily automation tips: &lt;a href="https://t.me/celebibot_en" rel="noopener noreferrer"&gt;t.me/celebibot_en&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Built a Personal AI That Actually Knows My Projects (RAG + Ollama, Zero Cloud)</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Fri, 13 Mar 2026 08:01:50 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/i-built-a-personal-ai-that-actually-knows-my-projects-rag-ollama-zero-cloud-3afo</link>
      <guid>https://forem.com/samhartley_dev/i-built-a-personal-ai-that-actually-knows-my-projects-rag-ollama-zero-cloud-3afo</guid>
      <description>&lt;p&gt;I got tired of explaining my own codebase to an AI every single session.&lt;/p&gt;

&lt;p&gt;"Here's the architecture. Here's the README. Here's what I tried last time." Every. Single. Time.&lt;/p&gt;

&lt;p&gt;So I built a local RAG (Retrieval-Augmented Generation) system that knows my projects, my notes, and my docs — permanently. No cloud. No API costs. No context window resets.&lt;/p&gt;

&lt;p&gt;Here's exactly how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Context Windows
&lt;/h2&gt;

&lt;p&gt;LLMs don't remember. You paste the same 200 lines of context every session, hit the token limit, and start over. It's fine for one-off questions. It's exhausting for ongoing projects.&lt;/p&gt;

&lt;p&gt;The standard solution is RAG: instead of stuffing everything into the prompt, you store docs in a vector database and &lt;strong&gt;retrieve only the relevant chunks&lt;/strong&gt; when you ask a question. The model sees 3-5 paragraphs of targeted context instead of your entire repo.&lt;/p&gt;

&lt;p&gt;Result: faster, cheaper, and the AI actually answers the right question.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Documents (markdown, code, PDFs, notes)
  → Chunked + embedded (Ollama nomic-embed-text)
  → Stored in Chroma (local vector DB)

Query
  → Embedded (same model)
  → Top-5 relevant chunks retrieved
  → Stuffed into Ollama prompt (Qwen 3.5 9B)
  → Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Zero cloud. Zero API keys. Runs on a Mac Mini or any machine with 8GB RAM.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Index
&lt;/h2&gt;

&lt;p&gt;Everything that would normally eat my context window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project READMEs and architecture docs&lt;/li&gt;
&lt;li&gt;My personal notes (Obsidian vault)&lt;/li&gt;
&lt;li&gt;Code snippets and past solutions&lt;/li&gt;
&lt;li&gt;API documentation I use regularly&lt;/li&gt;
&lt;li&gt;Stack Overflow answers I bookmarked (because I always forget them again)&lt;/li&gt;
&lt;li&gt;Config files and deployment notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total indexed: ~4,800 chunks. Query time: under 2 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install the Stack (15 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ollama (already installed? skip)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull models&lt;/span&gt;
ollama pull qwen3.5:9b          &lt;span class="c"&gt;# LLM for answers&lt;/span&gt;
ollama pull nomic-embed-text    &lt;span class="c"&gt;# Embedding model&lt;/span&gt;

&lt;span class="c"&gt;# Python dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;chromadb langchain ollama pypdf markdown
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire stack. No Docker required (though Chroma has a Docker option if you want a persistent server).&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Index Your Documents
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DirectoryLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OllamaEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="c1"&gt;# Load your docs folder
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DirectoryLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~/projects/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recursive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Split into chunks (400 tokens, 50 overlap)
&lt;/span&gt;&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Indexed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Embed and store locally
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./my-knowledge-base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run once. Done. Your docs are now searchable by meaning, not just keywords.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Query It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OllamaEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="c1"&gt;# Load existing DB
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./my-knowledge-base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Retrieve top 5 relevant chunks
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Query local LLM with context
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.5:9b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Based on this context:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How does my Garmin watch face fetch stock data?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the API rate limit for the crypto bot?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I deploy the Telegram bot to the VPS?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real answers from your own documentation. No hallucinations about your specific setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Killer Feature: Incremental Updates
&lt;/h2&gt;

&lt;p&gt;Don't re-index everything when one file changes. Just update what's new:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_file_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_bytes&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index_cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./index-cache.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_cache&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_cache&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="n"&gt;changed_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs_dir&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;rglob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_file_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;changed_files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;changed_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Re-indexing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changed_files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; changed files...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# [load, chunk, embed, upsert only changed files]
&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_cache&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this as a cron job every hour. Your knowledge base stays current automatically.&lt;/p&gt;
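&lt;p&gt;The hourly refresh is a single crontab entry. The interpreter and script paths below are placeholders — point them at wherever your indexer lives:&lt;/p&gt;

```shell
# m h dom mon dow  command
# Re-check the knowledge base for changed files at the top of every hour.
0 * * * * /usr/bin/python3 /Users/sam/agents/update_index.py
```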

&lt;h2&gt;
  
  
  What Actually Changed for Me
&lt;/h2&gt;

&lt;p&gt;Before RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Explain the background service memory limit in my Garmin project" → paste 200 lines → wait → answer&lt;/li&gt;
&lt;li&gt;Every new chat session: context reset, start explaining again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ask("Garmin background service memory limit")&lt;/code&gt; → &lt;strong&gt;"64KB sandbox, pass data via Background.exit(dictionary)"&lt;/strong&gt; — in 1.8 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My LLM now answers questions about projects I haven't touched in 6 months. No context management. No pasting. Just ask.&lt;/p&gt;
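
<p>Under the hood, <code>ask()</code> is just three steps: embed the question, rank stored chunks by similarity, and stuff the winners into the prompt. Here's a toy sketch of the retrieval step with stand-in vectors; a real run gets embeddings from <code>nomic-embed-text</code> and chunks from the vector store:</p>

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, top_k=2):
    # Rank stored chunks by similarity to the query embedding.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, hits):
    # Retrieved chunks become the only context the model answers from.
    context = "\n---\n".join(h["text"] for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Stand-in embeddings; a real pipeline gets these from the embed model.
chunks = [
    {"text": "Garmin background services get a 64KB sandbox.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Chunk size of 400 tokens works well for prose.", "vec": [0.1, 0.9, 0.0]},
]
hits = retrieve([0.8, 0.2, 0.1], chunks, top_k=1)
print(build_prompt("Garmin background service memory limit", hits))
```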

&lt;h2&gt;
  
  
  Hardware Requirements
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Embedding Speed&lt;/th&gt;
&lt;th&gt;Query Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mac Mini M4 8GB&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;~500 docs/min&lt;/td&gt;
&lt;td&gt;~2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3060 12GB&lt;/td&gt;
&lt;td&gt;12GB VRAM&lt;/td&gt;
&lt;td&gt;~3000 docs/min&lt;/td&gt;
&lt;td&gt;~0.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Old laptop 8GB&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;~100 docs/min&lt;/td&gt;
&lt;td&gt;~5-8s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The embedding step (indexing) is the slow part — run it once, then it's instant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips from Running This for 3 Months
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chunk size matters&lt;/strong&gt; — 400 tokens works well for prose and docs. For code, try 200 with more overlap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata is your friend&lt;/strong&gt; — store &lt;code&gt;filename&lt;/code&gt; and &lt;code&gt;section&lt;/code&gt; in chunk metadata. When the AI says "see the deployment notes," you know exactly where to look.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-rank when accuracy matters&lt;/strong&gt; — if top-5 chunks aren't enough, add a re-ranker step (Cohere has a free API, or use a local cross-encoder).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch your embed model&lt;/strong&gt; — &lt;code&gt;nomic-embed-text&lt;/code&gt; beats most larger models for RAG. Don't use your chat LLM for embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid search&lt;/strong&gt; — combine vector search with BM25 keyword search for better results on technical queries with specific names/functions.&lt;/li&gt;
&lt;/ol&gt;
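
<p>For tip 5, you don't need a framework to merge the two result lists. Reciprocal rank fusion is a few lines; a sketch, assuming you already have ranked chunk IDs from the vector store and from BM25:</p>

```python
def rrf_merge(rankings, k=60):
    # Reciprocal rank fusion: each list votes 1/(k + rank) for its items.
    # k=60 is the constant from the original RRF paper; it damps the
    # influence of any single list's top result.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_12", "chunk_07", "chunk_33"]   # from the vector store
keyword_hits = ["chunk_07", "chunk_91", "chunk_12"]  # from BM25 keyword search
print(rrf_merge([vector_hits, keyword_hits]))
```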

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This is step one of something bigger: a personal AI that grows with your projects instead of resetting every session.&lt;/p&gt;

&lt;p&gt;Next phase I'm building: automatic indexing from Git commits (index diffs in real-time as you code) + a simple web UI for non-terminal queries.&lt;/p&gt;

&lt;p&gt;Total current cost of this setup: &lt;strong&gt;$0/month&lt;/strong&gt;. It runs on the same Mac Mini I already had.&lt;/p&gt;




&lt;h2&gt;
  
  
  Want This for Your Business?
&lt;/h2&gt;

<p>RAG systems are one of the highest-value AI implementations you can build. A local knowledge base built from your company's docs, SOPs, and code can:</p>

&lt;ul&gt;
&lt;li&gt;Answer customer support questions without hallucinating&lt;/li&gt;
&lt;li&gt;Help your team find answers instantly across years of internal docs&lt;/li&gt;
&lt;li&gt;Run 24/7 with zero ongoing API costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ &lt;a href="http://www.fiverr.com/s/qD1V" rel="noopener noreferrer"&gt;Custom RAG setup on Fiverr&lt;/a&gt;&lt;br&gt;
→ &lt;a href="https://t.me/celebibot_en" rel="noopener noreferrer"&gt;Follow us on Telegram&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with 🤖 by CelebiBots — AI that runs on your terms.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>ollama</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>5 Telegram Bot Ideas That Actually Make Money (Built a Few of These Myself)</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Wed, 11 Mar 2026 08:01:24 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/5-telegram-bot-ideas-that-actually-make-money-built-a-few-of-these-myself-46eh</link>
      <guid>https://forem.com/samhartley_dev/5-telegram-bot-ideas-that-actually-make-money-built-a-few-of-these-myself-46eh</guid>
      <description>&lt;p&gt;Everyone and their grandmother builds Telegram bots these days. But most of them are glorified toys that never earn a cent.&lt;/p&gt;

&lt;p&gt;I've been building and selling custom Telegram bots as a side income for the past year. Here are 5 ideas that have &lt;strong&gt;real revenue models behind them&lt;/strong&gt; — not wishful thinking, not "build an audience first" advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Crypto Price Alert Bot ($5–20/month per user)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Monitors crypto prices in real-time and sends instant Telegram messages when assets hit thresholds the user defines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it actually works:&lt;/strong&gt;&lt;br&gt;
Traders check prices constantly. An alert bot eliminates the anxiety loop of "let me just check one more time." The free tier (3 alerts) hooks them. The paid tier (unlimited alerts + portfolio tracking) converts the ones who actually trade.&lt;/p&gt;

&lt;p&gt;API costs for this? Basically zero — free tiers of CoinGecko or Twelve Data cover thousands of requests per day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd charge:&lt;/strong&gt; $5/month basic, $15/month with portfolio tracking. With 50 paid users that's $750/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech:&lt;/strong&gt; Python + python-telegram-bot + any crypto API. Runs on a cheap VPS or a home server.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. AI Customer Support Bot ($50–500/month per client)
&lt;/h2&gt;

<p><strong>What it does:</strong> Handles customer questions using an LLM grounded in the client's FAQ and documentation.</p>

&lt;p&gt;&lt;strong&gt;Why it actually works:&lt;/strong&gt;&lt;br&gt;
Small businesses pay $15–25/hour for human support. Their support reps answer the same 20 questions every day. A bot handles 80% of that. The business keeps humans only for the edge cases.&lt;/p&gt;

&lt;p&gt;The pitch practically writes itself: &lt;em&gt;"Your customers get instant answers at 3 AM. You save $2,000/month in support costs."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd charge:&lt;/strong&gt; $300–500 setup + $100–200/month retainer. One client pays for a month of work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key detail:&lt;/strong&gt; Running this on Ollama locally eliminates API costs entirely. Qwen 3.5 9B handles most customer service scenarios well.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. Content Scheduler Bot ($10–30/month per user)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Channel owners send content to the bot, set a schedule, and it auto-posts at the right time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it actually works:&lt;/strong&gt;&lt;br&gt;
Here's a gap I noticed: Telegram scheduling tools are surprisingly weak. Buffer and Hootsuite barely support Telegram. Channel operators with 10K+ subscribers are either posting manually at odd hours or using janky workarounds.</p>

&lt;p&gt;Built-in analytics (which post got the most views) would be a huge differentiator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd charge:&lt;/strong&gt; 1 channel free, 5 channels $10/month, unlimited $30/month. The free tier does the marketing for you — channel owners recommend it to other channel owners.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. Group Moderation Bot ($100–300 one-time)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Anti-spam, welcome messages, auto-moderation, member verification, analytics for large Telegram groups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it actually works:&lt;/strong&gt;&lt;br&gt;
Bots like Combot and Rose exist, but they're generic. Large communities — gaming clans, trading groups, local neighborhood chats — have specific rules and specific problems. A custom bot fits their exact workflow.&lt;/p&gt;

&lt;p&gt;You're not competing with Combot. You're selling "exactly what your community needs."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd charge:&lt;/strong&gt; $100–300 one-time, plus $10–20/month optional hosting. The hosting is passive income that compounds.&lt;/p&gt;


&lt;h2&gt;
  
  
  5. Notification Bridge Bot ($200–500 one-time)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Bridges alerts from external systems (servers, IoT sensors, security cameras, any API) into Telegram.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it actually works:&lt;/strong&gt;&lt;br&gt;
DevOps engineers want server alerts in Telegram, not email. Smart home tinkerers want camera motion alerts on their phone. Small businesses want order notifications in their team chat.&lt;/p&gt;

&lt;p&gt;This is pure infrastructure work — not glamorous, but people pay well for it because it saves them hours of setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server monitoring (Prometheus)
  → Webhook fires on alert
  → Bot sends formatted message to Telegram group
  → Team sees "🔴 Server CPU at 95%" instantly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
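
<p>A minimal version of that bridge fits in one file with only the standard library. The bot token and chat ID below are placeholders, and the alert shape is a simplified stand-in for a real Prometheus payload:</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request

# Placeholders: use your real bot token and target chat ID.
BOT_TOKEN = "123456:EXAMPLE"
CHAT_ID = "-1001234567890"

def format_alert(payload):
    # Turn a (simplified) alert payload into a short Telegram message.
    status = payload.get("status", "firing")
    name = payload.get("alertname", "unknown alert")
    emoji = "🔴" if status == "firing" else "🟢"
    return f"{emoji} {name} [{status}]"

def send_to_telegram(text):
    # One HTTPS call to the Bot API; no SDK needed.
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    data = parse.urlencode({"chat_id": CHAT_ID, "text": text}).encode()
    request.urlopen(url, data=data)

class AlertHook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Receive the webhook, forward a formatted message, acknowledge.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        send_to_telegram(format_alert(json.loads(body)))
        self.send_response(200)
        self.end_headers()

# To run the bridge:
#   HTTPServer(("0.0.0.0", 8080), AlertHook).serve_forever()
```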



&lt;p&gt;&lt;strong&gt;What I'd charge:&lt;/strong&gt; $200–500 per project depending on complexity. These are usually one-day builds.&lt;/p&gt;




&lt;h2&gt;
  
  
  What All of These Have in Common
&lt;/h2&gt;

&lt;p&gt;Looking back at what actually converts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;They solve annoyances people already have&lt;/strong&gt; — not problems you invented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low infrastructure cost&lt;/strong&gt; — most run on a $5 VPS or a home server with Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear ROI for the buyer&lt;/strong&gt; — "you save X dollars" or "saves you Y hours" is an easy sell&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telegram's built-in payments&lt;/strong&gt; make subscriptions surprisingly easy to implement&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The biggest mistake I see: people build a bot first, then look for users. Do it backwards. Find someone who already has the problem, ask what they'd pay, &lt;em&gt;then&lt;/em&gt; build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Start
&lt;/h2&gt;

&lt;p&gt;Pick the one that matches a community you're already in. If you're in crypto Twitter, start with the alert bot — you'll have early users from day one. If you do DevOps, the notification bridge is a weekend project that pays for itself immediately.&lt;/p&gt;

&lt;p&gt;The MVP doesn't need to be fancy. Mine were usually 200 lines of Python and a weekend of work.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What bots have you built that actually made money? Curious to hear what's worked for others.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>telegram</category>
      <category>bots</category>
      <category>sideprojects</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built a Custom Garmin Watch Face With Live Stock Sparklines — Here's What I Learned</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Mon, 09 Mar 2026 08:01:24 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/i-built-a-custom-garmin-watch-face-with-live-stock-sparklines-heres-what-i-learned-129n</link>
      <guid>https://forem.com/samhartley_dev/i-built-a-custom-garmin-watch-face-with-live-stock-sparklines-heres-what-i-learned-129n</guid>
      <description>&lt;p&gt;Last spring I got annoyed enough at constantly pulling out my phone to check stock prices that I decided to just... put them on my watch.&lt;/p&gt;

&lt;p&gt;Not as notifications. Not as a widget. A proper watch face with live sparkline charts for 5 configurable assets — crypto, stocks, commodities — updating every 15 minutes automatically.&lt;/p&gt;

&lt;p&gt;Here's what building &lt;strong&gt;StockFaceTC&lt;/strong&gt; for the Garmin Venu 2 Plus actually looked like.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Language Nobody Talks About: Monkey C
&lt;/h2&gt;

&lt;p&gt;First thing you'll notice: there's basically no community around Garmin development compared to iOS or Android. Stack Overflow has maybe 200 questions. The official forums are a ghost town. The documentation has dead links.&lt;/p&gt;

&lt;p&gt;The language itself — Monkey C — is a mix of Java and JavaScript with Garmin-specific APIs bolted on. It's fine. The runtime constraints are the real challenge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;124KB total memory&lt;/strong&gt; for the watch face&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;64KB separate sandbox&lt;/strong&gt; for background services&lt;/li&gt;
&lt;li&gt;No npm, no frameworks, no familiar tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The simulator is decent but has one critical limitation: it can't make real web requests. You test your API calls on real hardware connected to a phone via Bluetooth, or you don't test them at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Font Problem That Cost Me a Week
&lt;/h2&gt;

&lt;p&gt;I wanted custom text for the sparkline labels. Garmin has a &lt;code&gt;VectorFont&lt;/code&gt; API for this.&lt;/p&gt;

&lt;p&gt;The Venu 2 Plus doesn't support it.&lt;/p&gt;

&lt;p&gt;Neither do most Garmin devices, it turns out.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;PrimitiveFont&lt;/strong&gt; — a complete vector font rendered entirely with &lt;code&gt;drawLine()&lt;/code&gt; calls. Every character is defined on a 5×7 grid:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// "A" on a 5×7 grid&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CHAR_A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiply by scale factor (&lt;code&gt;sizePx / 7.0&lt;/code&gt;) and you have arbitrary text at any size.&lt;/p&gt;
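
<p>The scaling step, shown in Python for illustration (the real code is Monkey C, where each resulting tuple becomes a <code>dc.drawLine()</code> call on the device):</p>

```python
# "A" as line segments on the 5x7 grid: (x1, y1, x2, y2) per stroke.
CHAR_A = [0, 6, 0, 2,  0, 2, 2, 0,  2, 0, 4, 2,  4, 2, 4, 6,  0, 4, 4, 4]

def scaled_segments(segments, origin_x, origin_y, size_px):
    # Map grid coordinates to pixels: multiply by size_px / 7.0, then
    # offset by the character's origin on screen.
    s = size_px / 7.0
    out = []
    for i in range(0, len(segments), 4):
        x1, y1, x2, y2 = segments[i:i + 4]
        out.append((origin_x + x1 * s, origin_y + y1 * s,
                    origin_x + x2 * s, origin_y + y2 * s))
    return out  # on-device, each tuple feeds one drawLine() call

segs = scaled_segments(CHAR_A, 10, 20, 14)
print(len(segs))  # 5 strokes
```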

&lt;p&gt;I went further and added proportional widths (so "i" is narrow, "M" is wide), bold, italic, and — the fun one — &lt;strong&gt;arc text&lt;/strong&gt;. Characters curved along a circular path for the ring labels around the chart.&lt;/p&gt;

&lt;p&gt;The whole library is ~21KB. Fits comfortably.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 64KB Background Sandbox
&lt;/h2&gt;

&lt;p&gt;Here's the architecture problem: Garmin watch faces have a &lt;strong&gt;background service&lt;/strong&gt; that runs separately from the main app, in its own 64KB memory sandbox. It can't share memory with the main face directly.&lt;/p&gt;

&lt;p&gt;The only data transfer mechanism: &lt;code&gt;Background.exit(dictionary)&lt;/code&gt; — pass a dictionary out, receive it in &lt;code&gt;onBackgroundData()&lt;/code&gt; in the main app.&lt;/p&gt;

&lt;p&gt;This means you can't just parse a full API response and hand it over. You have to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetch the data&lt;/li&gt;
&lt;li&gt;Extract only what you need (closing prices for 24 candles × 5 symbols)&lt;/li&gt;
&lt;li&gt;Pack it into the tightest dictionary you can&lt;/li&gt;
&lt;li&gt;Exit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My fetch loop handles 5 symbols sequentially, parses the JSON inline, and builds a compact model before exiting. The background service uses maybe 40KB peak. Close enough.&lt;/p&gt;
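
<p>Steps 1&ndash;4, sketched in Python for illustration (the real code is Monkey C; field names follow Twelve Data's time-series response). The point is that only the packed model crosses the <code>Background.exit()</code> boundary, never the raw JSON:</p>

```python
def pack_for_exit(api_responses):
    # Keep only closing prices (24 candles per symbol). The raw API
    # responses never leave the background service's 64KB sandbox.
    packed = {}
    for symbol, response in api_responses.items():
        closes = [float(candle["close"]) for candle in response["values"][:24]]
        packed[symbol] = closes
    return packed  # this is what Background.exit(packed) would hand over

raw = {"BTC/USD": {"values": [{"close": "67000.1"}, {"close": "66850.4"}]}}
model = pack_for_exit(raw)
print(model["BTC/USD"])
```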

&lt;h2&gt;
  
  
  Getting Labels Right (Not by Hardcoding Them)
&lt;/h2&gt;

&lt;p&gt;Early version I hardcoded display names: &lt;code&gt;"XAU/USD" → "GOLD"&lt;/code&gt;. Obviously terrible.&lt;/p&gt;

&lt;p&gt;The Twelve Data API returns metadata with every response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BTC/USD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"currency_base"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bitcoin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Digital Currency"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stocks&lt;/strong&gt; → &lt;code&gt;meta.symbol&lt;/code&gt; → "AAPL"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crypto&lt;/strong&gt; → &lt;code&gt;meta.currency_base&lt;/code&gt; → "Bitcoin" → "BITCOIN" (truncated to fit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commodities&lt;/strong&gt; → &lt;code&gt;meta.currency_base&lt;/code&gt; → "Gold Spot" → "GOLD"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zero hardcoded aliases. Add any valid ticker, get the right label. Works for everything in their catalog.&lt;/p&gt;
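
<p>As a Python sketch of that rule (the real code is Monkey C, and the truncation width here is illustrative, not the exact on-face limit):</p>

```python
def label_from_meta(meta, max_chars=7):
    # Plain tickers (no slash) are stocks: keep the symbol as-is.
    # Pairs like BTC/USD or XAU/USD get the first word of currency_base,
    # uppercased and truncated to fit. max_chars=7 is a stand-in value.
    symbol = meta["symbol"]
    if "/" not in symbol:
        return symbol
    base = meta.get("currency_base", symbol)
    return base.split()[0].upper()[:max_chars]

print(label_from_meta({"symbol": "AAPL"}))                                  # AAPL
print(label_from_meta({"symbol": "BTC/USD", "currency_base": "Bitcoin"}))   # BITCOIN
print(label_from_meta({"symbol": "XAU/USD", "currency_base": "Gold Spot"})) # GOLD
```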

&lt;h2&gt;
  
  
  Budget Arithmetic
&lt;/h2&gt;

&lt;p&gt;Twelve Data's free tier gives you 800 API calls/day.&lt;/p&gt;

&lt;p&gt;My setup: 5 symbols × 4 fetches/hour × 24 hours = &lt;strong&gt;480 calls/day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Fits with 320 to spare. No paid tier needed.&lt;/p&gt;
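
<p>Worth keeping as a tiny sanity check you can re-run whenever you add symbols or change the refresh interval:</p>

```python
# Twelve Data free tier: 800 calls/day.
symbols = 5
fetches_per_hour = 4          # one fetch every 15 minutes
calls_per_day = symbols * fetches_per_hour * 24
headroom = 800 - calls_per_day
print(calls_per_day, headroom)  # 480 320
```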

&lt;h2&gt;
  
  
  Gotchas I Wish Someone Had Documented
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Properties are cached forever.&lt;/strong&gt; Changing defaults in &lt;code&gt;properties.xml&lt;/code&gt; after the first install does nothing. You have to "Reset All App Data" in the simulator menu. Spent an embarrassing amount of time on this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;makeWebRequest()&lt;/code&gt; is stubbed in the simulator.&lt;/strong&gt; It runs without errors, triggers the callback with null data, and you wonder why your code is broken. It's not — you just need a real device.&lt;/p&gt;

<p><strong><code>getSettingsView()</code> must be implemented.</strong> Even if you return <code>null</code>. Otherwise the settings option in the Connect app stays greyed out and you'll waste 30 minutes thinking it's a permissions issue.</p>

&lt;p&gt;&lt;strong&gt;The simulator's memory limits aren't accurate.&lt;/strong&gt; It'll run code that crashes on device. Always test on hardware before you call it done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The End Result
&lt;/h2&gt;

&lt;p&gt;The watch face has been on my wrist every day for about 4 months now. The sparklines are genuinely useful — you can see at a glance whether BTC had a rough night or AAPL is doing something interesting.&lt;/p&gt;

&lt;p&gt;The hardest part wasn't the code. It was the documentation gaps and the "you'll just have to test on device" reality of Connect IQ development.&lt;/p&gt;

&lt;p&gt;But it runs. Updates every 15 minutes. Uses a free API tier. And cost exactly $0 beyond the hardware I already owned.&lt;/p&gt;




&lt;p&gt;If you're thinking about building a Garmin app: the platform is rough around the edges, but it's real code running on real hardware on your wrist. That's genuinely satisfying in a way that web apps aren't.&lt;/p&gt;

&lt;p&gt;The source is on GitLab if you want to poke around. Happy to answer questions in the comments — there aren't many of us in the Garmin dev community, might as well help each other.&lt;/p&gt;


</description>
      <category>garmin</category>
      <category>connectiq</category>
      <category>monkeyc</category>
      <category>wearables</category>
    </item>
    <item>
      <title>I Ditched OpenAI and Run AI Locally for Free — Here's How</title>
      <dc:creator>Sam Hartley</dc:creator>
      <pubDate>Sat, 07 Mar 2026 12:33:19 +0000</pubDate>
      <link>https://forem.com/samhartley_dev/run-your-own-ai-server-for-0month-with-ollama-3e12</link>
      <guid>https://forem.com/samhartley_dev/run-your-own-ai-server-for-0month-with-ollama-3e12</guid>
      <description>&lt;h1&gt;
  
  
  I Ditched OpenAI and Run AI Locally for Free — Here's How
&lt;/h1&gt;

<p>I was spending ~$80/month on AI subscriptions and API calls. ChatGPT Plus, some Anthropic credits, the occasional Gemini Pro request. It adds up fast when you're prototyping things.</p>

&lt;p&gt;Then I discovered you can run surprisingly good models on hardware you probably already own. I've been running a fully local AI setup for about a month now, and my API bill went to zero.&lt;/p&gt;

&lt;p&gt;Here's the exact setup I'm using.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hardware (Nothing Fancy)
&lt;/h2&gt;

&lt;p&gt;My main inference machine is a desktop PC with an RTX 3060 (12GB VRAM). You can find these used for ~$150. That's it. No A100, no cloud GPU rental.&lt;/p&gt;

&lt;p&gt;For context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8B parameter models&lt;/strong&gt; (like Qwen 3.5) run at ~40 tokens/sec on this card&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30B parameter models&lt;/strong&gt; (like Qwen 3 Coder) run at a comfortable ~12 tokens/sec&lt;/li&gt;
&lt;li&gt;Even on a MacBook M1 with 16GB RAM, 8B models are perfectly usable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any modern GPU with 8GB+ VRAM, or an Apple Silicon Mac, you're good.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;

&lt;p&gt;This is the part that surprised me. No Docker, no conda environments, no dependency nightmares:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On macOS, you can also &lt;code&gt;brew install ollama&lt;/code&gt;. Windows has an installer. That's literally it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Pull a Model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Good general-purpose model&lt;/span&gt;
ollama pull qwen3.5:9b

&lt;span class="c"&gt;# Great for code&lt;/span&gt;
ollama pull qwen3-coder:30b

&lt;span class="c"&gt;# Solid reasoning&lt;/span&gt;
ollama pull deepseek-r1:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Models download once (~5-18GB depending on size) and run forever. No recurring costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Use It
&lt;/h2&gt;

&lt;p&gt;Interactive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen3.5:9b
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; Explain the difference between async and parallel execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via API — and here's the killer feature: &lt;strong&gt;it's OpenAI-compatible&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "qwen3.5:9b",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any tool built for the OpenAI API works by just changing the base URL. I've connected it to VS Code extensions, custom scripts, even a Telegram bot.&lt;/p&gt;
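
<p>From a script, the whole integration is building a standard chat-completions request against the local URL. A stdlib-only sketch; actually sending it is left as a comment, since that needs Ollama running:</p>

```python
import json
from urllib import request

def chat_request(base_url, model, prompt):
    # Build an OpenAI-style chat completion request; only the base URL
    # decides whether it goes to Ollama or a cloud provider.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:11434/v1", "qwen3.5:9b", "Explain Docker in 3 sentences")
print(req.full_url)
# With Ollama running:
#   reply = json.load(request.urlopen(req))
```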

&lt;h2&gt;
  
  
  What I Actually Use This For
&lt;/h2&gt;

&lt;p&gt;This isn't theoretical. Here's what I run daily:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code reviews&lt;/strong&gt; — I pipe diffs through the 30B coder model. It catches things I miss, especially in languages I'm less familiar with&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Telegram bot&lt;/strong&gt; — Runs 24/7 on a Mac Mini, answers questions using Qwen 3.5. Nobody can tell it's local&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Q&amp;amp;A&lt;/strong&gt; — RAG pipeline with local embeddings. Load PDFs, ask questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick lookups&lt;/strong&gt; — Instead of context-switching to ChatGPT, I just &lt;code&gt;ollama run&lt;/code&gt; in the terminal&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Real Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI GPT-4o&lt;/td&gt;
&lt;td&gt;$20-200+&lt;/td&gt;
&lt;td&gt;Data leaves your machine&lt;/td&gt;
&lt;td&gt;~1-3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Ollama&lt;/td&gt;
&lt;td&gt;~$5 electricity&lt;/td&gt;
&lt;td&gt;Everything stays local&lt;/td&gt;
&lt;td&gt;~0.5-2s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The electricity cost is real but negligible. My PC draws about 250W under GPU load, and I'm not running inference 24/7.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things I Wish I Knew Earlier
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GPU matters way more than CPU.&lt;/strong&gt; A $150 used RTX 3060 is 10-15x faster than even a high-end CPU for inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start with smaller models.&lt;/strong&gt; 7-9B models are shockingly capable. Don't jump to 70B thinking bigger = better — the speed tradeoff isn't worth it for most tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different models for different jobs.&lt;/strong&gt; I use the coder model for code, the reasoning model for analysis, and the general model for chat. Specialization matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it a network service.&lt;/strong&gt; Set &lt;code&gt;OLLAMA_HOST=0.0.0.0&lt;/code&gt; and every device in your house can use it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It works offline.&lt;/strong&gt; Plane, cabin, whatever. No internet needed after the initial model download&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Is It As Good As GPT-4?
&lt;/h2&gt;

&lt;p&gt;Honestly? For 80% of what I was using GPT-4 for, yes. The 9B models handle everyday coding questions, text generation, and analysis just fine. &lt;/p&gt;

&lt;p&gt;For the really hard stuff — complex multi-step reasoning, very long context — the cloud models still have an edge. But I find myself needing that maybe once a week. Not worth $80/month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you have a gaming PC or a recent Mac, you can be up and running in literally 5 minutes. &lt;code&gt;curl | sh&lt;/code&gt;, &lt;code&gt;ollama pull&lt;/code&gt;, &lt;code&gt;ollama run&lt;/code&gt;. That's the whole setup.&lt;/p&gt;

<p>The worst that happens is you waste 10 minutes. The best? You save thousands of dollars a year and keep your data private.</p>




&lt;p&gt;&lt;em&gt;What models are you running locally? I'm always looking for recommendations — drop them in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>selfhosted</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
