<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dhaivat Jambudia</title>
    <description>The latest articles on Forem by Dhaivat Jambudia (@dhaivat_jambudia).</description>
    <link>https://forem.com/dhaivat_jambudia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3857020%2Fb134fbe2-5fa0-404d-a53d-1005d73a0d59.png</url>
      <title>Forem: Dhaivat Jambudia</title>
      <link>https://forem.com/dhaivat_jambudia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dhaivat_jambudia"/>
    <language>en</language>
    <item>
      <title>I Replaced a $200/mo AI Stack with OpenClaw + Free Models. Here's the Exact Setup (and Why Security Almost Killed It)</title>
      <dc:creator>Dhaivat Jambudia</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:10:22 +0000</pubDate>
      <link>https://forem.com/dhaivat_jambudia/i-replaced-a-200mo-ai-stack-with-openclaw-free-models-heres-the-exact-setup-and-why-security-24d1</link>
      <guid>https://forem.com/dhaivat_jambudia/i-replaced-a-200mo-ai-stack-with-openclaw-free-models-heres-the-exact-setup-and-why-security-24d1</guid>
      <description>&lt;p&gt;I spent 2 weeks building a setup with OpenClaw, free/cheap models for the boring stuff, and only routing the hard problems to expensive models. The result? Our AI bill went from ~$200/mo to around $10/mo. And honestly, the quality for 90% of tasks is the same.&lt;/p&gt;

&lt;p&gt;But here's the part few people talk about on Twitter: the security side of OpenClaw nearly burned us. I'll get into that too, because if you're a founder or CTO thinking about deploying this, you need to hear the ugly parts.&lt;/p&gt;

&lt;p&gt;Let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is OpenClaw, and Why Should You Care?
&lt;/h2&gt;

&lt;p&gt;If you haven't heard of OpenClaw yet... where have you been? It's the fastest-growing open-source project in GitHub history, with over 163k stars. The project started as a personal AI assistant by Peter Steinberger (yes, the iOS dev legend) and exploded because it does something no other tool does well — it gives you a persistent AI agent that lives in your messaging apps and actually does things on your behalf.&lt;/p&gt;

&lt;p&gt;WhatsApp, Telegram, Slack, Discord, email — OpenClaw connects to all of them through a single gateway. It's not a chatbot. It's an agent that can run shell commands, browse the web, manage your calendar, read and write files, and more. Think of it as having a junior employee who never sleeps, never complains, and costs pennies per hour.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Quick Context
OpenClaw is model-agnostic. You can plug in Claude, GPT, Gemini, or run completely free local models through Ollama. This is the key that makes the cost optimization possible.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For founders and CTOs, here's why it matters: you can automate 80% of your operational busywork without building custom software. No Python scripts, no Zapier chains, no hiring a developer to glue APIs together. You write a SOUL.md config file, connect your channels, and you're live.&lt;/p&gt;
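&lt;p&gt;To give a feel for it, here's a rough illustration of the kind of thing a SOUL.md contains. The section names here are mine, not canonical; check the OpenClaw docs for the real conventions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# SOUL.md (sketch only; section names illustrative)
You are the ops assistant for Acme Inc.

## Tone
Professional, concise, no emojis in customer-facing messages.

## Boundaries
- Never send an email without human approval.
- Never touch files outside /workspace.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;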

&lt;h2&gt;
  
  
  The Core Idea: Not Every Task Needs a $75/M-token Model
&lt;/h2&gt;

&lt;p&gt;This is the thing that took me embarrassingly long to realize. When someone emails us saying "hey, can you send me the latest invoice?", your AI doesn't need Claude Opus to understand that. A 7B-parameter model can handle it. When someone fills out a form and you need to parse it into structured data — again, small-model territory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;openclaw.json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;that&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;saved&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;us&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$$$/mo&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama/qwen3:32b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"thinking"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4-20250514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"openai/gpt-4.1-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-2.5-flash"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What We Actually Automate (With Examples)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Email Reply Drafts — Free Model
&lt;/h3&gt;

&lt;p&gt;Our support inbox gets maybe 60-80 emails a day. Most of them are variants of the same 15 questions. We wrote a skill that reads incoming emails, matches them against our FAQ knowledge base, and drafts a reply. The human just reviews and hits send.&lt;/p&gt;

&lt;p&gt;Model used: Qwen3 32B (local, $0). For templated email replies, this model is more than adequate. It follows instructions well, keeps the tone professional, and doesn't hallucinate company policies because we feed it the exact docs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# skills/email-drafter/SKILL.md&lt;/span&gt;
name: email-reply-drafter
trigger: new email in support inbox
model_override: ollama/qwen3:32b

&lt;span class="gh"&gt;# Steps:&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Read incoming email
&lt;span class="p"&gt;2.&lt;/span&gt; Search FAQ knowledge base for matching topics
&lt;span class="p"&gt;3.&lt;/span&gt; Draft reply using company voice guidelines
&lt;span class="p"&gt;4.&lt;/span&gt; Send draft to #email-review Slack channel
&lt;span class="p"&gt;5.&lt;/span&gt; Wait for human approval before sending
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Form Processing — Free Model
&lt;/h3&gt;

&lt;p&gt;We have clients who fill out onboarding forms. The data comes in messy — sometimes PDF, sometimes Google Form, sometimes literally a photo of a handwritten form (yes, in 2026). The OpenClaw skill extracts the data, structures it, and pushes it to our CRM.&lt;/p&gt;
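&lt;p&gt;The skill follows the same SKILL.md shape as the email drafter above; the step wording here is approximate, not copied from our repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# skills/form-processor/SKILL.md (sketch; steps approximate)
name: form-processor
trigger: new onboarding form received
model_override: ollama/qwen3:32b

# Steps:
1. Detect input type (PDF, Google Form export, or photo)
2. Extract raw text (OCR for photos)
3. Map extracted fields to our CRM schema
4. Flag anything ambiguous for human review
5. Push the structured record to the CRM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;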

&lt;h3&gt;
  
  
  3. Code Reviews &amp;amp; Bug Fixes — Expensive Model (Sub-Agent)
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. When our agent encounters a coding task — someone reports a bug, or we need to generate a script — it spawns a sub-agent that uses Claude Sonnet 4 or routes to Claude Code.&lt;/p&gt;

&lt;p&gt;Why not use the free model here? Because I tried, and the results were... let's just say, not production-ready. Qwen 32B can write a for loop fine. But ask it to debug a race condition in an async Node.js service and it falls apart. The bigger models just get it in ways that smaller models don't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Sub-agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;coding&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tasks&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code-reviewer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model_override"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4-20250514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"shell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"browser"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"file-edit"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"sandbox"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also pipe coding tasks directly to Claude Code or OpenAI Codex as external tools. OpenClaw's tool system lets you call any CLI tool, so you literally just wrap claude-code or codex as a skill and the agent delegates to them when needed.&lt;/p&gt;
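&lt;p&gt;A rough sketch of what such a wrapper skill might look like. The invocation details are illustrative (check the CLI's own docs for exact flags):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# skills/claude-code-delegate/SKILL.md (sketch; flags illustrative)
name: claude-code-delegate
trigger: coding task detected
tools: [shell]

# Steps:
1. Write the task description to a prompt file in the sandbox
2. Run the CLI non-interactively, e.g.: claude -p "$(cat prompt.txt)"
3. Capture the resulting diff/output
4. Post the result to #code-review for human approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;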

&lt;h2&gt;
  
  
  Now the important part: Security
&lt;/h2&gt;

&lt;p&gt;OpenClaw's security track record is... not great. And I say this as someone who genuinely loves the project. The speed of adoption outpaced the security hardening by a huge margin, though to be fair, most incidents come down to users misconfiguring it rather than to OpenClaw itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Actually Did to Lock It Down
&lt;/h2&gt;

&lt;p&gt;After reading the Cisco blog and the Microsoft post, I spent a full weekend hardening our setup. Here's the non-negotiable stuff:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Docker isolation.&lt;/strong&gt; Our OpenClaw instance runs in a container with no access to the host filesystem except for a single mounted volume for the workspace. The agent physically cannot touch anything outside its sandbox.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Dedicated credentials.&lt;/strong&gt; The agent has its own email account, its own API keys, its own everything. If it gets compromised, the blast radius is limited. It's not sharing my personal Google account or our company's AWS root credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No third-party skills.&lt;/strong&gt; Zero. We write all our skills in-house. I don't care how cool a ClawHub skill looks — it's not going on our machine until the ecosystem has proper code review and signing. Maybe in 6 months.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Network segmentation.&lt;/strong&gt; The container runs on an isolated network. It can reach the model APIs and our internal services, but nothing else. No random outbound connections to god-knows-where.&lt;/p&gt;
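&lt;p&gt;Points 1 and 4 mostly boil down to container flags plus egress rules. A rough sketch (the image name and volume path are mine, and real egress filtering needs firewall rules on top of the Docker network):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only: image name and paths illustrative
docker network create claw-net

docker run -d \
  --name openclaw \
  --network claw-net \
  --read-only \
  --cap-drop ALL \
  -v "$PWD/workspace:/workspace" \
  your-openclaw-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;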

&lt;p&gt;&lt;strong&gt;5.Human-in-the-loop for destructive actions.&lt;/strong&gt; Any action that sends an email, modifies a file, or runs a command requires approval in Slack. The agent proposes, the human approves. No autonomous destructive operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line: Is It Worth It?
&lt;/h2&gt;

&lt;p&gt;Absolutely yes, with one caveat: you need technical chops. This is not a "click three buttons and you're done" setup. You need to understand Docker, networking, API keys, model capabilities, and security basics. If your team doesn't have someone who can set this up and maintain it, pay for a managed solution instead.&lt;/p&gt;

&lt;p&gt;Want help setting this up for your team?&lt;br&gt;
I've helped people deploy this exact architecture over the past month. If you're burning money on AI API bills and want to cut costs without losing quality, reach out. I do a free 30-minute audit call where we look at your current setup and identify where free models can replace expensive ones.&lt;/p&gt;

&lt;p&gt;DM me on &lt;a href="https://x.com/dhaivat00" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt; or on &lt;a href="https://www.linkedin.com/in/dhaivat-jambudia/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>I Built a RAG System to Chat With Newton's Entire Wikipedia</title>
      <dc:creator>Dhaivat Jambudia</dc:creator>
      <pubDate>Thu, 02 Apr 2026 16:29:02 +0000</pubDate>
      <link>https://forem.com/dhaivat_jambudia/i-built-a-rag-system-to-chat-with-newtons-entire-wikipedia-1ndg</link>
      <guid>https://forem.com/dhaivat_jambudia/i-built-a-rag-system-to-chat-with-newtons-entire-wikipedia-1ndg</guid>
      <description>&lt;p&gt;Most RAG tutorials just say "chunk your PDF and call OpenAI". I wanted to build something more real — a proper pipeline that actually ingests, cleans, embeds, and serves knowledge from Isaac Newton's Wikipedia page end to end.  &lt;/p&gt;

&lt;p&gt;The result is Newton LLM. You can now ask things like "What are Newton's contributions to calculus?" and get proper answers with sources instead of made-up stuff. Here's how I actually built it and what I learned.&lt;/p&gt;

&lt;h2&gt;The Problem With Most RAG Demos&lt;/h2&gt;

&lt;p&gt;Every YouTube RAG tutorial follows the same boring steps: load PDF, split into chunks, put in vector store, done. But nobody talks about the real issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do you keep the data fresh when the source changes?&lt;/li&gt;
&lt;li&gt;How do you clean messy web data before embedding?&lt;/li&gt;
&lt;li&gt;How do you separate the ingestion part from the serving part?&lt;/li&gt;
&lt;li&gt;How do you make the whole thing actually deployable?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Newton LLM tries to solve these. It's not just a notebook; it's a small system.&lt;/p&gt;

&lt;h2&gt;Architecture Overview&lt;/h2&gt;

&lt;p&gt;The system has two main layers:&lt;/p&gt;

&lt;h3&gt;Data Ingestion Layer (the offline part)&lt;/h3&gt;

&lt;p&gt;Source → Airflow → MongoDB&lt;/p&gt;

&lt;p&gt;I pull data from Wikipedia about Newton — his life, physics, math, optics, etc. Apache Airflow runs the whole ETL pipeline through a DAG. It fetches, cleans, and transforms the raw content. No random scripts or cron jobs; Airflow handles retries, scheduling, and monitoring. MongoDB stores the cleaned documents. This is my "source of truth" before anything gets embedded.&lt;/p&gt;

&lt;p&gt;Why not embed straight from Wikipedia? Because raw scraped pages are full of garbage — menus, references, bad HTML. You need to clean it first. MongoDB gives me a clean staging area.&lt;/p&gt;
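&lt;p&gt;The cleaning step is where most of the grind is. A minimal sketch of the kind of normalization involved (the function name is mine, not from the project):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re

def clean_wiki_text(raw: str) -&gt; str:
    """Normalize scraped Wikipedia text before embedding (sketch)."""
    text = re.sub(r"\[\d+\]", "", raw)      # drop citation markers like [12]
    text = re.sub(r"\[edit\]", "", text)    # drop section edit links
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse blank-line runs
    return text.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this architecture, a function like this would sit in the Airflow DAG between the fetch task and the MongoDB write.&lt;/p&gt;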

&lt;h3&gt;RAG Serving Layer (the online part)&lt;/h3&gt;

&lt;p&gt;Qdrant ← Batch Embeddings ← MongoDB&lt;/p&gt;

&lt;p&gt;Since Newton's Wikipedia page doesn't change every day, I use batch embedding instead of doing it live. Documents go from MongoDB → embedding model → Qdrant in scheduled batches. It's cheaper and faster.&lt;/p&gt;

&lt;p&gt;When a user asks a question:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Question
→ FastAPI gets it
→ Query gets embedded
→ Qdrant finds similar chunks
→ Retrieved docs + question → LLM
→ Answer with sources
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The LLM always gets context. It helps a lot with hallucinations.&lt;/p&gt;

&lt;h2&gt;Tech Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Orchestration: Apache Airflow (for DAGs, retries, monitoring)&lt;/li&gt;
&lt;li&gt;Document Store: MongoDB (flexible for messy Wikipedia data)&lt;/li&gt;
&lt;li&gt;Vector Store: Qdrant (fast and open source)&lt;/li&gt;
&lt;li&gt;Backend: FastAPI (quick and clean)&lt;/li&gt;
&lt;li&gt;Frontend: Next.js / Streamlit (Next for real use, Streamlit for quick tests)&lt;/li&gt;
&lt;/ul&gt;
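&lt;p&gt;Under the hood, the "Qdrant finds similar chunks" step is nearest-neighbour search over embedding vectors. A toy illustration of the idea with plain cosine similarity (Qdrant does the same thing at scale with proper indexing):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunks, k=3):
    """chunks is a list of (text, vector) pairs; return the k closest texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;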

&lt;h2&gt;Key Decisions&lt;/h2&gt;

&lt;h3&gt;Batch Embedding &amp;gt; Real-time Embedding&lt;/h3&gt;

&lt;p&gt;Most tutorials embed on the fly. For static data like this, it's stupid to keep re-embedding the same things. I run batch embedding once, or on a schedule, and save a lot of time and money.&lt;/p&gt;

&lt;h3&gt;Airflow Instead of a Simple Python Script&lt;/h3&gt;

&lt;p&gt;I could have just written one scrape_and_embed.py file. But Airflow gives me retries, proper logging, and scheduling, and it keeps each step separate. If Wikipedia is down, it retries automatically. For anything bigger than a toy project, orchestration actually matters.&lt;/p&gt;

&lt;h3&gt;Separating Ingestion from Serving&lt;/h3&gt;

&lt;p&gt;The scraping/cleaning part and the answering part are completely separate. Ingestion can break or update without touching the live RAG system. The serving layer just reads from Qdrant.&lt;/p&gt;

&lt;h2&gt;What I'd Do Differently Next Time&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add a reranker — simple vector search isn't enough. A reranker would make results much better.&lt;/li&gt;
&lt;li&gt;Build evaluation from the start — without proper eval, you don't know if your changes actually help.&lt;/li&gt;
&lt;li&gt;Add more sources — right now it's only Wikipedia. Academic papers would make it way stronger.&lt;/li&gt;
&lt;li&gt;Try hybrid search — combine vector search with keyword search (BM25).&lt;/li&gt;
&lt;/ul&gt;
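&lt;p&gt;On the hybrid search point: one common way to combine a vector ranking and a BM25 ranking is Reciprocal Rank Fusion, which only needs the two ranked lists, not comparable scores. A minimal sketch (not something Newton LLM does today):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of doc ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # docs ranked high in any list accumulate more score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;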

&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;Building a simple RAG demo is easy. Building something that actually works properly is much harder. Most of the work is in the boring parts: cleaning data, setting up orchestration, separating concerns, and deciding when to use batch vs. real-time.&lt;/p&gt;

&lt;p&gt;Newton LLM showed me that good retrieval matters more than which LLM you use. If your pipeline is solid, even a smaller model gives good answers. If you're building RAG, focus on the data pipeline first, not fancy prompts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiecjz1nego547ylj2qxu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiecjz1nego547ylj2qxu.png" alt=" " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>python</category>
    </item>
  </channel>
</rss>
