<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Pranay ravi</title>
    <description>The latest articles on Forem by Pranay ravi (@pranay_ravi_b88172eac205c).</description>
    <link>https://forem.com/pranay_ravi_b88172eac205c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3372066%2F5ac87370-81a3-4df2-a2b2-6e46f3c4ede8.jpg</url>
      <title>Forem: Pranay ravi</title>
      <link>https://forem.com/pranay_ravi_b88172eac205c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pranay_ravi_b88172eac205c"/>
    <language>en</language>
    <item>
      <title>Can AI Agents Replace Enterprise Workflow Orchestration? A Real-World Test — OpenClaw. n8n. Claude Dispatch. A side-by-side comparison..</title>
      <dc:creator>Pranay ravi</dc:creator>
      <pubDate>Sun, 17 May 2026 21:58:49 +0000</pubDate>
      <link>https://forem.com/pranay_ravi_b88172eac205c/can-ai-agents-replace-enterprise-workflow-orchestration-a-real-world-test-openclaw-n8n-claude-1hho</link>
      <guid>https://forem.com/pranay_ravi_b88172eac205c/can-ai-agents-replace-enterprise-workflow-orchestration-a-real-world-test-openclaw-n8n-claude-1hho</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg55iazexdoak4m1xq3ch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg55iazexdoak4m1xq3ch.png" alt=" " width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;"A database administrator's honest investigation into whether the new wave of AI automation tools can handle enterprise-grade workflows — or whether the boring answer is still the right one."&lt;br&gt;
tags: n8n, automation, database, devops&lt;/p&gt;




&lt;p&gt;&lt;em&gt;A database administrator's honest investigation into whether the new wave of AI automation tools can handle enterprise-grade workflows — or whether the boring answer is still the right one.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fal7p5ugc1ic5gg1nmhza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fal7p5ugc1ic5gg1nmhza.png" alt=" " width="800" height="1201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The workflow that started the question — and ended up answering it&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Everyone was talking about these tools. I got curious.
&lt;/h2&gt;

&lt;p&gt;I work as a database administrator. I support hundreds of databases across multiple environments — dev, staging, non-prod, production — in a HIPAA-regulated organization. Access management is a constant, grinding operational burden. Developers need access. Analysts need access. Application owners need access. And every single request needs to be approved, documented, and auditable.&lt;/p&gt;

&lt;p&gt;For a long time, the process looked like this: someone would message the DBA, the DBA would manually create a Jira ticket and a Word document, chase down approvals from a manager, a database manager, and the security team, manually create the account, and email the credentials back. Days could pass. Follow-ups stacked up. The security team had to trust that the paperwork was right.&lt;/p&gt;

&lt;p&gt;Then I started hearing about Claude Dispatch. And OpenClaw. Both were being described as AI tools that could receive a message and take action — automate tasks, call APIs, connect to services. The demos looked impressive. The communities were excited.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;And I thought: wait. Could one of these actually solve the problem I have been living with for years?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I had already built something with n8n — a workflow automation tool. But I genuinely wanted to know whether I had picked the right tool, or whether something newer and smarter had passed it by. So I did what any engineer would do: I used my actual problem as the test.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem I needed to solve
&lt;/h2&gt;

&lt;p&gt;Before comparing any tools, let me describe the workflow precisely, because the details are what make the comparison meaningful.&lt;/p&gt;

&lt;p&gt;A developer — or a data analyst, or a product manager — needs access to a specific database. The request needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to their direct manager for approval&lt;/li&gt;
&lt;li&gt;Then to the database manager for approval&lt;/li&gt;
&lt;li&gt;Then to the security team for final approval&lt;/li&gt;
&lt;li&gt;Create a full audit trail at every step&lt;/li&gt;
&lt;li&gt;Call a pre-existing API to provision the account with the correct access level&lt;/li&gt;
&lt;li&gt;Deliver the credentials back to the requester&lt;/li&gt;
&lt;li&gt;Spin up a Jira ticket for the network team to open the firewall port&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every approval is sequential — no step fires unless the previous one passes. If anyone says no, the chain stops and the requester is notified. If security does not approve, no account gets created. Full stop.&lt;/p&gt;

&lt;p&gt;This is not a hypothetical. It runs in a regulated environment where access to production data is governed by HIPAA. The audit trail is not nice-to-have. It is required.&lt;/p&gt;

&lt;p&gt;The access levels themselves are structured — not free text. Read only. Read/write. Dev owner. Application owner. DBA. Each maps to a specific API endpoint. Each produces a deterministic result. The central database inventory that drives all of this is a relational database populated automatically when infrastructure is created through Terraform. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nobody should be able to bypass it.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqthftls95b7kxn8o4edc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqthftls95b7kxn8o4edc.png" alt=" " width="800" height="1201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: The complete approval chain — from Webex message to provisioned credentials and dual Jira audit trail&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  First candidate: Claude Dispatch
&lt;/h2&gt;

&lt;p&gt;Claude Dispatch is Anthropic's answer to the question: what if you could delegate tasks to your AI from your phone and come back to find them done? It lives inside Claude Cowork — a desktop agent product — and creates a persistent connection between your mobile app and the Claude Desktop app running on your computer.&lt;/p&gt;

&lt;p&gt;The pitch is genuinely compelling. Send a message from your phone, Claude acts on your desktop: reads files, calls APIs, summarizes documents, delivers results. For personal productivity this is interesting. For ad-hoc delegation it is actually useful.&lt;/p&gt;

&lt;p&gt;So I asked the obvious question: could Dispatch receive a Webex message, run an approval chain, call my database API, and write to Jira?&lt;/p&gt;

&lt;p&gt;Here is where the investigation got honest quickly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dispatch requires the Claude Desktop app to be running on your computer. The moment the laptop sleeps, it stops.&lt;/li&gt;
&lt;li&gt;There is no server. There is no always-on process. There is no execution log.&lt;/li&gt;
&lt;li&gt;The workflow is driven by an LLM reasoning about what to do — not a deterministic set of rules. The same input could produce a different output on a different day.&lt;/li&gt;
&lt;li&gt;There is no concept of a sequential approval gate. Claude does not wait for a human to respond before deciding the next step.&lt;/li&gt;
&lt;li&gt;There is no audit trail. No timestamps. No record of who approved what and when.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;A workflow that depends on a laptop staying awake is not an enterprise workflow. It is a personal convenience. There is nothing wrong with that — but it is a different category of tool entirely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cost-wise, Dispatch is bundled with Claude Pro at $20 per month or Max at $100 per month. Accessible pricing. But the architecture disqualifies it for this use case before the price even matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Second candidate: OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw is a different kind of tool. It is open-source, self-hostable, and designed as a personal AI assistant that runs on your own infrastructure. You can connect it to Webex, Telegram, Slack, WhatsApp — it listens on those channels and uses an LLM to decide what action to take in response to a message.&lt;/p&gt;

&lt;p&gt;The self-hosted angle immediately made it more interesting for regulated environments. If you run it on a VPS rather than your laptop, it can operate 24/7. And because it is open-source under an MIT license, the software itself costs nothing. Your real costs are the VPS — roughly $5 to $15 per month — and the API tokens from whichever LLM provider you connect.&lt;/p&gt;

&lt;p&gt;So OpenClaw gets past the first disqualification that knocked out Dispatch. It can stay on. Good start.&lt;/p&gt;

&lt;p&gt;But then I pushed further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw has no concept of a deterministic approval chain. It reasons about what to do. If I ask it to get manager approval before proceeding, it will try — but there is no guarantee it handles every edge case the same way every time.&lt;/li&gt;
&lt;li&gt;There is no built-in error handling or retry logic. If an API call fails, the agent may or may not handle it gracefully.&lt;/li&gt;
&lt;li&gt;There are no execution logs in any structured, auditable format. The LLM's reasoning is not a HIPAA audit trail.&lt;/li&gt;
&lt;li&gt;There is no native Jira integration. You can make API calls, but you are building that logic yourself in an environment with no visual workflow editor.&lt;/li&gt;
&lt;li&gt;Setup requires real DevOps experience — Docker, VPS configuration, model routing. Not a weekend project for someone who just needs automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenClaw is genuinely impressive for what it is: a powerful, flexible personal AI assistant for technical users who want to automate their own workflows. It is not what I needed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The core tension is this: OpenClaw lets an AI decide what to do. In a regulated environment, I need a system that does exactly what it is configured to do — every single time.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The one I already had: n8n
&lt;/h2&gt;

&lt;p&gt;n8n is not the newest tool in this comparison. It is not the most talked-about. It does not have a viral GitHub repository or a growing community of people sharing AI agent demos. It is a workflow automation platform — visual, node-based, deterministic.&lt;/p&gt;

&lt;p&gt;I had already built the database access workflow in n8n before I started this investigation. What the investigation forced me to do was articulate why it works where the others do not.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n runs on a server. Always on. No desktop dependency.&lt;/li&gt;
&lt;li&gt;Every workflow execution is logged — step by step, with timestamps. If something fails, you know exactly where and why.&lt;/li&gt;
&lt;li&gt;The approval chain is explicit: manager node fires, waits for a webhook response, branches on yes or no. Database manager node fires. Security node fires. No LLM is deciding the order. The order is the configuration.&lt;/li&gt;
&lt;li&gt;Jira, Webex, and REST API integrations are native — no custom code required to connect them.&lt;/li&gt;
&lt;li&gt;When security approves, a Jira ticket is created automatically with every approval documented: who approved, their role, the timestamp. A second ticket fires for the network team. The requester receives credentials via Webex.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The entire chain — from chat message to provisioned account — is deterministic, auditable, and server-side.&lt;/strong&gt; It does not matter whether my laptop is on. It does not matter whether the LLM is having a creative day. The workflow does what it is built to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting them side by side
&lt;/h2&gt;

&lt;p&gt;Here is what the investigation produced — not as opinion, but as a structured comparison against the actual requirements of the problem.&lt;/p&gt;

&lt;p&gt;![Figure 2: Tool comparison — enterprise workflow criteria]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36mziav4nlqsqilicocg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36mziav4nlqsqilicocg.png" alt=" " width="800" height="717"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: How the three tools compare across the criteria that actually matter for enterprise workflows&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The green cells are not a coincidence. n8n was built for exactly this category of problem — structured, multi-step, multi-system, always-on automation with governance requirements. Dispatch and OpenClaw were built for a different problem: intelligent, flexible, personal task delegation.&lt;/p&gt;

&lt;p&gt;Neither of those things is wrong. They are just different categories.&lt;/p&gt;




&lt;h2&gt;
  
  
  But is n8n actually accessible? Or is it just the enterprise answer no one wants to hear?
&lt;/h2&gt;

&lt;p&gt;This is the question I kept coming back to. Because n8n has a reputation for requiring technical knowledge. Webhooks, JSON payloads, API authentication, conditional branching. It is not a no-code tool in the purest sense.&lt;/p&gt;

&lt;p&gt;And yet — compared to OpenClaw, which requires DevOps expertise to host and configure, and compared to Dispatch, which requires you to trust an LLM to handle regulated processes — n8n is actually the most accessible path to a production-grade solution.&lt;/p&gt;

&lt;p&gt;The visual builder is genuinely good. The template library covers most common patterns. The community is large. And the self-hosted Community Edition is free — you pay only for the server, which can be as little as $4 a month.&lt;/p&gt;

&lt;p&gt;What n8n asks for is clarity of thought. You need to understand your process before you can automate it. That is not a technical barrier. That is just good engineering.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In 2008, I had my first exposure to BPM tools — business process management software. Back then, automating a multi-step approval workflow required consultants, enterprise licenses, and six-month implementation projects. n8n in 2026 is that same capability, accessible to a single engineer on a modest budget, in a weekend.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI-native tools will get there. The direction is right. But for workflows where consistency and auditability are non-negotiable, deterministic automation still wins. Not because AI is not impressive — it is. But because a HIPAA auditor does not accept "the AI usually does it right" as an answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the real world added that no tool comparison captures
&lt;/h2&gt;

&lt;p&gt;There is one thing I did not discover until the workflow was running: some users have managers on paper who are not actually the owners or decision-makers for the systems being requested.&lt;/p&gt;

&lt;p&gt;Organizational charts say one thing. Actual accountability sits somewhere else. When the approval request went to the wrong person, the workflow stalled — not because the automation failed, but because the data it depended on was wrong.&lt;/p&gt;

&lt;p&gt;No tool comparison surfaces this. You only find it when you run the thing on real people in a real organization.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is one of the most useful things a well-designed workflow can do: make your organizational data gaps visible. The automation did not hide the problem. It exposed it. And that forced the fix.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  So — can the new AI tools replace n8n for this?
&lt;/h2&gt;

&lt;p&gt;No. Not yet. Not for this class of problem.&lt;/p&gt;

&lt;p&gt;Claude Dispatch is a remote control for your desktop. It is well-designed and genuinely useful for personal delegation. But it has no server, no audit trail, and no deterministic logic — three things that are non-negotiable in a regulated environment.&lt;/p&gt;

&lt;p&gt;OpenClaw is a powerful personal AI assistant that technical users can self-host and extend. It can call APIs and respond to messages. But it has no structured approval chain, no execution logging, and no enterprise governance features.&lt;/p&gt;

&lt;p&gt;n8n is not the flashiest answer. It did not go viral. It does not use an LLM to decide what to do next. But it runs reliably, it logs everything, it integrates natively with Jira and Webex, and it does exactly what you configure it to do — every single time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The right tool is not the newest tool. It is the tool that matches the shape of your problem. And some problems have a shape that requires determinism, not intelligence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The question I started with — can these AI tools solve what n8n solves — turned out to be the wrong question. The better question is: what kind of automation are you building? If the answer involves regulated data, sequential human approvals, and a legal requirement to prove who did what and when, the answer is still n8n. If the answer involves personal productivity, intelligent delegation, and flexible task handling, Dispatch and OpenClaw are worth a serious look.&lt;/p&gt;

&lt;p&gt;Both of those things can be true at the same time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Principal Engineer | 16+ years in databases, cloud &amp;amp; observability | Oracle · PostgreSQL · Kafka · AWS · Splunk · AppD | Platform engineering &amp;amp; AI delivery&lt;/em&gt;&lt;/p&gt;

</description>
      <category>n8n</category>
      <category>automation</category>
      <category>database</category>
      <category>devops</category>
    </item>
    <item>
      <title>How I Built a Zero-Subscription Local AI Stack — Inspired by a 60-Second YouTube Short</title>
      <dc:creator>Pranay ravi</dc:creator>
      <pubDate>Sun, 17 May 2026 02:33:21 +0000</pubDate>
      <link>https://forem.com/pranay_ravi_b88172eac205c/how-i-built-a-completely-free-local-ai-stack-inspired-by-a-60-second-youtube-short-3e39</link>
      <guid>https://forem.com/pranay_ravi_b88172eac205c/how-i-built-a-completely-free-local-ai-stack-inspired-by-a-60-second-youtube-short-3e39</guid>
      <description>&lt;h1&gt;
  
  
  How I Built a Completely Free Local AI Stack — Inspired by a 60-Second YouTube Short
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By Pranaychandra Ravi&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It started with a YouTube Short. Someone on my feed casually demonstrated connecting a local AI model to Claude Code and I stopped mid-scroll. No API key. No subscription. No code leaving their machine. I had to know how it worked.&lt;/p&gt;

&lt;p&gt;What followed was a deep dive into local AI — Ollama, Gemma4, Docker, Open WebUI, vector databases, context windows, and a Python script that made my local model generate an ASCII diagram of the Earth and Moon. This post documents everything I learned, every question I asked, and every mistake I made along the way. If you're curious about running AI entirely on your own hardware, this one is for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  First Question: Wait, Is This Actually Free?
&lt;/h2&gt;

&lt;p&gt;My first instinct was skepticism. Claude Code is Anthropic's product. Surely using it requires a Claude subscription?&lt;/p&gt;

&lt;p&gt;The short answer is &lt;strong&gt;no — not when you pair it with Ollama and a local model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's what I learned: Claude Code is the &lt;em&gt;agent&lt;/em&gt; — the tool that reads your files, runs commands, edits code, and manages multi-step tasks in your terminal. By default it calls Anthropic's API, which costs money. But Claude Code exposes environment variables that let you redirect those API calls anywhere you want — including a local Ollama server running on your own machine.&lt;/p&gt;

&lt;p&gt;Ollama added official support for Anthropic's Messages API format, meaning Claude Code can talk to it natively. No hacks, no middleware, no subscription. The only cost is your own electricity and hardware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code  →  talks to  →  Ollama (local server)  →  runs  →  Your model
                              (no Anthropic servers involved)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  So What Exactly Is Ollama?
&lt;/h2&gt;

&lt;p&gt;Before I could set anything up I needed to understand what Ollama actually is, because "install Ollama" doesn't tell you much.&lt;/p&gt;

&lt;p&gt;Think of Ollama as two things in one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. A model manager&lt;/strong&gt; — it downloads, stores, and organizes AI models on your machine. Like a package manager but for AI brains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A local API server&lt;/strong&gt; — once running, it exposes an endpoint at &lt;code&gt;http://localhost:11434&lt;/code&gt; that any application can call. Your code, Claude Code, Open WebUI, VS Code extensions — anything that speaks the Anthropic or OpenAI API format can connect to it.&lt;/p&gt;

&lt;p&gt;This is the key insight I kept coming back to: &lt;strong&gt;Ollama itself has no intelligence&lt;/strong&gt;. It's an empty engine. You have to download a model — a large file containing all the AI's weights and knowledge — before anything useful happens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without a model:   Ollama = empty server, useless
With a model:      Ollama = fully local AI, free forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Downloading Your First Model — Which One?
&lt;/h2&gt;

&lt;p&gt;This is where hardware matters. I have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;32GB RAM&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NVIDIA GPU with ~11GB VRAM&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Core i9 processor&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With an NVIDIA card, Ollama automatically uses CUDA — no setup needed. Your GPU handles inference and it's dramatically faster than CPU-only.&lt;/p&gt;

&lt;p&gt;The key concept here is &lt;strong&gt;VRAM vs RAM&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model fits in VRAM  →  GPU handles everything  →  Very fast ✅
Model too big for VRAM  →  spills into system RAM  →  Slower ⚠️
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With 11GB VRAM I can fit most 7B–13B parameter models entirely in GPU memory, which means fast, snappy responses.&lt;/p&gt;

&lt;p&gt;After thinking through my use cases — coding help, image analysis, document review — I landed on &lt;strong&gt;Gemma4&lt;/strong&gt; (Google's multimodal model, ~12GB). Here's why it beat out alternatives like Qwen3.6 (28GB):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Gemma4&lt;/th&gt;
&lt;th&gt;Qwen3.6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Size&lt;/td&gt;
&lt;td&gt;~12GB&lt;/td&gt;
&lt;td&gt;~28GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fits in 11GB VRAM&lt;/td&gt;
&lt;td&gt;Nearly (tiny RAM overflow)&lt;/td&gt;
&lt;td&gt;Partial (big RAM spill)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image understanding&lt;/td&gt;
&lt;td&gt;✅ Yes (multimodal)&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coding quality&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed on my hardware&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Slower&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My use cases included image-to-text extraction and converting images to coloring pages — Qwen3.6 can't do either because it's text-only. Gemma4 won.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command. It downloads, verifies, and stores the model. You can see progress in the terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture in Plain English
&lt;/h2&gt;

&lt;p&gt;Before going further, I want to share the mental model that made everything click for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                    YOUR COMPUTER                    │
│                                                     │
│  ┌─────────────┐    ┌──────────────┐               │
│  │ Claude Code │───▶│    Ollama    │               │
│  │  (terminal) │    │ :11434 (API) │               │
│  └─────────────┘    └──────┬───────┘               │
│                            │                        │
│  ┌─────────────┐    ┌──────▼───────┐               │
│  │  Open WebUI │───▶│   Gemma4    │               │
│  │  (browser)  │    │  (the brain) │               │
│  └─────────────┘    └─────────────┘               │
│                                                     │
│  ┌─────────────┐                                   │
│  │  Python API │───▶ http://localhost:11434        │
│  │   scripts   │                                   │
│  └─────────────┘                                   │
└─────────────────────────────────────────────────────┘
              Zero data leaves your machine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three different interfaces. One local model. Everything private.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context Windows — What Are They and Why Do They Matter?
&lt;/h2&gt;

&lt;p&gt;One of the most important concepts I clarified was the &lt;strong&gt;context window&lt;/strong&gt; — the model's working memory. It's the maximum amount of text a model can "see" at once in a conversation. Exceed it and it starts forgetting the beginning.&lt;/p&gt;

&lt;p&gt;Here's the reality check comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Sonnet 4.5&lt;/th&gt;
&lt;th&gt;Gemma4 (local)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;200,000 tokens&lt;/td&gt;
&lt;td&gt;~8,000–32,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Approximate pages&lt;/td&gt;
&lt;td&gt;~150,000 words&lt;/td&gt;
&lt;td&gt;~6,000–24,000 words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6 years of tax docs&lt;/td&gt;
&lt;td&gt;Handles comfortably&lt;/td&gt;
&lt;td&gt;Would overflow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your VRAM directly affects how large a context window your local model can hold. More VRAM = more of the model loaded = bigger context available.&lt;/p&gt;

&lt;p&gt;You can manually increase it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run gemma4 &lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 32768
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For single documents, images, or focused coding tasks — perfectly fine. For analyzing six years of tax filings all at once? That's where Claude's 200k context is a genuine advantage local models can't match yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Can Local Models Search the Internet?
&lt;/h2&gt;

&lt;p&gt;Short answer: &lt;strong&gt;No, not by default.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Local models are frozen at their training date. They have no internet connection during your conversation. This was an important distinction to understand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude (this chat)  →  Has web search tool  →  Knows current events ✅
Gemma4 (local)     →  No internet          →  Knowledge frozen at training ❌
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This raised an interesting follow-up question though. When I used Gemini to analyze my tax filing and it spotted mistakes — was it searching the internet to find them?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No.&lt;/strong&gt; And this was a real misconception I had.&lt;/p&gt;

&lt;p&gt;Gemini found tax errors because tax law, IRS rules, and common filing mistakes were baked into the model during training. It learned from millions of tax documents, accounting textbooks, and IRS publications. During your session it's not googling anything — it's applying trained knowledge to your specific document.&lt;/p&gt;

&lt;p&gt;Think of it like a tax accountant. They studied tax law for years. When reviewing your return they're not searching Google — they're applying what they already know to what you show them.&lt;/p&gt;

&lt;p&gt;Local models work the same way. The difference is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini/Claude&lt;/strong&gt;: More recent training data, larger knowledge base, up-to-date tax law changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma4 local&lt;/strong&gt;: Good foundational knowledge, may be slightly behind on very recent rule changes, but &lt;strong&gt;your documents never leave your machine&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For sensitive financial documents, that privacy trade-off is significant.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connecting Claude Code to Gemma4
&lt;/h2&gt;

&lt;p&gt;This was surprisingly simple. Claude Code reads three environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or using Ollama's built-in launcher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama launch claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Claude Code started up I saw this at the bottom of the welcome screen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemma4 · API Usage Billing · pranayraavi@gmail.com's Organization
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That confirms it's using Gemma4 through Ollama. No Anthropic billing. No subscription.&lt;/p&gt;

&lt;p&gt;What you get with this setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ File reading and editing across your project&lt;/li&gt;
&lt;li&gt;✅ Terminal command execution&lt;/li&gt;
&lt;li&gt;✅ Multi-step agentic coding tasks&lt;/li&gt;
&lt;li&gt;✅ Git operations&lt;/li&gt;
&lt;li&gt;✅ MCP connectors and plugins&lt;/li&gt;
&lt;li&gt;✅ Project context awareness&lt;/li&gt;
&lt;li&gt;⚠️ Intelligence capped at Gemma4's capability (weaker than Claude Sonnet/Opus)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Python API Test
&lt;/h2&gt;

&lt;p&gt;Before setting up a GUI I wanted to confirm the raw API worked. Here's the script I wrote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a hello world in ascii diagram of moon and earth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          (           )
         /              \
  ----(---O---)    (------)  &amp;lt;-- Orbit Path
 /  /   \    /  /   \
|   |     | | |     |   |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemma4, running entirely on my machine, responding to a Python script. No API key. No internet. Completely local. This was the moment it really clicked.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up Open WebUI — The ChatGPT-Like Interface
&lt;/h2&gt;

&lt;p&gt;For a proper GUI I went with &lt;strong&gt;Open WebUI&lt;/strong&gt; — a beautiful, feature-rich interface that runs locally and connects to Ollama.&lt;/p&gt;

&lt;p&gt;First attempt using pip failed because I had Python 3.13 and Open WebUI requires Python 3.11 or 3.12:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;ERROR: Could not find a version that satisfies the requirement open-webui
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So I went the Docker route instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Docker Desktop
&lt;/h3&gt;

&lt;p&gt;Docker Desktop is free for personal use. Download from &lt;code&gt;docker.com/products/docker-desktop&lt;/code&gt;. During install, WSL 2 backend gets configured automatically on Windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running Open WebUI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;127.0.0.1:3000:8080&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;open-webui&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;open-webui:/app/backend/data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;--add-host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;host.docker.internal:host-gateway&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I initially tried &lt;code&gt;-p 3000:80&lt;/code&gt; which caused a port conflict (another process was using port 3000 on my machine). Switching to &lt;code&gt;-p 127.0.0.1:3000:8080&lt;/code&gt; fixed it.&lt;/p&gt;

&lt;p&gt;Confirmed it was running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;netstat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ano&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;findstr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;3000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# TCP  0.0.0.0:3000  LISTENING  ← Docker up and running&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;curl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http://localhost:3000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# StatusCode: 200 OK  ← Server responding&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then opened &lt;code&gt;http://localhost:3000&lt;/code&gt; in Chrome and saw the Open WebUI interface with Gemma4 auto-detected.&lt;/p&gt;

&lt;h3&gt;
  
  
  First Real Test — Image to Text Extraction
&lt;/h3&gt;

&lt;p&gt;One of the reasons I picked Gemma4 over Qwen3.6 was its multimodal capability — it can actually &lt;em&gt;see&lt;/em&gt; images. I put this to the test immediately.&lt;/p&gt;

&lt;p&gt;I had a photo of handwritten chess notes and uploaded it directly into the Open WebUI chat. The prompt was simple: &lt;em&gt;"convert this image to text"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Gemma4 thought for 11 seconds and returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FORK/DOUBLE ATTACK

When we attack two or more pieces at the same time then it is known
as fork or double attack

Note- Knights are good at making fork.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a perfect transcription of handwritten text — extracted entirely locally, no cloud OCR service, no API key, nothing leaving my machine. It even generated a relevant follow-up suggestion: &lt;em&gt;"Are there other kinds of tactical attacks besides forks, like pins or skewers?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the multimodal capability in action:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Handwritten text extracted accurately&lt;/li&gt;
&lt;li&gt;✅ Context understood (chess notes)&lt;/li&gt;
&lt;li&gt;✅ Intelligent follow-up suggested&lt;/li&gt;
&lt;li&gt;✅ 100% local — image never left my PC&lt;/li&gt;
&lt;li&gt;✅ Free&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For anyone with scanned documents, handwritten notes, receipts, or any image containing text — this works out of the box with Gemma4 in Open WebUI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Document Upload and RAG — How It Actually Works
&lt;/h2&gt;

&lt;p&gt;One of the most powerful features of Open WebUI is document upload with &lt;strong&gt;RAG (Retrieval Augmented Generation)&lt;/strong&gt;. This is how you can upload your AWS docs, tax returns, or any PDFs and chat with them.&lt;/p&gt;

&lt;p&gt;Here's what happens under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You upload PDF
      ↓
Open WebUI splits it into chunks
      ↓
Converts chunks to embeddings (mathematical vectors)
      ↓
Stores in ChromaDB (local vector database)
      ↓
You ask a question
      ↓
ChromaDB finds the most relevant chunks
      ↓
Sends chunks to Gemma4 as context
      ↓
Gemma4 answers based on YOUR document
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is stored locally at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;C:\Users\lavan\AppData\Roaming\open-webui\data\
  📁 vector_db    ← document embeddings (ChromaDB)
  📁 uploads      ← original files
  📄 webui.db     ← chat history (SQLite)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your documents never leave your machine. ChromaDB is completely free and open source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One important limitation&lt;/strong&gt;: RAG finds &lt;em&gt;relevant chunks&lt;/em&gt;, not the entire document. If an answer spans many sections of a large document, it might miss some context. The workaround is to upload smaller, focused documents rather than one giant PDF.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Stack — What I Now Have Running
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✅ Ollama          — model manager and local API server
✅ Gemma4          — the AI model (multimodal, ~12GB)
✅ Claude Code     — agentic coding with local model
✅ Open WebUI      — browser-based chat interface with document upload
✅ Python API      — scripts calling the model directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total monthly cost: &lt;strong&gt;$0&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;p&gt;After going through all of this, here's the practical split I settled on:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Coding with file editing&lt;/td&gt;
&lt;td&gt;Claude Code + Gemma4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image analysis / image to text&lt;/td&gt;
&lt;td&gt;Open WebUI + Gemma4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document Q&amp;amp;A (private)&lt;/td&gt;
&lt;td&gt;Open WebUI + RAG + Gemma4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web research / current events&lt;/td&gt;
&lt;td&gt;Claude.ai or Perplexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex reasoning / large context&lt;/td&gt;
&lt;td&gt;Claude.ai (paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tax doc analysis (all years)&lt;/td&gt;
&lt;td&gt;Claude.ai or NotebookLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quick Python scripts calling AI&lt;/td&gt;
&lt;td&gt;Direct Ollama API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Honest Reflections
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What surprised me&lt;/strong&gt;: How straightforward the setup actually was once I understood the mental model. Ollama is the server, the model is the brain, everything else just connects to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I underestimated&lt;/strong&gt;: The quality gap between local models and Claude Sonnet/Opus is real. For simple tasks Gemma4 is impressive. For complex multi-step reasoning, Claude's frontier models are noticeably stronger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd tell myself at the start&lt;/strong&gt;: Local AI is not a replacement for cloud AI — it's a complement. Use local for private, repetitive, or experimental tasks. Use cloud AI for research, complex reasoning, and anything that benefits from a larger context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The privacy win is real&lt;/strong&gt;: For sensitive documents — financial records, personal data, proprietary code — local AI is genuinely better from a privacy standpoint. Your data does not leave your machine. Full stop.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ollama: &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Open WebUI: &lt;a href="https://openwebui.com" rel="noopener noreferrer"&gt;openwebui.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code: &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;claude.ai/code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ollama + Claude Code docs: &lt;a href="https://docs.ollama.com/integrations/claude-code" rel="noopener noreferrer"&gt;docs.ollama.com/integrations/claude-code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docker Desktop (free): &lt;a href="https://docker.com/products/docker-desktop" rel="noopener noreferrer"&gt;docker.com/products/docker-desktop&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;All of this runs on a Windows machine with 32GB RAM, an NVIDIA GPU with ~11GB VRAM, and a Core i9 processor. If you have similar hardware you can replicate this entire stack in an afternoon.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
