<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Lingdas1</title>
    <description>The latest articles on Forem by Lingdas1 (@lingdas1).</description>
    <link>https://forem.com/lingdas1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3946584%2F1c01a3c9-e259-4e60-8976-35c6925624ba.png</url>
      <title>Forem: Lingdas1</title>
      <link>https://forem.com/lingdas1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lingdas1"/>
    <language>en</language>
    <item>
      <title>👻 Crash #1: The Gateway Ghost — When Your AI Pretends to Work</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 11:51:59 +0000</pubDate>
      <link>https://forem.com/lingdas1/crash-1-the-gateway-ghost-when-your-ai-pretends-to-work-4kao</link>
      <guid>https://forem.com/lingdas1/crash-1-the-gateway-ghost-when-your-ai-pretends-to-work-4kao</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 👻 Crash #1: The Gateway Ghost

&amp;gt; *"Did I do something wrong? Let me reinstall everything."*

---

## What Happened

I followed the tutorial step by step. Everything installed perfectly. I was thrilled.

Then the gateway (the bridge connecting my AI assistant to the messaging app) started disconnecting randomly. Sometimes it worked for hours. Sometimes it died after 10 minutes. No pattern, no error message, nothing to Google.

**My response:** I wiped everything and reinstalled. Twice.

**The actual fix:** I just needed to restart the gateway. That's it.

---

## What I Learned

**Before you assume you broke something, try turning it off and on again.**

It's a cliché because it works. I wasted an entire evening reinstalling software that was fine. The problem was a process that needed a kick, not a configuration that needed a rewrite.

---

## 🛡️ Golden Rule Reminder

&amp;gt; **If it works, don't touch it.** I reinstalled a perfectly good setup twice before trying the simplest fix. Always try the 10-second solution before the 2-hour one.

&amp;gt; **Run everything in a VM.** If my gateway had already been inside a VM with a snapshot, I could have just rolled back instead of reinstalling from scratch.

---

*← Full story: [I Broke My AI Assistant 7 Times](https://dev.to/lingdas1/i-broke-my-ai-assistant-7-times-heres-what-i-learned-47le)*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  💬 Your Turn
&lt;/h2&gt;

&lt;p&gt;Have you run into a similar problem? Or hit a wall I didn't mention?&lt;/p&gt;

&lt;p&gt;Drop a comment below — I read every single one. Your experience might help someone else who's stuck on the same thing.&lt;/p&gt;

&lt;p&gt;The more we share our screw-ups, the fewer people have to make them. 🤝&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>story</category>
      <category>productivity</category>
    </item>
    <item>
      <title>🏗️ Day 1: I Almost Bought a Phone for AI (And Other Beginner Mistakes)</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 11:03:15 +0000</pubDate>
      <link>https://forem.com/lingdas1/day-1-i-almost-bought-a-phone-for-ai-and-other-beginner-mistakes-ada</link>
      <guid>https://forem.com/lingdas1/day-1-i-almost-bought-a-phone-for-ai-and-other-beginner-mistakes-ada</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 🏗️ Day 1: I Almost Bought a Phone for AI (And Other Beginner Mistakes)

&amp;gt; &lt;em&gt;The story of how I went from "I want a Jarvis" to actually building one, one crash at a time.&lt;/em&gt;

---

## How It Started

I found out about AI the same way most people do: &lt;strong&gt;scrolling through videos.&lt;/strong&gt;

One day, it was the "Doubao Phone": a smartphone with a built-in AI assistant that could order food, compare prices, and even play games for you. &lt;em&gt;"Finally, my own Jarvis!"&lt;/em&gt; I thought. I almost bought one.

Then the app stores blocked it. The hype died. On to the next thing.

Next up: &lt;strong&gt;farming crayfish with AI.&lt;/strong&gt; Yes, that was a real trend. A virtual crayfish farm managed by an AI agent. Fun to watch, but the token costs were insane, and the AI kept forgetting what happened five minutes ago.

I kept watching, kept wanting, kept feeling like AI was something &lt;em&gt;other people&lt;/em&gt; did.

Then I found &lt;strong&gt;Hermes Agent&lt;/strong&gt;, an open-source AI assistant you can run on your own machine. Free. Private. No subscription.

I searched for tutorials. Downloaded the files. And started the most frustrating, educational tech journey of my life.

---

## The Big Lesson

Looking back, the problem wasn't that I didn't know enough. It was that I kept &lt;strong&gt;chasing the next shiny thing&lt;/strong&gt; instead of picking one path and sticking with it.

&lt;strong&gt;The real lesson:&lt;/strong&gt; Stop waiting for the perfect AI product. The tools are already free and open source. You just need to pick one and start, even if you break it a few times along the way.

---

## 🛡️ The Golden Rule (Read This Before the Next Article)

&amp;gt; &lt;strong&gt;If it works, don't touch it.&lt;/strong&gt;
&amp;gt;
&amp;gt; You never know which piece of your setup is holding everything together. That random config file you're not sure about? Leave it alone. Every time I thought "I'll just fix this one small thing," I spent 3 hours recovering.
&amp;gt;
&amp;gt; Even a stable system can break for no reason. When it does, fix only &lt;em&gt;that one thing&lt;/em&gt; and don't "improve" everything else while you're at it.

&lt;strong&gt;My #1 recommendation for beginners:&lt;/strong&gt; Run everything inside a &lt;strong&gt;virtual machine (VM)&lt;/strong&gt; with Linux. Give it 100-200 GB of disk space (not on the C: drive!). This isolates most problems: if the host OS breaks, the VM still works; if the VM breaks, just restore a snapshot.

---

&lt;em&gt;← Read the full story first: &lt;a href="https://dev.to/lingdas1/i-broke-my-ai-assistant-7-times-heres-what-i-learned-47le"&gt;I Broke My AI Assistant 7 Times&lt;/a&gt;&lt;/em&gt;

&lt;em&gt;Next: The Gateway Ghost 👻 →&lt;/em&gt;
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  💬 Your Turn
&lt;/h2&gt;

&lt;p&gt;Have you run into a similar problem? Or hit a wall I didn't mention?&lt;/p&gt;

&lt;p&gt;Drop a comment below — I read every single one. Your experience might help someone else who's stuck on the same thing.&lt;/p&gt;

&lt;p&gt;The more we share our screw-ups, the fewer people have to make them. 🤝&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>story</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Broke My AI Assistant 7 Times. Here's What I Learned.</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 10:33:01 +0000</pubDate>
      <link>https://forem.com/lingdas1/i-broke-my-ai-assistant-7-times-heres-what-i-learned-47le</link>
      <guid>https://forem.com/lingdas1/i-broke-my-ai-assistant-7-times-heres-what-i-learned-47le</guid>
      <description>&lt;h1&gt;
  
  
  I Broke My AI Assistant 7 Times. Here's What I Learned.
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;One medical student's journey from "I want a Jarvis" to accidentally becoming a self-taught DevOps engineer.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Beginning: I Almost Bought a Phone for AI
&lt;/h2&gt;

&lt;p&gt;It started with a video.&lt;/p&gt;

&lt;p&gt;I was scrolling through Bilibili (think YouTube, but Chinese) and saw something that blew my mind: &lt;strong&gt;the "Doubao Phone."&lt;/strong&gt; A smartphone with a built-in AI assistant that could do everything — order food, compare prices across stores, play games for you, book appointments. &lt;em&gt;"Finally,"&lt;/em&gt; I thought, &lt;em&gt;"my own Jarvis."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I almost bought it.&lt;/p&gt;

&lt;p&gt;Then the app store drama happened. The big companies blocked Doubao's integrations. The phone stopped being magical. And I moved on to the next viral thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Farming crayfish with AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, that was a real trend. You could deploy an AI agent that managed a virtual crayfish farm. It was hilarious but also... expensive. The token costs were insane, and the AI kept forgetting what happened five minutes ago.&lt;/p&gt;

&lt;p&gt;I watched from the sidelines, feeling that familiar itch: &lt;em&gt;"I want to do this too, but I don't know how."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then I found &lt;strong&gt;Hermes Agent&lt;/strong&gt; — an open-source AI assistant you can run on your own computer. Free. Private. Controllable.&lt;/p&gt;

&lt;p&gt;I searched Bilibili for tutorials. Downloaded the files. And thus began the longest, most frustrating, most educational tech journey of my life.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup: 7 Times I Broke Everything
&lt;/h2&gt;

&lt;p&gt;Here's the honest story of what happened when a medical student with no coding background tried to deploy an AI assistant on his own.&lt;/p&gt;




&lt;h3&gt;
  
  
  💥 Crash #1: The Gateway Ghost
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; I followed the tutorial step by step. Everything installed fine. Then the gateway started disconnecting randomly. Sometimes it worked for hours. Sometimes it died after 10 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; &lt;em&gt;"Did I do something wrong? Let me reinstall everything."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; Restarting the gateway. That's it. Just... restarting it. I had already wiped and reinstalled twice before I figured this out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; Before assuming you broke something, try turning it off and on again. It's a cliché because it works.&lt;/p&gt;




&lt;h3&gt;
  
  
  💥 Crash #2: Russia's Internet Hates Me
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; I'm studying in Russia, and the internet here is... let's say &lt;em&gt;unstable.&lt;/em&gt; The VPN gets blocked. The DNS dies. The whole building loses connection for hours at a time.&lt;/p&gt;

&lt;p&gt;I thought: &lt;em&gt;"No problem — I'll download some local AI models so my assistant can work offline."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I spent a weekend downloading models. Got everything set up. It was beautiful.&lt;/p&gt;

&lt;p&gt;The next morning, Windows gave me a blue screen of death. When it rebooted, &lt;strong&gt;all my downloaded models were gone.&lt;/strong&gt; Corrupted. Unreadable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; Staring at my screen in disbelief. 20GB of models, gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; I switched to a different model loader, redownloaded everything, and took a screenshot of the working config this time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; Back up your configuration &lt;strong&gt;before&lt;/strong&gt; you think you need it. Not after.&lt;/p&gt;




&lt;h3&gt;
  
  
  💥 Crash #3: The C: Drive Betrayal
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Everything installed to C: drive by default. Models, tools, environments — all happily eating up space on my system drive.&lt;/p&gt;

&lt;p&gt;One morning, Windows greeted me with: &lt;strong&gt;"Your C: drive is almost full."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Panic.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I decided to move everything to D: drive. I consulted with another AI, got detailed migration instructions, and followed them carefully.&lt;/p&gt;

&lt;p&gt;Everything broke.&lt;/p&gt;

&lt;p&gt;My assistant couldn't find its files. WSL refused to start. Models were looking for paths that no longer existed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; &lt;em&gt;"But... I followed the instructions!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; I restored from a backup I thankfully made before starting, and did the migration &lt;strong&gt;one piece at a time&lt;/strong&gt; — move WSL first, confirm it works, then move the model loader, confirm it works, then move the assistant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; Never migrate everything at once. One step at a time. And always have a rollback plan.&lt;/p&gt;




&lt;h3&gt;
  
  
  💥 Crash #4: The Emulator War
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Remember that Android emulator I installed months ago to play mobile games? I had uninstalled it. No big deal, right?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After uninstalling the emulator, WSL2 started throwing this error: &lt;code&gt;HCS_E_SERVICE_NOT_AVAILABLE&lt;/code&gt;. Virtualization broke. Windows Subsystem for Linux stopped working. My AI couldn't run.&lt;/p&gt;

&lt;p&gt;It turned out the emulator and WSL2 were fighting over the same virtualization resources. And when I removed the emulator, it took something with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; &lt;em&gt;"I just deleted a game emulator. How does that break my AI assistant?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; Multiple restarts, repairing Windows Hyper-V components, and a lot of swearing at my screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; Your computer's virtualization layer is like a house of cards. Remove one component and the whole thing can collapse. Also: Windows 11 Home edition hides virtualization settings, making this 10x harder to debug.&lt;/p&gt;




&lt;h3&gt;
  
  
  💥 Crash #5: The Great OS Migration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; After the emulator war, I decided enough was enough. I backed up everything, wiped my computer, and installed a fresh Windows. This time, I would run my AI inside a &lt;strong&gt;virtual machine&lt;/strong&gt; with Linux. No more WSL2 headaches.&lt;/p&gt;

&lt;p&gt;It worked. For about a day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; Relief followed by confusion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; Nothing — it worked fine. I just didn't trust it anymore.&lt;/p&gt;




&lt;h3&gt;
  
  
  💥 Crash #6: The Invisible Network Cable
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; My host computer (Windows) had internet. My VM (Linux) didn't. The network adapter was set to NAT, just like every tutorial said. But the VM couldn't reach the outside world.&lt;/p&gt;

&lt;p&gt;I spent hours checking settings, reinstalling network drivers, changing adapter types.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; &lt;em&gt;"The internet works on my laptop. Why doesn't it work INSIDE my laptop?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; The VMware NAT Service and DHCP Service weren't running in Windows. They're supposed to start automatically. They didn't. One click to start them, and everything worked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; When virtualization networking breaks, check the &lt;strong&gt;host services first&lt;/strong&gt;, not the VM settings. And &lt;code&gt;ping&lt;/code&gt; and &lt;code&gt;curl&lt;/code&gt; are better debugging tools than staring at network icons.&lt;/p&gt;
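&lt;p&gt;If you'd rather script that routine than type the commands by hand, here is a rough Python stand-in. It is only a sketch: real ping uses ICMP, while this version swaps in a DNS lookup and a plain TCP connection, and the function names are made up for illustration. The idea is to run the checks in order and report the first layer that fails.&lt;/p&gt;

```python
import socket


def check_dns(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) != 0
    except OSError:
        return False


def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(checks):
    """Run (label, check) pairs in order; return the first failing label, or None."""
    for label, check in checks:
        if not check():
            return label
    return None
```

&lt;p&gt;Inside the VM you might call first_failure with a DNS check for a site you know is up, followed by a TCP check to that site on port 443. If even the first check fails, the problem is probably below the VM, which is the cue to look at the host services.&lt;/p&gt;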




&lt;h3&gt;
  
  
  💥 Crash #7: The Gateway That Lied to Me
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; I had set up the gateway to auto-start on boot. I checked the configuration. It said &lt;code&gt;enabled: true&lt;/code&gt;. I was confident.&lt;/p&gt;

&lt;p&gt;The next morning, my AI was offline again.&lt;/p&gt;

&lt;p&gt;The gateway had "started" but hadn't actually connected. It was running as a process, but doing nothing useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My reaction:&lt;/strong&gt; &lt;em&gt;"But I set it to auto-start! Why is it lying to me?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What actually fixed it:&lt;/strong&gt; I wrote a simple script that checks every 5 minutes whether the gateway is actually connected, and restarts it if not. Bulletproof.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; "Running" and "working" are two different things. Always add a health check.&lt;/p&gt;
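&lt;p&gt;For the curious, a health check like that can be sketched in a few lines of Python. This is not my exact script: the two gateway-cli commands are hypothetical placeholders, and the sketch assumes your gateway has some status command that exits with code 0 only when it is really connected.&lt;/p&gt;

```python
import subprocess
import time

# Hypothetical commands: replace with whatever your gateway actually provides.
CHECK_CMD = ["gateway-cli", "status"]     # assumed to exit 0 only when connected
RESTART_CMD = ["gateway-cli", "restart"]  # assumed restart command
CHECK_INTERVAL_SECONDS = 5 * 60


def is_connected(run=subprocess.run):
    """Return True only if the status command exits with code 0."""
    try:
        result = run(CHECK_CMD, capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False  # a missing or hung command counts as "not working"
    return result.returncode == 0


def check_once(run=subprocess.run):
    """Check the gateway once and restart it if unhealthy; return the action taken."""
    if is_connected(run):
        return "ok"
    try:
        run(RESTART_CMD, capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return "restart-failed"
    return "restarted"


def watch():
    """Loop forever, checking every five minutes and logging the result."""
    while True:
        print(time.strftime("%H:%M:%S"), check_once())
        time.sleep(CHECK_INTERVAL_SECONDS)
```

&lt;p&gt;Calling watch() starts the loop. The important design choice is that the check asks "is it connected?" and not "is the process alive?", because a process can be alive and still useless.&lt;/p&gt;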




&lt;h2&gt;
  
  
  The Golden Rule: Don't Touch It
&lt;/h2&gt;

&lt;p&gt;After weeks of crashes, debugging, and existential crises, my setup finally stabilized. Everything worked. The gateway stayed connected. The models loaded correctly. Messages flowed.&lt;/p&gt;

&lt;p&gt;And I learned the most important lesson of all:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If it works, don't touch it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You never know which piece of your spaghetti-code setup is holding everything together. That random config file? The one you're not sure does anything? Yeah, it probably does something. Leave it alone.&lt;/p&gt;

&lt;p&gt;Every time I thought &lt;em&gt;"I'll just fix this one small thing,"&lt;/em&gt; I ended up spending 3 hours recovering from the consequences.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Want You to Know
&lt;/h2&gt;

&lt;p&gt;I'm telling you all this not because I'm an expert — I'm not. I'm a medical student. I study anatomy, not APIs. I chose this career because I wanted to help people, not because I wanted to debug network services at 2 AM.&lt;/p&gt;

&lt;p&gt;But I got it working. And if I can, you can too.&lt;/p&gt;

&lt;p&gt;Here's what I learned that actually matters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before I started&lt;/th&gt;
&lt;th&gt;After I broke everything 7 times&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"AI is for programmers"&lt;/td&gt;
&lt;td&gt;"AI is for anyone stubborn enough to try"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I'll just follow the tutorial"&lt;/td&gt;
&lt;td&gt;"I'll follow the tutorial &lt;em&gt;and&lt;/em&gt; back up first"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"It should work perfectly"&lt;/td&gt;
&lt;td&gt;"It will break, and that's normal"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"I'm not technical enough"&lt;/td&gt;
&lt;td&gt;"Being patient matters more than being technical"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Your Turn
&lt;/h2&gt;

&lt;p&gt;If you're reading this and thinking &lt;em&gt;"That sounds like me"&lt;/em&gt; — good. You're exactly who I wrote this for.&lt;/p&gt;

&lt;p&gt;Start with something small. Expect it to break. Back up before you change anything. And when it finally works, &lt;strong&gt;leave it alone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm still learning. Every day something new confuses me. But I'm not scared of it anymore — because I've already broken everything that could break.&lt;/p&gt;

&lt;p&gt;And the AI is still running.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Hi, I'm Ling. I'm a medical student in China who somehow became a self-taught AI deployer. No CS degree, no big tech job — just a laptop, broken internet, and way too much stubbornness.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is the first of my "Real People, Real AI" series. ⭐ &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Star the GitHub repo&lt;/a&gt; to get notified when the next one drops.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. — If you've broken your own AI setup in a creative way, leave a comment. Misery loves company. 😄&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>story</category>
    </item>
    <item>
      <title>What Is an LLM? (No, It's Not Magic — Here's What's Actually Happening)</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 09:36:12 +0000</pubDate>
      <link>https://forem.com/lingdas1/what-is-an-llm-no-its-not-magic-heres-whats-actually-happening-3ond</link>
      <guid>https://forem.com/lingdas1/what-is-an-llm-no-its-not-magic-heres-whats-actually-happening-3ond</guid>
      <description>&lt;h1&gt;
  
  
  What Is an LLM? (No, It's Not Magic — Here's What's Actually Happening)
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The plain-English guide to understanding AI — no jargon, no code, just the stuff that matters.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;My grandfather called it "the thinking computer."&lt;/p&gt;

&lt;p&gt;I showed him ChatGPT, and he asked: &lt;em&gt;"Does it... think? Like a person?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's a good question. And honestly, most explanations of AI are terrible at answering it. Either they're too technical (&lt;em&gt;"a transformer-based neural network with self-attention mechanisms"&lt;/em&gt; — whatever that means) or too mystical (&lt;em&gt;"it's like a digital brain!"&lt;/em&gt; — no, it's not).&lt;/p&gt;

&lt;p&gt;So let me explain what an LLM actually is. No jargon. No magic. Just the truth.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Analogy: A Chef Who's Tried Every Recipe
&lt;/h2&gt;

&lt;p&gt;Imagine the world's most experienced chef. This chef has read &lt;strong&gt;every&lt;/strong&gt; cookbook ever written. Every recipe from every culture. Every food blog. Every handwritten note from every grandmother.&lt;/p&gt;

&lt;p&gt;You ask this chef: &lt;em&gt;"Can you make me something with chicken, lemon, and garlic?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The chef has never made &lt;em&gt;that exact dish&lt;/em&gt; before, but they've read millions of recipes. They know what works. They know chicken + lemon + garlic usually means a Mediterranean-style dish. They know garlic should be minced, not whole. They know lemon juice goes in near the end, not the beginning.&lt;/p&gt;

&lt;p&gt;So they create a new recipe, perfectly reasonable, that has never existed before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's what an LLM does.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's not "thinking." It's not "conscious." It has read an unimaginable amount of human text — books, articles, conversations, code — and learned the patterns of how we write and reason.&lt;/p&gt;

&lt;p&gt;When you ask it a question, it doesn't "look up" an answer. It &lt;em&gt;generates&lt;/em&gt; one, word by word, based on everything it has learned.&lt;/p&gt;




&lt;h2&gt;
  
  
  What LLM Actually Stands For
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;L&lt;/strong&gt;arge &lt;strong&gt;L&lt;/strong&gt;anguage &lt;strong&gt;M&lt;/strong&gt;odel.&lt;/p&gt;

&lt;p&gt;Let's break that down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language&lt;/strong&gt; — It works with words. Text in, text out. That's its native language (pun intended).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt; — A mathematical representation of patterns. Think of it as a super-complex set of probabilities: &lt;em&gt;"After the word 'I', the next word is usually a verb, and after 'I want to', the next word is often 'go' or 'get' or 'make'..."&lt;/em&gt; × a billion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large&lt;/strong&gt; — Really, &lt;em&gt;really&lt;/em&gt; large. These models have been trained on most of the public internet. The biggest ones have learned patterns from trillions of words.&lt;/li&gt;
&lt;/ul&gt;
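&lt;p&gt;If you're curious what "a set of probabilities" looks like in practice, here is a toy sketch in Python. It is a deliberate simplification: a real LLM uses a neural network over tokens, not a table of word pairs, and the tiny sample text is invented for the example. It only shows the core idea of counting which word tends to follow which.&lt;/p&gt;

```python
from collections import Counter, defaultdict


def next_word_probabilities(text):
    """Count which word follows which, then turn the counts into probabilities."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return {
        word: {nxt: count / sum(counts.values()) for nxt, count in counts.items()}
        for word, counts in followers.items()
    }


# Invented sample text, small enough to check by hand.
corpus = "i want to go . i want to make tea . i want to go home ."
model = next_word_probabilities(corpus)
print(model["to"])
```

&lt;p&gt;In this sample, "to" is followed by "go" two times out of three and by "make" once, so those are the probabilities the toy model learns.&lt;/p&gt;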




&lt;h2&gt;
  
  
  What It's NOT
&lt;/h2&gt;

&lt;p&gt;Let me clear up some common confusion:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Myth&lt;/th&gt;
&lt;th&gt;Truth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🧠 "It thinks like a human"&lt;/td&gt;
&lt;td&gt;❌ No. It predicts words based on patterns. No consciousness, no feelings, no self-awareness.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📚 "It knows everything"&lt;/td&gt;
&lt;td&gt;❌ Its training data has a cutoff date, so it misses anything newer. And strictly speaking it doesn't "know" things: it &lt;em&gt;generates&lt;/em&gt; plausible text.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🎯 "It's always right"&lt;/td&gt;
&lt;td&gt;❌ It can be confidently wrong. It's great at sounding correct even when it's making things up.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;📝 "It copies from the internet"&lt;/td&gt;
&lt;td&gt;❌ It doesn't keep a library of web pages to copy from. It learned patterns and generates new text from them, although it can occasionally reproduce passages that appeared many times in its training data.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why "Large" Matters
&lt;/h2&gt;

&lt;p&gt;Imagine two chefs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chef A&lt;/strong&gt; has read 10 recipes. They know how to make exactly 10 dishes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chef B&lt;/strong&gt; has read 10 million recipes. They understand cuisine at a deep level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs work the same way. The "large" in "Large Language Model" refers to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The amount of training data&lt;/strong&gt;: billions of web pages, books, and documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The number of parameters&lt;/strong&gt;: the adjustable numbers (weights) inside the model that get tuned during training. Think of them as tiny dials rather than individual facts. A 7-billion-parameter model (small) has 7 billion of these dials; a 70-billion-parameter model (large) has 70 billion.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;More parameters = more pattern recognition = better reasoning (usually).&lt;/p&gt;

&lt;p&gt;But here's the good news: &lt;strong&gt;you don't need the biggest model.&lt;/strong&gt; A 7-billion-parameter model, running on a laptop, can handle most everyday tasks just fine. It's like having Chef B-lite — still experienced, still useful, much more practical.&lt;/p&gt;
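&lt;p&gt;Some back-of-the-envelope arithmetic shows why model size matters for a laptop. The sketch below estimates how much space the parameters alone take up. The 16-bit and 4-bit storage formats are assumptions picked for illustration (this guide doesn't say which format any particular model uses), and a running model needs extra working memory on top of this.&lt;/p&gt;

```python
def weights_size_gb(parameters, bits_per_parameter):
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    total_bytes = parameters * bits_per_parameter / 8
    return total_bytes / 10**9


for label, params in [("7B", 7 * 10**9), ("70B", 70 * 10**9)]:
    for bits in (16, 4):
        size = weights_size_gb(params, bits)
        print(f"{label} model at {bits}-bit: about {size:.1f} GB")
```

&lt;p&gt;Under these assumptions a 7-billion-parameter model comes to about 14 GB at 16 bits per parameter and about 3.5 GB at 4 bits, while a 70-billion-parameter model is ten times larger in both cases.&lt;/p&gt;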




&lt;h2&gt;
  
  
  How It Actually Works (The Simplest Explanation)
&lt;/h2&gt;

&lt;p&gt;When you type a message, here's what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You type: "What is the capital of France?"

Step 1: The model breaks your question into tokens (words and pieces of words).
         ["What", " is", " the", " capital", " of", " France", "?"]

Step 2: The model starts predicting the answer, one token at a time.
         "The" → "capital" → "of" → "France" → "is" → "Paris" → "."

Step 3: Each token is chosen based on probability.
         "The capital of France is..." → P(Paris) = 95%, P(Lyon) = 2%, P(Marseille) = 1%
         → It picks "Paris" (the most probable)

Step 4: Done! "The capital of France is Paris."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's not magic. It's a very, very sophisticated version of your phone's autocomplete, trained on a huge slice of the internet.&lt;/p&gt;
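&lt;p&gt;Step 3 is simple enough to sketch in Python. The probabilities are the ones from the France example; the "other" entry is an assumed catch-all for the remaining 2%, and the function names are made up for illustration. The sketch shows two ways to choose: always take the top candidate, or draw at random in proportion to the probabilities, which is roughly why the same question can get slightly different answers.&lt;/p&gt;

```python
import random

# Probabilities from the France example; "other" is an assumed catch-all
# for the remaining 2% so the weights sum to 100%.
next_token_probs = {"Paris": 0.95, "Lyon": 0.02, "Marseille": 0.01, "other": 0.02}


def pick_greedy(probs):
    """Always return the single most probable token."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


print(pick_greedy(next_token_probs))

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [pick_sampled(next_token_probs, rng) for _ in range(1000)]
print(draws.count("Paris"), "of 1000 sampled draws were Paris")
```

&lt;p&gt;The greedy pick is always "Paris". With sampling, "Paris" still wins the vast majority of the time, but once in a while a less likely word slips through.&lt;/p&gt;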




&lt;h2&gt;
  
  
  Why This Matters to You (a Regular Person)
&lt;/h2&gt;

&lt;p&gt;Here's why understanding this matters:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. You Don't Need to Be a Programmer
&lt;/h3&gt;

&lt;p&gt;If you understand that an LLM predicts words based on patterns, you already understand enough to use it. The tools are designed for everyone now.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. You Can Run It on Your Laptop
&lt;/h3&gt;

&lt;p&gt;Because LLMs are just math (very complicated math, but still math), they can run on ordinary computers, as long as the model fits in memory. A smaller model on your laptop is slower and less capable than ChatGPT, but it's private, free, and always available.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. You Should Be Skeptical
&lt;/h3&gt;

&lt;p&gt;Knowing that LLMs can be confidently wrong helps you use them better. Always fact-check important information. Use AI as a brainstorming partner, not an encyclopedia.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. You're Not Left Behind
&lt;/h3&gt;

&lt;p&gt;The people who benefit most from AI aren't programmers — they're writers, students, small business owners, artists, and curious people who ask good questions. That's probably you.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Different Types of AI (In Two Sentences)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Understands and generates text&lt;/td&gt;
&lt;td&gt;ChatGPT, Claude, DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image generator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Creates pictures from descriptions&lt;/td&gt;
&lt;td&gt;Midjourney, DALL-E, Stable Diffusion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voice AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Understands and generates speech&lt;/td&gt;
&lt;td&gt;Siri, Whisper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Predicts what you'll like&lt;/td&gt;
&lt;td&gt;TikTok, Netflix, YouTube&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This series focuses on LLMs — the text-based AI that can write, explain, analyze, and assist. It's the most useful type for everyday tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Can Actually DO with This Knowledge
&lt;/h2&gt;

&lt;p&gt;Now that you know what an LLM is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You can use one right now, for free&lt;/strong&gt; — Ollama + a small model on your laptop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You know the limits&lt;/strong&gt; — It's not magic, it's pattern recognition. Use it as a tool, not an oracle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can explain it to others&lt;/strong&gt; — When your friends say "AI is taking over," you can say "Actually, it's just really good autocomplete, trained on a lot of data."&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Now that you know &lt;em&gt;what&lt;/em&gt; an LLM is, the next guide shows you how to actually run one:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Part 3:&lt;/strong&gt; &lt;em&gt;"Step-by-Step: Run Your First AI Model in 10 Minutes"&lt;/em&gt; — (coming next)&lt;/p&gt;

&lt;p&gt;No terminal commands you don't understand. No unexplained jargon. Just a simple walkthrough with screenshots.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Hi, I'm Ling. I'm a medical student who got tired of feeling left behind by AI. I started learning, broke things, fixed them, and now I'm sharing what I've learned — in plain English, for regular people.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this useful? ⭐ &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Star the GitHub repo&lt;/a&gt; to get notified when new guides drop. Or leave a comment — I'd love to hear what questions you still have.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>explainer</category>
      <category>technology</category>
    </item>
    <item>
      <title>AI Is Too Expensive? I Run It for Free on My Laptop</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 09:36:11 +0000</pubDate>
      <link>https://forem.com/lingdas1/ai-is-too-expensive-i-run-it-for-free-on-my-laptop-1cg</link>
      <guid>https://forem.com/lingdas1/ai-is-too-expensive-i-run-it-for-free-on-my-laptop-1cg</guid>
      <description>&lt;h1&gt;
  
  
  AI Is Too Expensive? I Run It for Free on My Laptop (Here's How)
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A medical student's guide to using AI without paying a cent in subscription fees.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;I remember the exact moment I gave up on AI.&lt;/p&gt;

&lt;p&gt;It was January 2026. I was staring at ChatGPT Pro's $200/month price tag, then at my bank account. A medical student in China — my monthly budget for "extras" was about enough for two bubble teas.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"AI is for rich people,"&lt;/em&gt; I thought. &lt;em&gt;"Or people whose companies pay for it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I closed the tab and went back to studying.&lt;/p&gt;

&lt;p&gt;But I couldn't shake the feeling that I was missing out. Everyone was talking about AI — coding assistants, research tools, writing helpers. And there I was, stuck with Google and a prayer.&lt;/p&gt;

&lt;p&gt;Three months later, I'm running GPT-4-class models on my five-year-old laptop. For free. No subscriptions, no API bills, no cloud credits.&lt;/p&gt;

&lt;p&gt;This is how I did it — and how you can too, even if you're not a programmer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Lie We've Been Told
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody tells you about AI: &lt;strong&gt;you don't need the cloud.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every AI company wants you to believe you need their $20/month plan. Or their $200/month Pro plan. Or their enterprise plan (ask for pricing!).&lt;/p&gt;

&lt;p&gt;Why? Because they make money every time you type a message.&lt;/p&gt;

&lt;p&gt;But the technology itself? The actual AI model? &lt;strong&gt;Many capable models are open source.&lt;/strong&gt; Free. Public. Available for anyone to download and run.&lt;/p&gt;

&lt;p&gt;The only reason we don't is that nobody told us we could.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Thought vs What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Running AI locally? You need a $5,000 gaming PC with liquid cooling or something."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;My laptop has 8GB RAM and a mid-range GPU from 2021. I run AI models that answer questions, summarize articles, and help me study — all locally, all free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You need to be a programmer to set this up."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'm a medical student. I know anatomy, not APIs. If I can do it, anyone can.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Local AI is worse than ChatGPT."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For most everyday tasks — writing, research, brainstorming — the difference is unnoticeable. And for some things (privacy, no censorship, unlimited use), local AI is actually &lt;em&gt;better&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What You Can Actually Do with Free AI
&lt;/h2&gt;

&lt;p&gt;Let me show you what I do daily, all on my laptop, all free:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Study Assistant
&lt;/h3&gt;

&lt;p&gt;I paste textbook chapters and ask questions. The model explains difficult concepts in simpler terms. No more paying for expensive tutorials or digging through hours of YouTube videos.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Writing Helper
&lt;/h3&gt;

&lt;p&gt;Essays, emails, notes — I draft them faster. The model suggests improvements but doesn't rewrite everything (I'm still learning English, so I need the practice).&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Research Buddy
&lt;/h3&gt;

&lt;p&gt;I download research papers as PDFs and ask questions about them. &lt;em&gt;"Summarize this in three bullet points."&lt;/em&gt; &lt;em&gt;"What's the main limitation of this study?"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Brainstorming Partner
&lt;/h3&gt;

&lt;p&gt;When I'm stuck on an idea, I talk it out with the AI. It's like having a friend who never gets tired of your questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Language Practice
&lt;/h3&gt;

&lt;p&gt;I write something, ask the AI to correct my grammar, and learn from the feedback. It's like a free tutor who's available 24/7.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Need (Real Talk)
&lt;/h2&gt;

&lt;p&gt;Let's be honest about what you need. No corporate marketing, just facts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Minimum Setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Any computer&lt;/strong&gt; (Windows, Mac, Linux — even a $200 used laptop)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;At least 8GB of RAM&lt;/strong&gt; (16GB is better, but 8GB works)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internet connection&lt;/strong&gt; for the initial download (takes 10-15 minutes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No special GPU required. No expensive hardware.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Wait, I thought you needed a gaming graphics card?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You &lt;em&gt;can&lt;/em&gt; get better speed with a gaming GPU — but you don't &lt;em&gt;need&lt;/em&gt; one. Models that run on CPU are slower (think 5-10 seconds per response instead of 1-2 seconds), but they work perfectly fine for most tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Looks Like
&lt;/h3&gt;

&lt;p&gt;The whole setup is basically this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Download a free program (Ollama) — 2 minutes
2. Pick a model (the "brain") — 1 click
3. Start chatting — immediately
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire process. I'll write a step-by-step guide with screenshots soon. For now, just know that it's &lt;strong&gt;much simpler&lt;/strong&gt; than you think.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Privacy Bonus Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Here's something I didn't expect: &lt;strong&gt;privacy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you use ChatGPT or Claude, everything you type goes to their servers. Your questions, your documents, your private thoughts.&lt;/p&gt;

&lt;p&gt;When you run AI locally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔒 Everything stays on your computer&lt;/li&gt;
&lt;li&gt;🔒 No one sees your conversations&lt;/li&gt;
&lt;li&gt;🔒 No data collection&lt;/li&gt;
&lt;li&gt;🔒 Works even without internet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a medical student handling sensitive patient data during rotations, this is huge. But even for everyday use — journal entries, personal projects, private brainstorming — it's nice to know your data is yours.&lt;/p&gt;




&lt;h2&gt;
  
  
  But Wait, Is It Actually Good?
&lt;/h2&gt;

&lt;p&gt;This is the question I get most. Let me give you an honest answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For most everyday tasks? Yes, it's good enough.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing emails → ✅ Great&lt;/li&gt;
&lt;li&gt;Summarizing articles → ✅ Great&lt;/li&gt;
&lt;li&gt;Brainstorming ideas → ✅ Great&lt;/li&gt;
&lt;li&gt;Explaining concepts → ✅ Great&lt;/li&gt;
&lt;li&gt;Writing code → ✅ Good (with the right model)&lt;/li&gt;
&lt;li&gt;Complex math → ✅ Good (with DeepSeek-R1)&lt;/li&gt;
&lt;li&gt;Creative writing → 🟡 Decent (varies by model)&lt;/li&gt;
&lt;li&gt;Real-time conversation → 🟡 A bit slower on CPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The only thing you really miss:&lt;/strong&gt; The absolute top-tier models (GPT-4o, Claude Opus) are still cloud-only. But 90% of what I need AI for, my local models handle just fine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I'm Writing This
&lt;/h2&gt;

&lt;p&gt;I'm not a tech influencer. I don't sell courses or have affiliate links. I'm just a medical student who was frustrated by how expensive AI seemed — and then discovered it didn't have to be.&lt;/p&gt;

&lt;p&gt;Every guide I found was written by programmers, for programmers. They assumed I knew what a "terminal" was, what "GGUF" meant, how to "clone a repo."&lt;/p&gt;

&lt;p&gt;I didn't know any of that. I still barely do.&lt;/p&gt;

&lt;p&gt;But I learned enough to get it working. And if I can do it, you can too.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;I'm writing a series of plain-English guides for people who feel left behind by AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; &lt;em&gt;"What Is an LLM? (No, It's Not Magic)"&lt;/em&gt; — Explaining AI in simple terms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; &lt;em&gt;"Step-by-Step: Run Your First AI Model in 10 Minutes"&lt;/em&gt; — Screenshots included&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4:&lt;/strong&gt; &lt;em&gt;"5 Free Things You Can Do with Local AI Right Now"&lt;/em&gt; — Practical use cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 5:&lt;/strong&gt; &lt;em&gt;"Local AI vs ChatGPT: An Honest Comparison"&lt;/em&gt; — No bias, just facts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Star the repo or follow me here to get notified when they drop.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI shouldn't be a luxury. The technology is free, the tools are simple, and the only thing standing between you and free AI is knowing it exists.&lt;/p&gt;

&lt;p&gt;I spent months thinking I couldn't afford AI. Turns out, I could afford it all along — I just didn't know where to look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can run AI on your laptop right now. For free. And it works.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a medical student with zero coding background can figure it out, so can you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Hi, I'm Ling. I'm a medical student in China who fell into AI by accident. No CS degree, no big tech job — just a laptop, a lot of curiosity, and a belief that AI should be for everyone. This is the first of my "AI for the Rest of Us" series.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Found this useful? ⭐ &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Star the GitHub repo&lt;/a&gt; to get notified when new guides drop. Or leave a comment — I read every one.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Local LLM Guide: The Complete Series — Find Your Starting Point 👋</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 09:21:04 +0000</pubDate>
      <link>https://forem.com/lingdas1/local-llm-guide-the-complete-series-find-your-starting-point-4do5</link>
      <guid>https://forem.com/lingdas1/local-llm-guide-the-complete-series-find-your-starting-point-4do5</guid>
      <description>&lt;h1&gt;
  
  
  Welcome to Local LLM Guide 👋
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Hi, I'm Ling. I'm a medical student who fell into AI by accident. No CS degree, no big tech job — just a laptop, a lot of curiosity, and a belief that AI should be for everyone.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Not sure where to start? Pick your path:&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  👨‍💻 For Developers
&lt;/h2&gt;

&lt;p&gt;You want to run LLMs locally — on your own hardware, with your own data, without paying API fees.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Article&lt;/th&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Read Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/getting-started-run-your-first-local-llm-in-5-minutes-2i1j"&gt;Getting Started: Run Your First Local LLM in 5 Minutes&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟢 Beginner&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/hardware-guide-what-do-you-actually-need-to-run-local-llms-1eik"&gt;Hardware Guide: What You Actually Need&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟢 Beginner&lt;/td&gt;
&lt;td&gt;8 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/deepseek-r1-the-0-o1-alternative-you-can-run-right-now-24a5"&gt;DeepSeek-R1: The $0 o1 Alternative&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟡 Intermediate&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/qwen-36-25-the-most-versatile-local-models-1d8e"&gt;Qwen 3.6 &amp;amp; 2.5: The Most Versatile Local Models&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟡 Intermediate&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/open-webui-your-local-chatgpt-29d8"&gt;Open WebUI: Your Local ChatGPT&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟡 Intermediate&lt;/td&gt;
&lt;td&gt;8 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/gguf-modelfile-the-power-users-guide-to-local-llms-1fbi"&gt;GGUF &amp;amp; Modelfile: The Power User's Guide&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟡 Intermediate&lt;/td&gt;
&lt;td&gt;12 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;07&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/local-rag-chat-with-your-documents-open-source-private-390o"&gt;Local RAG: Chat With Your Documents&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🟡 Intermediate&lt;/td&gt;
&lt;td&gt;10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;08&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/production-ready-local-llms-from-terminal-to-team-deployment-4ph2"&gt;Production-Ready Local LLMs: From Terminal to Team Deployment&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🔴 Advanced&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;09&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/function-calling-for-local-llms-deepseek-qwen-glm-4-langchain-4iac"&gt;Function Calling for Local LLMs: DeepSeek, Qwen, GLM-4 &amp;amp; LangChain&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;🔴 Advanced&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;👉 Full source code &amp;amp; scripts:&lt;/strong&gt; &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;GitHub: Lingdas1/local-llm-guide&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧑‍🏫 New to AI? Start Here
&lt;/h2&gt;

&lt;p&gt;You've heard about AI but feel overwhelmed. You have a regular laptop and want to understand what's possible — in plain English, no jargon.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Guide&lt;/th&gt;
&lt;th&gt;Read Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/ai-is-too-expensive-i-run-it-for-free-on-my-laptop-1cg"&gt;AI Is Too Expensive? I Run It for Free on My Laptop&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/what-is-an-llm-no-its-not-magic-heres-whats-actually-happening-3ond"&gt;What Is an LLM? (No, It's Not Magic)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;&lt;em&gt;Step-by-Step: Run Your First AI Model in 10 Minutes&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Coming soon!&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;&lt;em&gt;5 Free Things You Can Do with Local AI&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Coming soon!&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Don't know where to start?&lt;/strong&gt; Begin with Guide 01 in the table above. It explains why you don't need money or technical skills to use AI.&lt;/p&gt;

&lt;p&gt;All guides are written by a medical student who learned this stuff from zero. No assumed knowledge, no skipped steps.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📚 The Complete Guide (All-in-One)
&lt;/h2&gt;

&lt;p&gt;If you prefer one long read, start here:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://dev.to/lingdas1/the-complete-guide-to-running-llms-locally-in-2026-from-ollama-to-production-3d8b"&gt;The Complete Guide to Running LLMs Locally in 2026: From Ollama to Production&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Covers everything from installing Ollama to production deployment — all in one article.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 What This Series Covers
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🟢 Beginner ──────────────────────────────────────┐
    ├── 01. Getting Started (5 min setup)         │
    ├── 02. Hardware Guide (what you need)        │
                                                  │
🟡 Intermediate ──────────────────────────────────┤
    ├── 03. DeepSeek-R1 Guide                     │
    ├── 04. Qwen 3.6 &amp;amp; 2.5 Guide                  │
    ├── 05. Open WebUI Setup                      │
    ├── 06. GGUF &amp;amp; Modelfile Customization        │
    ├── 07. Local RAG with AnythingLLM            │
                                                  │
🔴 Advanced ──────────────────────────────────────┤
    ├── 08. Production Deployment                 │
    ├── 09. Function Calling &amp;amp; Tool Use           │
                                                  │
📦 Bonus: Scripts + Docker Compose + Benchmarks   │
    All on GitHub ⬇️                              │
                                                  │
    GitHub.com/Lingdas1/local-llm-guide ──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎯 Why This Series Exists
&lt;/h2&gt;

&lt;p&gt;I started this journey because I was frustrated. Every AI tutorial assumed I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An unlimited budget for subscriptions and API fees ($200/month for ChatGPT Pro)&lt;/li&gt;
&lt;li&gt;A rack of A100 GPUs&lt;/li&gt;
&lt;li&gt;A CS degree from Stanford&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real life is different.&lt;/strong&gt; I have a laptop, a curious mind, and no budget for API fees. I'm a medical student — not a software engineer.&lt;/p&gt;

&lt;p&gt;If I can figure this out, so can you. That's the whole point of this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔗 Quick Links
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All source code&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;github.com/Lingdas1/local-llm-guide&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complete guide (one article)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://dev.to/lingdas1/the-complete-guide-to-running-llms-locally-in-2026-from-ollama-to-production-3d8b"&gt;Here&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Beginner path&lt;/td&gt;
&lt;td&gt;Start at &lt;strong&gt;Article #1&lt;/strong&gt; above&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer path&lt;/td&gt;
&lt;td&gt;Start at &lt;strong&gt;Article #3&lt;/strong&gt; above&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Found this useful?&lt;/td&gt;
&lt;td&gt;⭐ &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Star the repo&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;If this guide helped you, consider:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Starring the repo&lt;/strong&gt; — it helps others find it and you'll get notified when new chapters drop&lt;/li&gt;
&lt;li&gt;💬 &lt;strong&gt;Leaving a comment&lt;/strong&gt; — I read every one&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;Sharing with a friend&lt;/strong&gt; who's curious about running AI locally&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Ling — May 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm a medical student sharing what I learn about local AI. No CS degree, no big tech — just honest guides for real people.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ollama</category>
      <category>opensource</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Function Calling for Local LLMs: DeepSeek, Qwen, GLM-4 &amp; LangChain</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 09:17:04 +0000</pubDate>
      <link>https://forem.com/lingdas1/function-calling-for-local-llms-deepseek-qwen-glm-4-langchain-4iac</link>
      <guid>https://forem.com/lingdas1/function-calling-for-local-llms-deepseek-qwen-glm-4-langchain-4iac</guid>
      <description>&lt;h1&gt;
  
  
  06 — Function Calling &amp;amp; Tool Use
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;🔴 &lt;strong&gt;Advanced&lt;/strong&gt; — Give your local LLM superpowers: let it call APIs, run code, search the web, and interact with other software — all autonomously.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Is Function Calling? (Plain English First)
&lt;/h2&gt;

&lt;p&gt;Imagine you ask an assistant: &lt;em&gt;"What's the weather in Tokyo right now?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A normal LLM can only guess — it doesn't know today's weather. But with &lt;strong&gt;function calling&lt;/strong&gt;, the LLM can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I don't know the weather, but I know someone who does. Let me call the weather API."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The pattern is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "What's the weather in Tokyo?"
  ↓
LLM: "I should call get_weather(city='Tokyo')"
  ↓
Your code: calls the actual weather API → gets result
  ↓
LLM: "The weather in Tokyo is 22°C and sunny."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Function calling = the LLM decides when to use a tool, and your code executes it.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Why this matters without the cloud:&lt;/strong&gt; On a cloud API (GPT-4, Claude), function calling is a checkbox feature. &lt;strong&gt;On local LLMs, it's not automatic&lt;/strong&gt; — you need to know which models support it, how to format the tool definitions, and how to handle the response correctly. That's what this chapter covers.&lt;/p&gt;
&lt;/blockquote&gt;
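
&lt;p&gt;One concrete piece of "handling the response correctly": depending on the server and API format, the arguments of a tool call may arrive as an already-parsed object or as a JSON-encoded string (the OpenAI-style format uses a string), and smaller local models sometimes emit text that is not valid JSON at all. Below is a minimal defensive sketch. The helper name &lt;code&gt;normalize_arguments&lt;/code&gt; is made up for illustration and is not part of any library.&lt;/p&gt;

```python
import json


def normalize_arguments(raw_arguments):
    """Return tool arguments as a dict, or None if they cannot be used.

    Accepts either an already-parsed dict or a JSON-encoded string.
    """
    if isinstance(raw_arguments, dict):
        return raw_arguments
    try:
        parsed = json.loads(raw_arguments)
    except (TypeError, ValueError):
        # Not a string, or a string that is not valid JSON.
        return None
    # Valid JSON that is not an object (e.g. a bare number) is also unusable.
    return parsed if isinstance(parsed, dict) else None


print(normalize_arguments('{"city": "Tokyo"}'))  # JSON string form
print(normalize_arguments({"city": "Tokyo"}))    # already-parsed form
print(normalize_arguments("Tokyo, please"))      # malformed output
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of raising lets the calling code decide what to do next, for example asking the model to try again.&lt;/p&gt;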




&lt;h2&gt;
  
  
  How Function Calling Works (The Technical Pattern)
&lt;/h2&gt;

&lt;p&gt;Every function calling flow follows the same 5-step cycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: Define your tools (as JSON schema)
Step 2: Send user message + tool definitions to the LLM
Step 3: LLM responds with either:
         - A normal text reply (no tool needed)
         - A "tool call" request (which tool + what arguments)
Step 4: Your code executes the requested tool
Step 5: Send the tool result back to the LLM
         → LLM produces the final response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
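
&lt;p&gt;The five steps above can be sketched end to end without any model installed. In this sketch, &lt;code&gt;fake_llm&lt;/code&gt; is a scripted stand-in that imitates a model (it always asks for the weather tool first, then answers), and &lt;code&gt;get_weather&lt;/code&gt; returns canned data rather than calling a real API. Both names, and the simplified message format, are assumptions made for illustration. A real server's response format will differ in detail.&lt;/p&gt;

```python
import json


def get_weather(city):
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return {"city": city, "temperature_c": 22, "condition": "sunny"}


# Maps tool names the model may request to the Python functions that run them.
TOOLS = {"get_weather": get_weather}


def fake_llm(messages, tools):
    """Scripted stand-in for a model: requests a tool first, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        text = "The weather in {} is {}°C and {}.".format(
            data["city"], data["temperature_c"], data["condition"])
        return {"role": "assistant", "content": text}
    call = {"name": "get_weather", "arguments": {"city": "Tokyo"}}
    return {"role": "assistant", "content": "", "tool_calls": [call]}


def run_cycle(user_text):
    # Step 1: define the tools (abbreviated here; real definitions use JSON schema).
    tools = [{"name": name} for name in TOOLS]
    # Step 2: send the user message plus the tool definitions.
    messages = [{"role": "user", "content": user_text}]
    reply = fake_llm(messages, tools)
    # Step 3: the reply is either plain text or a tool call request.
    while reply.get("tool_calls"):
        messages.append(reply)
        for call in reply["tool_calls"]:
            # Step 4: our code, not the model, executes the requested tool.
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
        # Step 5: send the tool result back so the model can finish its answer.
        reply = fake_llm(messages, tools)
    return reply["content"]


print(run_cycle("What's the weather in Tokyo?"))
```

&lt;p&gt;The loop matters: a model may request several tools in a row, so the code keeps going until it gets a reply with no tool call in it.&lt;/p&gt;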



&lt;p&gt;Here's what a tool definition looks like in JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Get current weather for a city"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"City name, e.g., 'Tokyo'"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
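
&lt;p&gt;A tool definition is also useful on your side of the conversation: before running a tool, you can check the model's arguments against it. The sketch below is a deliberately small check that covers only required fields and a few simple types. It is not a full JSON Schema validator, and &lt;code&gt;check_arguments&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# The same get_weather definition as above, written as a Python dict.
TOOL_DEFINITION = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g., 'Tokyo'"}
            },
            "required": ["city"],
        },
    },
}

# Only a handful of simple JSON Schema types are covered in this sketch.
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool,
              "object": dict, "array": list}


def check_arguments(tool, arguments):
    """Return a list of problems; an empty list means the arguments look usable."""
    schema = tool["function"]["parameters"]
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append("missing required argument: " + name)
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append("unexpected argument: " + name)
        elif not isinstance(value, JSON_TYPES.get(spec["type"], object)):
            problems.append(name + " should be of type " + spec["type"])
    return problems


print(check_arguments(TOOL_DEFINITION, {"city": "Tokyo"}))
print(check_arguments(TOOL_DEFINITION, {"town": "Tokyo"}))
```

&lt;p&gt;If the list comes back non-empty, one option is to send the problems back to the model as the tool result so it can correct its own call.&lt;/p&gt;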






&lt;h2&gt;
  
  
  1. DeepSeek-R1: Function Calling
&lt;/h2&gt;

&lt;p&gt;DeepSeek-R1 is &lt;strong&gt;excellent&lt;/strong&gt; at function calling — it's one of its standout features. It uses the &lt;strong&gt;OpenAI-compatible format&lt;/strong&gt;, which means you can use the same code you'd use with GPT-4.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Setup
&lt;/h3&gt;

&lt;p&gt;First, make sure DeepSeek-R1 is running locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull deepseek-r1:14b

&lt;span class="c"&gt;# Or for smaller setups:&lt;/span&gt;
ollama pull deepseek-r1:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Single Tool Call Example (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Define the tools available to the LLM
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the current weather for a city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celsius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fahrenheit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Perform a mathematical calculation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Math expression in Python syntax (use ** for powers), e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2 + 2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sqrt(144)&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Send message + tools to the model
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-r1:14b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
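    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Fail loudly on HTTP errors instead of parsing an error body&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;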

&lt;span class="c1"&gt;# Step 3: Execute tool calls and return results
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute the tool the LLM requested and return the result.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# In real code, you'd call a real weather API here
&lt;/span&gt;        &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;unit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celsius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celsius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;72&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;humidity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;65%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
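        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Caution: eval() with empty __builtins__ is not a real sandbox; fine for a demo, risky for untrusted input&lt;/span&gt;
            &lt;span class="c1"&gt;# Expressions use Python syntax: ** is power, ^ is bitwise XOR&lt;/span&gt;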
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__builtins__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqrt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;__import__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;__import__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;__import__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;__import__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Step 4: Run the full interaction
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant with access to tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# First LLM call
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chat_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check if the LLM wants to call tools
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Second LLM call — now it has the tool results
&lt;/span&gt;        &lt;span class="n"&gt;final_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chat_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Test it
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Tokyo in celsius?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# → "The weather in Tokyo is 22°C and sunny."
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calculate 2**10 + 5*3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# → "The result is 1024 + 15 = 1039."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
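&lt;p&gt;The calculator branch above relies on &lt;code&gt;eval&lt;/code&gt;, and emptying &lt;code&gt;__builtins__&lt;/code&gt; does not make it a real sandbox. As one possible alternative, here is a minimal sketch that parses the expression into a syntax tree and evaluates only numbers, basic arithmetic, and a small whitelist of names. The helper name &lt;code&gt;safe_calculate&lt;/code&gt; and the exact whitelist are assumptions made for illustration; they are not part of the original example.&lt;/p&gt;

```python
import ast
import json
import math
import operator

# Whitelisted operators and names (an illustrative choice, extend as needed)
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}
CONSTANTS = {"pi": math.pi}


def _evaluate(node):
    """Recursively evaluate a whitelisted subset of Python expressions."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_evaluate(node.operand))
    if isinstance(node, ast.Name) and node.id in CONSTANTS:
        return CONSTANTS[node.id]
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FUNCTIONS
        and not node.keywords
    ):
        return FUNCTIONS[node.func.id](*[_evaluate(arg) for arg in node.args])
    # Anything else (attribute access, ^, comparisons, ...) is rejected
    raise ValueError("Unsupported expression")


def safe_calculate(expression):
    """Return a JSON string shaped like the calculator tool result."""
    try:
        tree = ast.parse(expression, mode="eval")
        return json.dumps({"result": _evaluate(tree.body)})
    except Exception as e:
        return json.dumps({"error": str(e)})


print(safe_calculate("2**10 + 5*3"))
print(safe_calculate("__import__('os').system('ls')"))
```

&lt;p&gt;To try it, the &lt;code&gt;calculator&lt;/code&gt; branch of &lt;code&gt;execute_tool&lt;/code&gt; could return &lt;code&gt;safe_calculate(args["expression"])&lt;/code&gt; instead of calling &lt;code&gt;eval&lt;/code&gt;. Note that this sketch does not limit how large an exponent can be, so a very big power can still be slow.&lt;/p&gt;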



&lt;h3&gt;
  
  
  Key Differences from Cloud APIs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;GPT-4 (Cloud)&lt;/th&gt;
&lt;th&gt;DeepSeek-R1 (Local)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_choice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Supports &lt;code&gt;"auto"&lt;/code&gt;, &lt;code&gt;"required"&lt;/code&gt;, &lt;code&gt;"none"&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Supports &lt;code&gt;"auto"&lt;/code&gt; and &lt;code&gt;"none"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel tool calls&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ Yes (multiple tools in one response)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming with tools&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;⚠️ Partially (use &lt;code&gt;stream: false&lt;/code&gt; for reliability)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response format&lt;/td&gt;
&lt;td&gt;OpenAI format&lt;/td&gt;
&lt;td&gt;OpenAI-compatible ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
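
&lt;p&gt;One practical consequence of the OpenAI-compatible response format: &lt;code&gt;function.arguments&lt;/code&gt; arrives as a JSON string, which is why the example calls &lt;code&gt;json.loads&lt;/code&gt; on it. As far as I know, Ollama's native &lt;code&gt;/api/chat&lt;/code&gt; endpoint returns an already-parsed object instead, so treat that as an assumption to check against your version. The sketch below uses a hypothetical helper, &lt;code&gt;parse_tool_arguments&lt;/code&gt;, that accepts either shape and reports malformed JSON clearly.&lt;/p&gt;

```python
import json


def parse_tool_arguments(tool_call):
    """Return the tool arguments as a dict, whatever shape they arrive in.

    Assumption: OpenAI-style responses carry a JSON string, while some
    native endpoints hand back an already-parsed dict.
    """
    raw = tool_call.get("function", {}).get("arguments", {})
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw) if raw.strip() else {}
        except json.JSONDecodeError:
            # Small models sometimes emit invalid JSON; surface that to the caller
            raise ValueError(f"Tool arguments are not valid JSON: {raw!r}")
        if isinstance(parsed, dict):
            return parsed
    raise ValueError(f"Unexpected tool arguments: {raw!r}")


# Both shapes yield the same dict
openai_style = {"function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
native_style = {"function": {"name": "get_weather", "arguments": {"city": "Tokyo"}}}
print(parse_tool_arguments(openai_style))
print(parse_tool_arguments(native_style))
```

&lt;p&gt;Catching the &lt;code&gt;ValueError&lt;/code&gt; and sending it back as the tool result gives the model a chance to retry with valid arguments.&lt;/p&gt;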

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; If DeepSeek-R1 doesn't call tools when you expect it to, try adding explicit instructions in the system prompt like: &lt;em&gt;"You have access to tools. Use them when the user asks for information you don't know."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
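
&lt;p&gt;A minimal sketch of that tip: build the system prompt from the tool list so every tool is named explicitly. The helper &lt;code&gt;build_system_prompt&lt;/code&gt;, the prompt wording, and the &lt;code&gt;get_weather&lt;/code&gt; description are placeholders for illustration, not a tested recipe.&lt;/p&gt;

```python
def build_system_prompt(tools):
    """Build a system prompt that names each available tool explicitly."""
    lines = [
        "You have access to tools. Use them when the user asks for "
        "information you don't know.",
        "Available tools:",
    ]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"- {fn['name']}: {fn['description']}")
    return "\n".join(lines)


# Trimmed-down tool list, just enough for the prompt builder
example_tools = [
    {"type": "function", "function": {"name": "get_weather", "description": "Get the current weather for a city"}},
    {"type": "function", "function": {"name": "calculator", "description": "Perform a mathematical calculation"}},
]

messages = [
    {"role": "system", "content": build_system_prompt(example_tools)},
    {"role": "user", "content": "What's the weather in Tokyo in celsius?"},
]
print(messages[0]["content"])
```

&lt;p&gt;The resulting &lt;code&gt;messages&lt;/code&gt; list can be passed to &lt;code&gt;chat_with_tools&lt;/code&gt; together with the full tool schemas.&lt;/p&gt;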




&lt;h2&gt;
  
  
  2. Qwen 3.6 / 2.5: Function Calling
&lt;/h2&gt;

&lt;p&gt;Qwen models have &lt;strong&gt;native function calling support&lt;/strong&gt; and are particularly good at following complex tool schemas.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Qwen 3.6 (newer, better function calling)&lt;/span&gt;
ollama pull qwen3.6:8b

&lt;span class="c"&gt;# Or Qwen 2.5 (more widely tested)&lt;/span&gt;
ollama pull qwen2.5:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example: Multi-Tool Chatbot
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;qwen_chat_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Qwen uses the same OpenAI-compatible format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.6:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or "qwen2.5:7b"
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_choice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Lower = more deterministic tool selection
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
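    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Fail loudly on HTTP errors instead of parsing an error body&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;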

&lt;span class="c1"&gt;# Define the tools: web search (mock) and file reading
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the internet for current information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read contents of a file on the local system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Absolute file path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_qwen_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# In production, use a real search API
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result about &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Read at most 2000 characters
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Full interaction loop
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an AI assistant with access to search and file tools. Use them when needed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can you read my config file and tell me what model I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m using?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# First response
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;qwen_chat_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Handle tool calls
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_qwen_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Get final response
&lt;/span&gt;    &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;qwen_chat_with_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
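    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# No tool call was requested, so the first reply is already the answer
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;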
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
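
&lt;p&gt;The loop above handles a single round of tool calls. If the model asks for another tool after seeing the first result, a multi-round version is needed. The sketch below is one way to write it, assuming the chat function returns an OpenAI-style response dict as in the code above. The names &lt;code&gt;run_tool_loop&lt;/code&gt;, &lt;code&gt;chat_fn&lt;/code&gt;, &lt;code&gt;execute_fn&lt;/code&gt;, and &lt;code&gt;max_rounds&lt;/code&gt; are hypothetical and not part of any library.&lt;/p&gt;

```python
def run_tool_loop(chat_fn, execute_fn, messages, tools, max_rounds=5):
    """Call chat_fn repeatedly until the model answers without tool calls.

    chat_fn(messages, tools) is assumed to return an OpenAI-style response dict.
    execute_fn(tool_call) is assumed to return the tool result as a string.
    Both are passed in, so this sketch makes no assumption about the backend.
    """
    for _ in range(max_rounds):
        msg = chat_fn(messages, tools)["choices"][0]["message"]
        messages.append(msg)  # keep the assistant turn in the history
        tool_calls = msg.get("tool_calls") or []
        if not tool_calls:
            # Plain answer: nothing left to execute
            return msg.get("content")
        for tc in tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "content": execute_fn(tc),
            })
    raise RuntimeError("Model kept requesting tools after max_rounds rounds")
```

&lt;p&gt;With the functions from this section it could be called as &lt;code&gt;run_tool_loop(qwen_chat_with_tools, execute_qwen_tool, messages, tools)&lt;/code&gt;. The round limit is there so a model that never stops calling tools cannot loop forever.&lt;/p&gt;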



&lt;h3&gt;
  
  
  Qwen-Specific Tips
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tip&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use &lt;code&gt;temperature: 0.3&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A lower temperature makes sampling more deterministic, which tends to make tool selection and argument formatting more reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Describe tools in Chinese + English&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen was trained bilingually; descriptions in English work fine, but Chinese descriptions can improve accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max 5 parallel tools&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qwen 3.6 supports parallel tool calls but performs best with ≤5 at once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use &lt;code&gt;tool_choice: "auto"&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;With &lt;code&gt;auto&lt;/code&gt; the model decides whether a tool is needed; setting it explicitly avoids relying on the server default&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
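
&lt;p&gt;As a rough illustration, the temperature and &lt;code&gt;tool_choice&lt;/code&gt; tips can be applied in the request body. The helper name &lt;code&gt;build_qwen_payload&lt;/code&gt; and its &lt;code&gt;model&lt;/code&gt; argument are hypothetical, and whether &lt;code&gt;tool_choice&lt;/code&gt; is honored depends on the server you are talking to.&lt;/p&gt;

```python
def build_qwen_payload(model, messages, tools):
    """Build an OpenAI-style chat request body that applies the tips above.

    Hypothetical helper: `model` is whatever Qwen tag you pulled locally.
    """
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "temperature": 0.3,     # lower temperature for steadier tool selection
        "tool_choice": "auto",  # let the model decide when a tool is needed
        "stream": False,
    }
```

&lt;p&gt;The returned dict can be passed as the &lt;code&gt;json=&lt;/code&gt; argument of &lt;code&gt;requests.post&lt;/code&gt;, in the same way as the other request bodies in this guide.&lt;/p&gt;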




&lt;h2&gt;
  
  
  3. GLM-4.7: Tool Use &amp;amp; Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GLM-4&lt;/strong&gt; (from Zhipu AI / z.ai) is designed for &lt;strong&gt;agentic workflows&lt;/strong&gt;. Tool use was a first-class training goal rather than an afterthought, which makes it one of the stronger tool-using options among Chinese models you can run locally. The examples in this section use the &lt;code&gt;glm4:9b&lt;/code&gt; tag from Ollama.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull glm4:9b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
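
&lt;p&gt;Before sending requests, it can help to confirm the pull succeeded. Ollama lists local models at &lt;code&gt;GET /api/tags&lt;/code&gt;. The small check below works on the parsed JSON of that response, and the function name &lt;code&gt;model_is_pulled&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def model_is_pulled(tags_response, model_name):
    """Return True if model_name appears in a parsed /api/tags response.

    tags_response is assumed to look like {"models": [{"name": "glm4:9b"}, ...]}.
    """
    names = [m.get("name") for m in tags_response.get("models", [])]
    return model_name in names
```

&lt;p&gt;For example: &lt;code&gt;model_is_pulled(requests.get("http://localhost:11434/api/tags").json(), "glm4:9b")&lt;/code&gt;.&lt;/p&gt;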



&lt;h3&gt;
  
  
  GLM's Unique Tool Format
&lt;/h3&gt;

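&lt;p&gt;The tool definitions below follow the same OpenAI-style layout as the Qwen example, with one difference: required arguments are listed under &lt;strong&gt;&lt;code&gt;required_parameters&lt;/code&gt;&lt;/strong&gt; instead of &lt;code&gt;required&lt;/code&gt;. JSON Schema itself only defines &lt;code&gt;required&lt;/code&gt;, so if the model omits mandatory arguments, try the standard key as well:&lt;br&gt;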
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# GLM tool definition format
&lt;/span&gt;&lt;span class="n"&gt;glm_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Send an email to a recipient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recipient email address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email subject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email body content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required_parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_directory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List files in a directory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Directory path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required_parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;glm_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm4:9b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
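    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Fail loudly on HTTP errors, e.g. a model that was never pulled
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;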
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
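
&lt;p&gt;A tool executor for the two tools defined above might look like the following sketch. It assumes tool calls arrive in the OpenAI-style shape used elsewhere in this guide. The &lt;code&gt;send_email&lt;/code&gt; branch is a stub that sends nothing, and the result field names (&lt;code&gt;files&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;) are illustrative choices rather than anything GLM requires.&lt;/p&gt;

```python
import json
import os


def execute_glm_tool(tool_call):
    """Run one tool call and return the result as a JSON string."""
    name = tool_call["function"]["name"]
    raw_args = tool_call["function"]["arguments"]
    # Arguments usually arrive as a JSON string; accept a dict as well
    args = json.loads(raw_args) if isinstance(raw_args, str) else raw_args

    if name == "list_directory":
        try:
            files = sorted(os.listdir(args["path"]))
            return json.dumps({"path": args["path"], "files": files})
        except OSError as e:
            return json.dumps({"error": str(e)})
    if name == "send_email":
        # Stub: a real implementation would hand this to an email service
        return json.dumps({"status": "not_sent", "to": args["to"], "subject": args["subject"]})
    return json.dumps({"error": "Unknown tool"})
```

&lt;p&gt;Returning errors as JSON instead of raising lets the model see what went wrong and recover in its next turn.&lt;/p&gt;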



&lt;h3&gt;
  
  
  Multi-Step Agent Example
&lt;/h3&gt;

&lt;p&gt;GLM-4 excels at &lt;strong&gt;multi-step reasoning&lt;/strong&gt; — deciding to call tools in sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an AI assistant that can use tools. Use them when helpful.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List the files in /home/user/projects, then tell me which ones are Python files.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# GLM will:
# 1. Call list_directory("/home/user/projects")
# 2. Receive the file list
# 3. Analyze and respond with which are Python files
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;glm_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glm_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

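&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Keep the assistant's tool-call turn in the history before the tool results&lt;/span&gt;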
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_glm_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Your tool execution function
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# GLM will now synthesize the results
&lt;/span&gt;    &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;glm_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glm_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
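&lt;p&gt;The loop above leaves &lt;code&gt;execute_glm_tool&lt;/code&gt; to you. Here is a minimal sketch of one way to write it, assuming each tool call follows the OpenAI-style shape with &lt;code&gt;function.name&lt;/code&gt; and &lt;code&gt;function.arguments&lt;/code&gt;. The &lt;code&gt;TOOL_REGISTRY&lt;/code&gt; dict and this particular &lt;code&gt;list_directory&lt;/code&gt; implementation are hypothetical names made up for illustration, not part of any GLM API.&lt;/p&gt;

```python
import json
import os


def list_directory(path):
    """Return the entries of a directory, one per line."""
    return "\n".join(sorted(os.listdir(path)))


# Hypothetical registry mapping tool names to local Python functions.
TOOL_REGISTRY = {"list_directory": list_directory}


def execute_glm_tool(tool_call, registry=TOOL_REGISTRY):
    """Run one tool call and always return a string, even on failure."""
    fn = tool_call["function"]
    name = fn["name"]
    raw_args = fn.get("arguments") or {}
    # Assumption: arguments arrive either as a JSON string or as a dict.
    try:
        args = json.loads(raw_args) if isinstance(raw_args, str) else raw_args
    except json.JSONDecodeError as exc:
        return f"Error: could not parse arguments for {name}: {exc}"
    if name not in registry:
        return f"Error: unknown tool {name}"
    try:
        return str(registry[name](**args))
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop.
        return f"Error: {name} failed: {exc}"
```

&lt;p&gt;Returning errors as strings lets the model see what went wrong and try again, rather than ending the conversation with a traceback.&lt;/p&gt;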



&lt;h3&gt;
  
  
  GLM vs Others: When to Use Each
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple tool call (1-2 tools)&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:7b&lt;/td&gt;
&lt;td&gt;Fastest inference, reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex multi-step (3+ tools)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GLM-4:9b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best agentic reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Following exact tool schema&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Qwen 3.6:8b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Most accurate parameter extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost-sensitive (low VRAM)&lt;/td&gt;
&lt;td&gt;Qwen 2.5:7b&lt;/td&gt;
&lt;td&gt;4.5GB at Q4, works on most GPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
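&lt;p&gt;If you switch models per task, the table can be captured as a small lookup. This is only a sketch: the category names and the &lt;code&gt;pick_model&lt;/code&gt; helper are made up for illustration, and the tags are the ones used elsewhere in this article, so adjust them to whatever you have pulled locally.&lt;/p&gt;

```python
# Ollama tags taken from the examples in this article; adjust to your setup.
MODEL_FOR_TASK = {
    "simple": "deepseek-r1:7b",      # 1-2 tools
    "multi_step": "glm4:9b",         # 3+ tools
    "strict_schema": "qwen3.6:8b",   # exact parameter extraction
    "low_vram": "qwen2.5:7b",        # constrained GPU memory
}


def pick_model(task_kind, default="qwen2.5:7b"):
    """Return the model tag for a task category, or the default if unknown."""
    return MODEL_FOR_TASK.get(task_kind, default)


print(pick_model("multi_step"))
```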




&lt;h2&gt;
  
  
  4. LangChain Integration
&lt;/h2&gt;

&lt;p&gt;LangChain is one of the most widely used frameworks for building LLM-powered applications. Here's how to use your local models with function calling in LangChain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain-community
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Basic LangChain + Ollama Tools
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOllama&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_tool_calling_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Define tools using the @tool decorator
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a city. Input: city name.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace with real API call
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is 22°C and sunny.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform a mathematical calculation. Input: math expression string.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;
    &lt;span class="n"&gt;safe_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqrt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;e&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Caution: eval() with empty __builtins__ is not a real sandbox; only pass trusted input&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__builtins__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}},&lt;/span&gt; &lt;span class="n"&gt;safe_dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for current information. Input: search query.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In production, use DuckDuckGo or similar
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Top result for &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: [Example result]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Create the LLM
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5:7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or "deepseek-r1:7b", "glm4:9b"
&lt;/span&gt;    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Create the agent
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI assistant with access to tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;placeholder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{agent_scratchpad}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_tool_calling_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent_executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Shows you what tools are being called
&lt;/span&gt;    &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Safety limit
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 4: Run it
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in London and calculate 15% of 200?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# e.g. "The weather in London is 22°C and sunny. 15% of 200 is 30." (wording varies by model)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
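&lt;p&gt;The &lt;code&gt;calculate&lt;/code&gt; tool above relies on &lt;code&gt;eval&lt;/code&gt;, and emptying &lt;code&gt;__builtins__&lt;/code&gt; does not fully sandbox it. Since the expression comes from the model, a stricter option is to walk the parsed syntax tree and allow only known operations. The sketch below is one way to do that with the standard &lt;code&gt;ast&lt;/code&gt; module. The name &lt;code&gt;safe_calculate&lt;/code&gt; and the whitelists are my own choices, and exponentiation is left out on purpose to avoid huge computations.&lt;/p&gt;

```python
import ast
import math
import operator

# Whitelisted operators, constants and functions; anything else is rejected.
BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
NAMES = {"pi": math.pi, "e": math.e}
FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "abs": abs}


def safe_calculate(expression):
    """Evaluate an arithmetic expression by walking its AST."""
    def ev(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Name) and node.id in NAMES:
            return NAMES[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCS and not node.keywords):
            return FUNCS[node.func.id](*[ev(arg) for arg in node.args])
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


print(safe_calculate("sqrt(16) + abs(-2)"))
```

&lt;p&gt;Inside the tool you would call &lt;code&gt;safe_calculate(expression)&lt;/code&gt; in place of &lt;code&gt;eval&lt;/code&gt; and keep the existing &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;except&lt;/code&gt; so that rejected input comes back to the model as an error string.&lt;/p&gt;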



&lt;h3&gt;
  
  
  Running the LangChain Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Save the code above as langchain-agent.py&lt;/span&gt;
python langchain-agent.py

&lt;span class="c"&gt;# You should see output along these lines (exact wording varies by model):&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; Entering new AgentExecutor chain...&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; Invoking: get_weather with {'city': 'London'}&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; Invoking: calculate with {'expression': '0.15 * 200'}&lt;/span&gt;
&lt;span class="c"&gt;# &amp;gt; The weather in London is 22°C and sunny. 15% of 200 is 30.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Model-Specific LangChain Tips
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;LangChain Model Class&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatOllama(model="deepseek-r1:14b")&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Best for reasoning-heavy agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.6/2.5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatOllama(model="qwen3.6:8b")&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Most reliable with LangChain's tool format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM-4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatOllama(model="glm4:9b")&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;May need `stop: ["&amp;lt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5. Practical: Build a Code Assistant Bot
&lt;/h2&gt;

&lt;p&gt;Let's put it all together — a real tool-using assistant that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read and write files&lt;/li&gt;
&lt;li&gt;Run shell commands&lt;/li&gt;
&lt;li&gt;Search for packages&lt;/li&gt;
&lt;li&gt;Answer questions about your codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# === Tool Definitions ===
&lt;/span&gt;
&lt;span class="n"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the contents of a file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Absolute path to file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write content to a file (overwrites existing)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run a shell command (read-only, safe commands only)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Shell command to run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Written &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; characters to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Safety: allow-list the executable and skip the shell, so pipes, redirects, and chained commands are never interpreted
&lt;/span&gt;        &lt;span class="n"&gt;safe_prefixes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grep&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;find&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pwd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;echo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;which&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;head&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;argv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;argv&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;safe_prefixes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Blocked: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is not in the allowed command list.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(no output)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutExpired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Command timed out after 10 seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# === Main Loop ===
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ollama_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5:7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a coding assistant. Use your tools to read files, write code, and run commands.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🤖 Code Assistant (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) — type &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to exit&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Tool-call loop (max 5 iterations to prevent infinite loops)
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ollama_host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# No more tools needed
&lt;/span&gt;
            &lt;span class="c1"&gt;# Execute each tool
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;fn_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;fn_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  🔧 Calling: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Print final response
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🤖 &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(no text reply)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;chat_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
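
&lt;p&gt;The allow-list in &lt;code&gt;run_command&lt;/code&gt; only looks at the first word of the command, so it helps to see what a stricter parse could look like. The sketch below is a hypothetical helper (the name &lt;code&gt;parse_allowed&lt;/code&gt; is not part of the script above) that assumes you want quoted arguments kept together and anything outside the allow-list rejected:&lt;/p&gt;

```python
import shlex

# Hypothetical helper for illustration; not part of the original script.
ALLOWED = {"ls", "cat", "grep", "find", "pwd", "echo", "which", "head", "tail"}


def parse_allowed(command):
    """Return an argv list if the executable is allow-listed, else None."""
    try:
        # shlex.split keeps quoted arguments together, unlike str.split.
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return None
    if not argv or argv[0] not in ALLOWED:
        return None
    return argv


print(parse_allowed('grep "def main" app.py'))
print(parse_allowed("rm -rf /tmp/x"))
```

&lt;p&gt;The returned list can be passed to &lt;code&gt;subprocess.run&lt;/code&gt; without &lt;code&gt;shell=True&lt;/code&gt;, so characters like semicolons and pipes reach the program as plain arguments. Note that an allow-listed tool such as &lt;code&gt;find&lt;/code&gt; can still modify files through its own options, so this narrows the risk rather than removing it.&lt;/p&gt;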



&lt;p&gt;&lt;strong&gt;Save and run:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 code-assistant.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example interaction:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Read my main.py and tell me if there are any bugs
  🔧 Calling: read_file({"path": "./main.py"})
🤖 I can see your main.py. It looks mostly fine, but I notice
   line 42 has a typo: "retrun" should be "return".

You: Fix it
  🔧 Calling: read_file({"path": "./main.py"})
  🔧 Calling: write_file({"path": "./main.py", "content": "..."})
🤖 Fixed! Changed "retrun" to "return" on line 42.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Quick Reference: Model Function Calling Support
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;DeepSeek-R1&lt;/th&gt;
&lt;th&gt;Qwen 3.6 / 2.5&lt;/th&gt;
&lt;th&gt;GLM-4&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI format&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Same &lt;code&gt;tools&lt;/code&gt; parameter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel calls&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Multiple tools at once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_choice: "auto"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;LLM decides when to use tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_choice: "required"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Not widely supported locally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming + tools&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;stream: false&lt;/code&gt; to be safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step reasoning&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Very Good&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Excellent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GLM-4 leads on agentic workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Min VRAM (Q4)&lt;/td&gt;
&lt;td&gt;~4.5 GB (7b)&lt;/td&gt;
&lt;td&gt;~5 GB (8b)&lt;/td&gt;
&lt;td&gt;~5.5 GB (9b)&lt;/td&gt;
&lt;td&gt;All fit on 8 GB GPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
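
&lt;p&gt;To make the table concrete, here is a minimal sketch of a request body that sticks to the widely supported options: &lt;code&gt;tool_choice: "auto"&lt;/code&gt; and &lt;code&gt;stream: false&lt;/code&gt;. The helper name &lt;code&gt;build_chat_payload&lt;/code&gt; and the demo tool are illustrative assumptions, and whether a given local server actually honors &lt;code&gt;tool_choice&lt;/code&gt; is worth verifying on your own setup:&lt;/p&gt;

```python
import json


def build_chat_payload(model, messages, tools):
    """Build an OpenAI-style chat request body for tool calling."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call a tool
        "stream": False,        # the table above suggests this as the safe default
    }


# Hypothetical tool definition, used only for illustration.
demo_tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

payload = build_chat_payload(
    "qwen2.5:7b",
    [{"role": "user", "content": "Read main.py"}],
    demo_tools,
)
print(json.dumps(payload, indent=2))
```

&lt;p&gt;The resulting dictionary is what you would pass as &lt;code&gt;json=&lt;/code&gt; to a POST against &lt;code&gt;/v1/chat/completions&lt;/code&gt;.&lt;/p&gt;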




&lt;h2&gt;
  
  
  Common Mistakes &amp;amp; Solutions
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mistake&lt;/th&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wrong model name&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"does not support tools" error&lt;/td&gt;
&lt;td&gt;Verify: &lt;code&gt;curl -s http://localhost:11434/api/tags&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Missing system prompt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model never calls tools&lt;/td&gt;
&lt;td&gt;Add: "You have access to tools. Use them when helpful."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Too many tools&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model calls wrong tool&lt;/td&gt;
&lt;td&gt;Limit to ≤5 tool definitions per call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No &lt;code&gt;tool_choice: "auto"&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model ignores tools&lt;/td&gt;
&lt;td&gt;Explicitly set &lt;code&gt;tool_choice: "auto"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infinite tool loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model keeps calling tools&lt;/td&gt;
&lt;td&gt;Add a &lt;code&gt;max_iterations&lt;/code&gt; guard (e.g., 5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Temperature too high&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool calls are erratic or inconsistent&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;temperature: 0.3&lt;/code&gt; or lower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wrong Ollama port&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connection refused&lt;/td&gt;
&lt;td&gt;Check that &lt;code&gt;ollama serve&lt;/code&gt; is running on port 11434&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
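
&lt;p&gt;The first row's check can also be done from Python. This is a rough sketch that assumes the &lt;code&gt;/api/tags&lt;/code&gt; response is a JSON object with a &lt;code&gt;models&lt;/code&gt; list whose entries carry a &lt;code&gt;name&lt;/code&gt; field; the function names and the sample data are made up for illustration:&lt;/p&gt;

```python
import json
import urllib.request


def installed_models(tags):
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]


def fetch_tags(host="http://localhost:11434"):
    """Fetch /api/tags from a running Ollama server (needs the server up)."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Sample response, trimmed to the one field used here (illustrative data).
sample = {"models": [{"name": "qwen2.5:7b"}, {"name": "glm4:9b"}]}

print("qwen2.5:7b" in installed_models(sample))
print("qwen2.5:14b" in installed_models(sample))
```

&lt;p&gt;Against a live server, replace &lt;code&gt;sample&lt;/code&gt; with &lt;code&gt;fetch_tags()&lt;/code&gt; and compare the exact tag you pass as &lt;code&gt;model&lt;/code&gt;, including the size suffix.&lt;/p&gt;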




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;You now have a local LLM that can &lt;strong&gt;see files, run commands, search the web, and execute code&lt;/strong&gt;. This is the foundation for building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI coding assistants&lt;/strong&gt; that read and modify your codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal research agents&lt;/strong&gt; that search the web and summarize&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation bots&lt;/strong&gt; that interact with APIs and databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your own AutoGPT&lt;/strong&gt; — a multi-step reasoning agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; has ready-to-run scripts for all the examples above. Star it to get notified when new chapters drop! ⭐&lt;/p&gt;





</description>
      <category>llm</category>
      <category>python</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Production-Ready Local LLMs: From Terminal to Team Deployment</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sun, 24 May 2026 09:13:27 +0000</pubDate>
      <link>https://forem.com/lingdas1/production-ready-local-llms-from-terminal-to-team-deployment-4ph2</link>
      <guid>https://forem.com/lingdas1/production-ready-local-llms-from-terminal-to-team-deployment-4ph2</guid>
      <description>&lt;h1&gt;
  
  
  05 — Production: From Personal Setup to Team Deployment
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;🔴 &lt;strong&gt;Advanced&lt;/strong&gt; — You've got local LLMs running on your machine. Now let's make them available to your team, your apps, and your users — securely and reliably.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What You'll Learn
&lt;/h2&gt;

&lt;p&gt;By the end of this chapter, you'll be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Set up &lt;strong&gt;multi-user access&lt;/strong&gt; with Open WebUI (users, groups, permissions)&lt;/li&gt;
&lt;li&gt;✅ Expose Ollama models via a &lt;strong&gt;REST API&lt;/strong&gt; with rate limiting and authentication&lt;/li&gt;
&lt;li&gt;✅ Deploy the full stack (Ollama + Open WebUI + RAG) using &lt;strong&gt;Docker&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Monitor &lt;strong&gt;usage, performance, and errors&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Calculate &lt;strong&gt;when local beats the cloud&lt;/strong&gt; — with real 2026 pricing&lt;/li&gt;
&lt;li&gt;✅ Secure your deployment with an &lt;strong&gt;actionable checklist&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. Multi-User Management with Open WebUI
&lt;/h2&gt;

&lt;p&gt;Open WebUI isn't just a pretty interface — it comes with a built-in user management system. Here's how to set it up for multiple users.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enabling Sign-Up
&lt;/h3&gt;

&lt;p&gt;By default, Open WebUI allows anyone to create an account. For team use, you'll want to control this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option A: Allow sign-up (good for small teams)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://host.docker.internal:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;WEBUI_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Team AI"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;ENABLE_SIGNUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;--restart always&lt;/code&gt; flag means the container auto-starts when the machine reboots. Without it, your team loses access every time you restart.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option B: Invite-only (recommended for production)&lt;/span&gt;
&lt;span class="c"&gt;# Same as above, but set:&lt;/span&gt;
&lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;ENABLE_SIGNUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With sign-up disabled, you create accounts manually from the Admin Panel (&lt;code&gt;Settings → Users → Add User&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating User Roles
&lt;/h3&gt;

&lt;p&gt;Open WebUI supports three roles out of the box:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Permissions&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chat with assigned models, create conversations&lt;/td&gt;
&lt;td&gt;Team members who just need to use AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Admin&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full access: manage users, models, settings&lt;/td&gt;
&lt;td&gt;Team leads, IT admins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pending&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can't chat yet — awaiting approval&lt;/td&gt;
&lt;td&gt;New sign-ups waiting for review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;How to assign roles:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log in as admin → click the gear icon ⚙️ (bottom left)&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Admin Panel → Users&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click the pencil icon on any user → change their role&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Model Access Control (Per-User Models)
&lt;/h3&gt;

&lt;p&gt;This is one of Open WebUI's killer features for production use. You can control &lt;strong&gt;which models each user can see and use&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Admin Panel → &lt;strong&gt;Models&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click a model → &lt;strong&gt;Permissions&lt;/strong&gt; tab&lt;/li&gt;
&lt;li&gt;Select which users/groups can access it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; You might have a 70B model that only your power users should run (saves VRAM for everyone), or a cheap 7B model for general queries. Per-user model access lets you balance resources intelligently.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. API Deployment: Exposing Ollama to Applications
&lt;/h2&gt;

&lt;p&gt;Ollama runs an HTTP server on &lt;code&gt;http://localhost:11434&lt;/code&gt; by default. To make it accessible to other applications (or other machines on your network), you need to configure it properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Allow External Connections
&lt;/h3&gt;

&lt;p&gt;By default, Ollama only listens on &lt;code&gt;127.0.0.1&lt;/code&gt; (localhost). To allow network access:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux (systemd):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Edit the Ollama service configuration&lt;/span&gt;
&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /etc/systemd/system/ollama.service.d
&lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/systemd/system/ollama.service.d/override.conf &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Reload and restart&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;0.0.0.0&lt;/code&gt; means "listen on all network interfaces." &lt;code&gt;127.0.0.1&lt;/code&gt; means "listen on localhost only."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt; The menu bar app does not read your shell profile, so set the variable with &lt;code&gt;launchctl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Make the variable visible to the Ollama app&lt;/span&gt;
launchctl setenv OLLAMA_HOST &lt;span class="s2"&gt;"0.0.0.0:11434"&lt;/span&gt;
&lt;span class="c"&gt;# Then quit and reopen Ollama from the menu bar icon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Docker Ollama:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Add Authentication (htpasswd)
&lt;/h3&gt;

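&lt;p&gt;⚠️ &lt;strong&gt;Never expose Ollama to the public internet without authentication.&lt;/strong&gt; Anyone who finds your endpoint can run models on your GPU, costing you electricity and compute. The proxy set up in the next steps only protects port 8080: if you opened port 11434 in Step 1, block it at the firewall or set &lt;code&gt;OLLAMA_HOST&lt;/code&gt; back to &lt;code&gt;127.0.0.1&lt;/code&gt;, since the proxy reaches Ollama over localhost anyway.&lt;/p&gt;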

&lt;p&gt;The simplest auth layer is &lt;strong&gt;Basic Auth&lt;/strong&gt; via a reverse proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install nginx and create a password file&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;nginx apache2-utils
&lt;span class="nb"&gt;sudo &lt;/span&gt;htpasswd &lt;span class="nt"&gt;-c&lt;/span&gt; /etc/nginx/.htpasswd admin
&lt;span class="c"&gt;# Enter a strong password when prompted&lt;/span&gt;

&lt;span class="c"&gt;# Optional: add more users&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;htpasswd /etc/nginx/.htpasswd teammate1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Set Up Nginx Reverse Proxy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# /etc/nginx/sites-available/ollama&lt;/span&gt;
&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;# === Authentication ===&lt;/span&gt;
    &lt;span class="c1"&gt;# Every request must provide a valid username:password&lt;/span&gt;
    &lt;span class="kn"&gt;auth_basic&lt;/span&gt; &lt;span class="s"&gt;"Ollama&lt;/span&gt; &lt;span class="s"&gt;API&lt;/span&gt; &lt;span class="s"&gt;—&lt;/span&gt; &lt;span class="s"&gt;Authorized&lt;/span&gt; &lt;span class="s"&gt;Access&lt;/span&gt; &lt;span class="s"&gt;Only"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;auth_basic_user_file&lt;/span&gt; &lt;span class="n"&gt;/etc/nginx/.htpasswd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;# === Rate Limiting ===&lt;/span&gt;
    &lt;span class="c1"&gt;# Max 30 requests per minute per IP address&lt;/span&gt;
    &lt;span class="kn"&gt;limit_req&lt;/span&gt; &lt;span class="s"&gt;zone=ollama&lt;/span&gt; &lt;span class="s"&gt;burst=5&lt;/span&gt; &lt;span class="s"&gt;nodelay&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;limit_req_status&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:11434&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Real-IP&lt;/span&gt; &lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;# Increase timeout for long-running model generations&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_send_timeout&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Required: add the line below to the http block (usually /etc/nginx/nginx.conf); nginx -t fails if the zone is undefined&lt;/span&gt;
&lt;span class="c1"&gt;# limit_req_zone $binary_remote_addr zone=ollama:10m rate=30r/m;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What each directive does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;auth_basic&lt;/code&gt; — Prompts for username/password on every request&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;limit_req&lt;/code&gt; — Prevents a single user from overwhelming your GPU&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;proxy_read_timeout 300&lt;/code&gt; — Models can take minutes to generate; this keeps the connection open
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable the site and restart nginx&lt;/span&gt;
&lt;span class="nb"&gt;sudo ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
&lt;span class="nb"&gt;sudo &lt;/span&gt;nginx &lt;span class="nt"&gt;-t&lt;/span&gt;  &lt;span class="c"&gt;# Test config for syntax errors&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
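
&lt;p&gt;With &lt;code&gt;rate=30r/m&lt;/code&gt; the limiter admits roughly one request every two seconds, and anything beyond the burst is answered with 429. Client scripts should expect that and retry. The following is a minimal Python sketch of retrying with a doubling delay, not part of the nginx setup itself; &lt;code&gt;call_with_backoff&lt;/code&gt; and its parameters are hypothetical names, and &lt;code&gt;send&lt;/code&gt; stands in for whatever HTTP call your script makes.&lt;/p&gt;

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    `send` is any zero-argument callable returning (status_code, body).
    The wait doubles after each rejected attempt: 2 s, 4 s, 8 s, ...
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Do not sleep after the final attempt; just report the last reply
        if attempt + 1 != max_attempts:
            sleep(base_delay * 2 ** attempt)
    return status, body


# Stand-in for a real HTTP call: rejected twice, then accepted
replies = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = call_with_backoff(lambda: next(replies), sleep=waits.append)
print(result, waits)
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; as a parameter keeps the helper easy to try out without actually waiting; in a real script you would leave it at the default.&lt;/p&gt;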



&lt;h3&gt;
  
  
  Step 4: Test Your API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Without auth — should get 401 Unauthorized&lt;/span&gt;
curl &lt;span class="nt"&gt;-sk&lt;/span&gt; http://your-server-ip:8080/api/tags

&lt;span class="c"&gt;# With auth — should work&lt;/span&gt;
curl &lt;span class="nt"&gt;-sk&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; admin:yourpassword http://your-server-ip:8080/api/tags

&lt;span class="c"&gt;# Send a chat request via API&lt;/span&gt;
curl &lt;span class="nt"&gt;-sk&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; admin:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://your-server-ip:8080/api/chat &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Say hello in one word"}],
    "stream": false
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;With &lt;code&gt;"stream": false&lt;/code&gt;, Ollama returns the whole reply as a single JSON object once generation finishes. Interactive apps usually set &lt;code&gt;"stream": true&lt;/code&gt; so they can display text as it is generated.&lt;/p&gt;
&lt;/blockquote&gt;
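
&lt;p&gt;When streaming is on, &lt;code&gt;/api/chat&lt;/code&gt; sends one JSON object per line, each carrying a fragment in &lt;code&gt;message.content&lt;/code&gt;, and marks the last one with &lt;code&gt;"done": true&lt;/code&gt;. Here is a minimal Python sketch of stitching those lines back together. &lt;code&gt;collect_stream&lt;/code&gt; is a hypothetical helper name, and the sample lines are made up for illustration rather than captured from a real server.&lt;/p&gt;

```python
import json


def collect_stream(lines):
    """Join the message fragments from a streamed /api/chat response.

    `lines` is any iterable of text lines, one JSON object per line,
    such as the decoded lines of the HTTP response body.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object closes the stream
    return "".join(parts)


# Illustrative input only (not real server output)
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_stream(sample))
```

&lt;p&gt;In a real client you would print each fragment as it arrives instead of collecting them, which is what makes the reply feel instant.&lt;/p&gt;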

&lt;h3&gt;
  
  
  Step 5: OpenAI-Compatible Endpoint
&lt;/h3&gt;

&lt;p&gt;Ollama also exposes an OpenAI-compatible endpoint, which means any tool that works with OpenAI's API can work with your local models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Call the OpenAI-compatible endpoint with curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-sk&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; admin:yourpassword &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://your-server-ip:8080/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



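&lt;p&gt;&lt;strong&gt;Compatibility note:&lt;/strong&gt; The OpenAI-compatible endpoint (&lt;code&gt;/v1/&lt;/code&gt;) works with most tools that support OpenAI, including &lt;strong&gt;LangChain, LlamaIndex, Continue.dev (VS Code extension), Cursor, and custom scripts&lt;/strong&gt;. Just change the base URL and skip the API key (or use a dummy one if the tool requires it). One caveat: these clients typically send the key as an &lt;code&gt;Authorization: Bearer&lt;/code&gt; header, which the Basic Auth proxy from Step 3 rejects with 401. Either connect them to port 11434 directly over a trusted network, or use a tool that lets you override the &lt;code&gt;Authorization&lt;/code&gt; header.&lt;/p&gt;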
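
&lt;p&gt;For custom scripts that go through the Basic Auth proxy, you can set the header yourself. The sketch below uses only the Python standard library and builds the same request as the curl example without sending it. &lt;code&gt;build_chat_request&lt;/code&gt; is a hypothetical helper name, and the server address and credentials are the placeholders used throughout this guide.&lt;/p&gt;

```python
import base64
import json
import urllib.request


def build_chat_request(base_url, username, password, model, prompt):
    """Build (but do not send) a POST to the OpenAI-compatible endpoint."""
    # Basic Auth is "username:password", base64-encoded
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://your-server-ip:8080", "admin", "yourpassword",
    "qwen2.5:7b", "Hello!",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=300)
```

&lt;p&gt;The 300-second timeout in the comment mirrors the &lt;code&gt;proxy_read_timeout&lt;/code&gt; set in the nginx config, so the client does not give up before the proxy does.&lt;/p&gt;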




&lt;h2&gt;
  
  
  3. Docker Deployment: Full Stack, Containerized
&lt;/h2&gt;

&lt;p&gt;This section has two versions. Start with the &lt;strong&gt;quick version&lt;/strong&gt; to get running fast, then use the &lt;strong&gt;deep version&lt;/strong&gt; when you need a proper production setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Version: &lt;code&gt;docker run&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Run each component individually. Good for testing and single-user setups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Ollama (model server)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  ollama/ollama

&lt;span class="c"&gt;# What this does:&lt;/span&gt;
&lt;span class="c"&gt;# - `-d` → run in background (detached mode)&lt;/span&gt;
&lt;span class="c"&gt;# - `-p 11434:11434` → map port 11434 from container to your machine&lt;/span&gt;
&lt;span class="c"&gt;# - `-v ollama:/root/.ollama` → save models to a persistent volume&lt;/span&gt;
&lt;span class="c"&gt;#    (without this, models disappear when the container is recreated)&lt;/span&gt;
&lt;span class="c"&gt;# - `--restart always` → auto-start on boot and after crashes&lt;/span&gt;

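&lt;span class="c"&gt;# 2. Open WebUI (chat interface, reaches Ollama through the host's published port;&lt;/span&gt;
&lt;span class="c"&gt;#    container names like "ollama" only resolve on a user-defined Docker network)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--add-host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host.docker.internal:host-gateway &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://host.docker.internal:11434 &lt;span class="se"&gt;\&lt;/span&gt;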
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/open-webui/open-webui:main

&lt;span class="c"&gt;# If Ollama and WebUI are on different machines:&lt;/span&gt;
&lt;span class="c"&gt;# -e OLLAMA_BASE_URL=http://YOUR_SERVER_IP:11434&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deep Version: &lt;code&gt;docker-compose&lt;/code&gt; (Production-Ready)
&lt;/h3&gt;

&lt;p&gt;Create a file called &lt;code&gt;docker-compose.yml&lt;/code&gt; in your project directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.8"&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# === Model Server ===&lt;/span&gt;
  &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama/ollama:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama_data:/root/.ollama&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_HOST=0.0.0.0:11434&lt;/span&gt;
    &lt;span class="c1"&gt;# 🖥️ GPU support (remove this section if you don't have NVIDIA GPU)&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# === Web Interface ===&lt;/span&gt;
  &lt;span class="na"&gt;open-webui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/open-webui/open-webui:main&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;open-webui&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000:8080"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;open_webui_data:/app/backend/data&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_BASE_URL=http://ollama:11434&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WEBUI_NAME=My Local AI&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ENABLE_SIGNUP=false&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;

  &lt;span class="c1"&gt;# === Optional: AnythingLLM (RAG) ===&lt;/span&gt;
  &lt;span class="na"&gt;anythingllm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mintplexlabs/anythingllm:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anythingllm&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3001:3001"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;anythingllm_data:/app/server/storage&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;LLM_PROVIDER=ollama&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_BASE_PATH=http://ollama:11434&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_MODEL_PREF=qwen2.5:7b&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;EMBEDDING_PROVIDER=ollama&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;OLLAMA_EMBEDDING_MODEL=qwen2.5:7b&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ollama_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;open_webui_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;anythingllm_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How to use it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start everything&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/your/project
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# Check if containers are running&lt;/span&gt;
docker compose ps

&lt;span class="c"&gt;# View logs&lt;/span&gt;
docker compose logs &lt;span class="nt"&gt;-f&lt;/span&gt; ollama

&lt;span class="c"&gt;# Pull a model&lt;/span&gt;
docker &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull qwen2.5:7b

&lt;span class="c"&gt;# Stop everything&lt;/span&gt;
docker compose down

&lt;span class="c"&gt;# Update images (when new versions are available)&lt;/span&gt;
docker compose pull
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Line-by-line explanation of the compose file:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;services:&lt;/code&gt; — Each service is a separate container&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ollama:&lt;/code&gt; / &lt;code&gt;open-webui:&lt;/code&gt; — Service names; Open WebUI uses &lt;code&gt;http://ollama:11434&lt;/code&gt; to connect because Docker Compose creates an internal network where service names act as hostnames&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;volumes:&lt;/code&gt; — Persistent storage that survives container recreation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;depends_on:&lt;/code&gt; — Start the Ollama container before Open WebUI (this controls start order only; it does not wait for Ollama to be ready to serve requests)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deploy.resources&lt;/code&gt; — GPU passthrough (only works with &lt;code&gt;nvidia-container-toolkit&lt;/code&gt; installed)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Docker Tips for Different Platforms
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Path Mounting&lt;/th&gt;
&lt;th&gt;GPU Support&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Linux&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/home/user/data:/data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ NVIDIA (&lt;code&gt;nvidia-container-toolkit&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Easiest setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;macOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/Documents/data:/data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌ (no GPU passthrough to Docker)&lt;/td&gt;
&lt;td&gt;Models run slower (CPU only in Docker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;C:\data:/data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ NVIDIA (requires WSL2 backend)&lt;/td&gt;
&lt;td&gt;Host paths accept &lt;code&gt;\&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt;; container paths always use &lt;code&gt;/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;macOS users:&lt;/strong&gt; Docker Desktop doesn't support GPU passthrough. For GPU-accelerated Ollama on Mac, &lt;strong&gt;run Ollama natively&lt;/strong&gt; (outside Docker) and point your Dockerized Open WebUI to it via &lt;code&gt;OLLAMA_BASE_URL=http://host.docker.internal:11434&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  4. Monitoring &amp;amp; Logging
&lt;/h2&gt;

&lt;p&gt;You don't need a full observability stack. A few simple checks will tell you most of what you need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Health Check
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if Ollama is running&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:11434/api/tags | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys,json; models=json.load(sys.stdin); print(f'{len(models[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;models&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;])} models installed')"&lt;/span&gt;

&lt;span class="c"&gt;# Check GPU usage (NVIDIA only)&lt;/span&gt;
watch &lt;span class="nt"&gt;-n&lt;/span&gt; 2 nvidia-smi

&lt;span class="c"&gt;# Check RAM usage&lt;/span&gt;
free &lt;span class="nt"&gt;-h&lt;/span&gt;

&lt;span class="c"&gt;# Check Docker container status&lt;/span&gt;
docker ps &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s2"&gt;"table {{.Names}}&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;{{.Status}}&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;{{.Ports}}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Basic Logging Setup
&lt;/h3&gt;

&lt;p&gt;Ollama logs to stdout. If you're running it in Docker, logs are already captured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# See real-time Ollama logs&lt;/span&gt;
docker logs &lt;span class="nt"&gt;-f&lt;/span&gt; ollama

&lt;span class="c"&gt;# See Open WebUI logs&lt;/span&gt;
docker logs &lt;span class="nt"&gt;-f&lt;/span&gt; open-webui

&lt;span class="c"&gt;# Save logs to a file (searchable later)&lt;/span&gt;
docker logs ollama &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/ollama-logs-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;.txt 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What to Watch For
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Normal Range&lt;/th&gt;
&lt;th&gt;Red Flag&lt;/th&gt;
&lt;th&gt;What To Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU VRAM usage&lt;/td&gt;
&lt;td&gt;70–95%&lt;/td&gt;
&lt;td&gt;100% (OOM)&lt;/td&gt;
&lt;td&gt;Use smaller model or lower quantization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU temperature&lt;/td&gt;
&lt;td&gt;65–80°C&lt;/td&gt;
&lt;td&gt;&amp;gt;85°C&lt;/td&gt;
&lt;td&gt;Clean fans, reduce ambient temp, lower power limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response time (7B model)&lt;/td&gt;
&lt;td&gt;1–5 seconds&lt;/td&gt;
&lt;td&gt;&amp;gt;15 seconds&lt;/td&gt;
&lt;td&gt;Check VRAM, restart Ollama, reduce concurrent users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM usage&lt;/td&gt;
&lt;td&gt;Within available&lt;/td&gt;
&lt;td&gt;Swap usage &amp;gt;0&lt;/td&gt;
&lt;td&gt;Add more RAM or reduce model count&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
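
&lt;p&gt;The red-flag column translates naturally into an automated check. This is a rough Python sketch that takes readings you have already collected (for example from &lt;code&gt;nvidia-smi&lt;/code&gt; and &lt;code&gt;free&lt;/code&gt;) and reports which ones cross the limits in the table. The metric names and the &lt;code&gt;red_flags&lt;/code&gt; helper are hypothetical, and VRAM is left out because its red flag is an out-of-memory error rather than a threshold.&lt;/p&gt;

```python
# Red-flag limits copied from the table above
LIMITS = {
    "gpu_temp_c": 85,        # GPU temperature, degrees Celsius
    "response_time_s": 15,   # response time for a 7B model, seconds
    "swap_used_mb": 0,       # any swap use is a warning sign
}


def red_flags(readings):
    """Return the names of readings that exceed their red-flag limit."""
    return sorted(
        name for name, value in readings.items()
        if name in LIMITS and value > LIMITS[name]
    )


# Example readings (made up for illustration)
print(red_flags({"gpu_temp_c": 88, "response_time_s": 4, "swap_used_mb": 0}))
```

&lt;p&gt;Run something like this from cron and send yourself a message when the list is not empty, and you have basic alerting without a monitoring stack.&lt;/p&gt;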

&lt;h3&gt;
  
  
  Usage Tracking (Who's Using What)
&lt;/h3&gt;

&lt;p&gt;Open WebUI's admin panel provides basic usage statistics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Admin Panel → Chat Logs&lt;/strong&gt; — See conversations (toggle anonymization for privacy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin Panel → Users&lt;/strong&gt; — See active users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings → Models&lt;/strong&gt; — See which models are most popular&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more detailed tracking, add a simple log parser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Count API requests per hour from Ollama's [GIN] access-log lines&lt;/span&gt;
docker logs ollama 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\[&lt;/span&gt;&lt;span class="s2"&gt;GIN&lt;/span&gt;&lt;span class="se"&gt;\]&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $2, substr($4, 1, 2) ":00"}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;uniq&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-rn&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
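
&lt;p&gt;If you would rather do the counting in a script, here is a small Python sketch of the same idea. It assumes Ollama's request lines follow the default access-log shape of the gin web framework (a &lt;code&gt;[GIN]&lt;/code&gt; prefix, then date and time); check a few lines of your own logs first, since the format may differ between versions. &lt;code&gt;requests_per_hour&lt;/code&gt; is a hypothetical name and the sample lines are invented.&lt;/p&gt;

```python
import re
from collections import Counter

# Assumed line shape: [GIN] 2026/05/24 - 11:51:59 | 200 | ... | POST "/api/chat"
GIN_LINE = re.compile(r"\[GIN\] (\d{4}/\d{2}/\d{2}) - (\d{2}):\d{2}:\d{2} \|")


def requests_per_hour(lines):
    """Count access-log lines per (date, hour) bucket."""
    counts = Counter()
    for line in lines:
        match = GIN_LINE.search(line)
        if match:
            date, hour = match.groups()
            counts[f"{date} {hour}:00"] += 1
    return counts


# Invented sample lines; the last one should be ignored
sample = [
    '[GIN] 2026/05/24 - 11:51:59 | 200 | 1.2s | 172.17.0.1 | POST "/api/chat"',
    '[GIN] 2026/05/24 - 11:58:03 | 200 | 0.9s | 172.17.0.1 | POST "/api/chat"',
    '[GIN] 2026/05/24 - 12:02:10 | 200 | 3ms | 172.17.0.1 | GET "/api/tags"',
    'time=2026-05-24T12:02:11Z level=INFO msg="unrelated line"',
]
for bucket, count in requests_per_hour(sample).most_common():
    print(count, bucket)
```

&lt;p&gt;To run it on real logs, pass &lt;code&gt;sys.stdin&lt;/code&gt; to &lt;code&gt;requests_per_hour&lt;/code&gt; and pipe the output of &lt;code&gt;docker logs ollama&lt;/code&gt; into the script.&lt;/p&gt;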






&lt;h2&gt;
  
  
  5. Cost Analysis: Local vs Cloud (Updated for 2026)
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;💰 &lt;em&gt;All estimates based on US average electricity rate ($0.15/kWh). Hardware prices as of May 2026. Actual costs vary by region.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Full Picture
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Cloud (GPT-4o / Claude)&lt;/th&gt;
&lt;th&gt;Local (Your Hardware)&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Solo heavy user&lt;/strong&gt; ($200/mo API)&lt;/td&gt;
&lt;td&gt;$2,400/year&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$325/year&lt;/strong&gt; (electricity)&lt;/td&gt;
&lt;td&gt;Local after &lt;strong&gt;14 months&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Small team (5 people)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1,000/month ($200×5)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$380/year&lt;/strong&gt; (electricity)&lt;/td&gt;
&lt;td&gt;Local after &lt;strong&gt;~6 months&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Light user&lt;/strong&gt; (&amp;lt;$50/mo API)&lt;/td&gt;
&lt;td&gt;$600/year&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0&lt;/strong&gt; (existing hardware)&lt;/td&gt;
&lt;td&gt;Local immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise (50 users)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom pricing (~$20K/yr)&lt;/td&gt;
&lt;td&gt;$3,500 (one-time build) + $600/yr&lt;/td&gt;
&lt;td&gt;Local after &lt;strong&gt;~2 months&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Detailed Breakdown: Solo Heavy User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# === Local Setup: One-Time Cost ===&lt;/span&gt;
RTX 4090 &lt;span class="o"&gt;(&lt;/span&gt;or used RTX 3090&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="nv"&gt;$1&lt;/span&gt;,500–2,500
Rest of PC &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;if &lt;/span&gt;needed&lt;span class="o"&gt;)&lt;/span&gt;        &lt;span class="nv"&gt;$500&lt;/span&gt;–1,000
Total upfront:                &lt;span class="nv"&gt;$2&lt;/span&gt;,000–3,500

&lt;span class="c"&gt;# === Local: Monthly Costs ===&lt;/span&gt;
Electricity &lt;span class="o"&gt;(&lt;/span&gt;0.4 kW × 8 h/day × 30 days × &lt;span class="nv"&gt;$0&lt;/span&gt;.15/kWh&lt;span class="o"&gt;)&lt;/span&gt;  ~&lt;span class="nv"&gt;$14&lt;/span&gt;.40/month
Internet &lt;span class="o"&gt;(&lt;/span&gt;negligible&lt;span class="o"&gt;)&lt;/span&gt;                                  &lt;span class="nv"&gt;$0&lt;/span&gt;
Total monthly:                                        ~&lt;span class="nv"&gt;$14&lt;/span&gt;.40/month

&lt;span class="c"&gt;# === Cloud: Monthly Costs ===&lt;/span&gt;
ChatGPT Pro / Claude Pro                 &lt;span class="nv"&gt;$200&lt;/span&gt;/month
API calls &lt;span class="o"&gt;(&lt;/span&gt;heavy user&lt;span class="o"&gt;)&lt;/span&gt;                   &lt;span class="nv"&gt;$50&lt;/span&gt;–100/month
Total monthly:                           &lt;span class="nv"&gt;$250&lt;/span&gt;–300/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Break-Even Calculator
&lt;/h3&gt;

&lt;p&gt;Here's a simple way to calculate your personal break-even point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Break-even (months) = Hardware Cost / (Cloud monthly cost - Local monthly cost)

Example with RTX 4090 ($2,500) vs ChatGPT Pro ($200/month) and ~$14.40/month electricity:
$2,500 / ($200 - $14.40) = 13.5 months
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;After the break-even point, you're &lt;strong&gt;saving about $186/month&lt;/strong&gt; compared to the cloud. Over 3 years, that adds up to roughly $6,680 in avoided cloud fees, or &lt;strong&gt;about $4,180 in net savings&lt;/strong&gt; once the $2,500 hardware cost is subtracted.&lt;/p&gt;
&lt;/blockquote&gt;
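
&lt;p&gt;The break-even formula above can be sketched as two small shell helpers. This is a minimal illustration, not a full cost model: the function names &lt;code&gt;monthly_electricity&lt;/code&gt; and &lt;code&gt;break_even_months&lt;/code&gt; are made up for this example, a month is assumed to have 30 days, and the sample inputs (0.3 kW, 6 h/day, a $2,000 build, a $250/month cloud bill) are hypothetical placeholders for your own numbers.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: break-even point for buying hardware vs. paying a monthly cloud bill.
# Every figure passed in is your own estimate.

# monthly_electricity POWER_KW HOURS_PER_DAY RATE_PER_KWH
# Energy per month = kW x hours/day x 30 days; cost = energy x rate.
monthly_electricity() {
  awk -v kw="$1" -v hours="$2" -v rate="$3" \
    'BEGIN { printf "%.2f\n", kw * hours * 30 * rate }'
}

# break_even_months HARDWARE_COST CLOUD_MONTHLY LOCAL_MONTHLY
# Prints months to one decimal, or "never" if local is not cheaper per month.
break_even_months() {
  awk -v hw="$1" -v cloud="$2" -v loc="$3" 'BEGIN {
    saving = cloud - loc
    if (saving > 0) printf "%.1f\n", hw / saving
    else print "never"
  }'
}

# Hypothetical inputs: 0.3 kW draw, 6 h/day, $0.15/kWh, $2,000 build, $250/month cloud
electricity=$(monthly_electricity 0.3 6 0.15)
echo "Electricity: \$${electricity}/month"
echo "Break-even:  $(break_even_months 2000 250 "$electricity") months"
```

&lt;p&gt;Swap in your own wattage, daily hours, and electricity rate. If the cloud bill is not higher than the local running cost, the helper prints &lt;code&gt;never&lt;/code&gt; instead of a negative number.&lt;/p&gt;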

&lt;h3&gt;
  
  
  When Cloud Still Makes Sense
&lt;/h3&gt;

&lt;p&gt;Let's be honest — local isn't always better:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;You need GPT-4o / Claude Opus quality&lt;/td&gt;
&lt;td&gt;Keep a cloud subscription for hard tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your GPU is &amp;lt;8GB VRAM&lt;/td&gt;
&lt;td&gt;Use local for simple tasks, cloud for complex ones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You have zero upfront budget&lt;/td&gt;
&lt;td&gt;Start with cloud, save for hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You need 100% uptime (SLA)&lt;/td&gt;
&lt;td&gt;Cloud wins — your home power goes out sometimes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You process huge batches overnight&lt;/td&gt;
&lt;td&gt;Local — no API limits, no per-token cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The hybrid approach&lt;/strong&gt; is what I personally recommend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily use&lt;/strong&gt; → Local LLM (Qwen 2.5:7b or DeepSeek-R1:14b)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard tasks&lt;/strong&gt; → Cloud API (pay-per-use, ~$20–50/month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated batch jobs&lt;/strong&gt; → Local (unlimited, no rate limits)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6. Security Checklist
&lt;/h2&gt;

&lt;p&gt;⚠️ &lt;strong&gt;This is the most important section in this chapter.&lt;/strong&gt; An exposed, unauthenticated Ollama instance is a liability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before Going Live
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;[ ] &lt;strong&gt;Ollama is NOT directly exposed to the internet&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify: &lt;code&gt;curl -s http://YOUR_PUBLIC_IP:11434/api/tags&lt;/code&gt; should &lt;strong&gt;fail&lt;/strong&gt; from outside&lt;/li&gt;
&lt;li&gt;If it succeeds → your Ollama is visible to the entire internet!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;[ ] &lt;strong&gt;Authentication is enabled&lt;/strong&gt; (htpasswd / API key / SSO)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify: &lt;code&gt;curl -i -u test:test http://localhost:8080/api/tags&lt;/code&gt; (deliberately wrong credentials) should return &lt;strong&gt;401 Unauthorized&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;[ ] &lt;strong&gt;SSH / VPN only for remote access&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Best practice: Don't expose the API at all. Use &lt;strong&gt;Tailscale&lt;/strong&gt; or &lt;strong&gt;WireGuard&lt;/strong&gt; VPN&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;[ ] &lt;strong&gt;Firewall rules are configured&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
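&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="c"&gt;# Allow only local network access (192.168.x.x)&lt;/span&gt;
  &lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow from 192.168.0.0/16 to any port 11434
  &lt;span class="nb"&gt;sudo &lt;/span&gt;ufw deny 11434  &lt;span class="c"&gt;# Block everyone else (rules match in order, so add the allow first)&lt;/span&gt;

  &lt;span class="c"&gt;# If using nginx reverse proxy (port 8080), allow from VPN only&lt;/span&gt;
  &lt;span class="c"&gt;# (adjust the range to your VPN subnet; Tailscale addresses come from 100.64.0.0/10)&lt;/span&gt;
  &lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow from 10.0.0.0/8 to any port 8080

  &lt;span class="c"&gt;# Caveat: ports published by Docker (-p 11434:11434) bypass ufw.&lt;/span&gt;
  &lt;span class="c"&gt;# Publish on loopback instead, e.g. -p 127.0.0.1:11434:11434&lt;/span&gt;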
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Ollama version is up to date&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
  &lt;span class="c"&gt;# Compare with https://github.com/ollama/ollama/releases&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Docker containers restart on failure&lt;/strong&gt;

&lt;ul&gt;
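&lt;li&gt;Verify: &lt;code&gt;docker inspect ollama | grep -A2 RestartPolicy&lt;/code&gt; (the policy &lt;code&gt;Name&lt;/code&gt; should be &lt;code&gt;always&lt;/code&gt;, &lt;code&gt;unless-stopped&lt;/code&gt;, or &lt;code&gt;on-failure&lt;/code&gt;, not &lt;code&gt;no&lt;/code&gt;)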
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Security Audit Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# save as: security-audit.sh &amp;amp;&amp;amp; chmod +x security-audit.sh&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Local LLM Security Audit ==="&lt;/span&gt;

&lt;span class="c"&gt;# Check 1: Is the Ollama API answering on this machine?&lt;/span&gt;
&lt;span class="c"&gt;# (A localhost probe cannot prove exposure; to test that, curl your public IP from another network.)&lt;/span&gt;
&lt;span class="nv"&gt;LOCAL_CHECK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  http://localhost:11434/api/tags 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LOCAL_CHECK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ℹ️  Ollama API is answering on localhost:11434"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   If port 11434 is also reachable from a public IP, the world can run models on your GPU!"&lt;/span&gt;
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ℹ️  Ollama API is not answering on localhost:11434 (is Ollama running?)"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Check 2: Is there authentication?&lt;/span&gt;
&lt;span class="nv"&gt;AUTH_CHECK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  http://localhost:8080/api/tags 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AUTH_CHECK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"401"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Reverse proxy auth is working"&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AUTH_CHECK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"200"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⚠️  No authentication on port 8080"&lt;/span&gt;
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ℹ️  No reverse proxy detected on port 8080"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Check 3: Firewall status&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; ufw &amp;amp;&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"--- UFW Status ---"&lt;/span&gt;
  &lt;span class="nb"&gt;sudo &lt;/span&gt;ufw status verbose
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Check 4: GPU usage (snapshot)&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; nvidia-smi &amp;amp;&amp;gt;/dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"--- GPU ---"&lt;/span&gt;
  nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;name,memory.used,memory.total,temperature.gpu &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
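&lt;p&gt;Run this script periodically, or schedule it with cron and keep the output in a log (use root's crontab, since the UFW check calls &lt;code&gt;sudo&lt;/code&gt;): &lt;code&gt;*/30 * * * * /path/to/security-audit.sh &amp;gt;&amp;gt; /path/to/security-audit.log 2&amp;gt;&amp;amp;1&lt;/code&gt;&lt;/p&gt;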
&lt;/blockquote&gt;
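
&lt;p&gt;A request to &lt;code&gt;localhost&lt;/code&gt; only shows that Ollama is running; whether it is exposed depends on the address the port is bound to. The sketch below classifies a listening address as loopback-only or reachable from other machines. The helper name &lt;code&gt;classify_bind&lt;/code&gt; and the sample addresses are illustrative assumptions, written in the &lt;code&gt;address:port&lt;/code&gt; form that &lt;code&gt;ss -ltn&lt;/code&gt; prints.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# classify_bind ADDRESS:PORT
# Prints "loopback" when the socket is reachable only from this machine,
# otherwise "exposed" (0.0.0.0, [::], *, or a LAN/public address).
classify_bind() {
  case "$1" in
    127.*:*|"[::1]":*) echo "loopback" ;;
    *)                 echo "exposed"  ;;
  esac
}

# Sample addresses (illustrative values, not read from a live system)
for addr in 127.0.0.1:11434 "[::1]:11434" 0.0.0.0:11434 "[::]:11434" 192.168.1.20:11434; do
  echo "$addr is $(classify_bind "$addr")"
done
```

&lt;p&gt;On a live Linux host you could feed it the real listeners, for example from &lt;code&gt;ss -ltn | awk '$4 ~ /:11434$/ {print $4}'&lt;/code&gt;. An "exposed" result is not automatically a problem if a firewall sits in front of it, but it is the case worth double-checking.&lt;/p&gt;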

&lt;h3&gt;
  
  
  Your Production Deployment Cheat Sheet
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                 ┌──────────────┐
                 │   Internet   │
                 └──────┬───────┘
                        │
                 ┌──────▼───────┐     ┌───────────────┐
                 │  Nginx Proxy │────▶│ htpasswd Auth │
                 │  (port 8080) │     └───────────────┘
                 └──────┬───────┘
                        │ (internal network only)
                 ┌──────▼───────┐
                 │    Ollama    │
                 │  (port 11434)│
                 └──────┬───────┘
                        │
              ┌─────────┼─────────┐
              │         │         │
       ┌──────▼───┐ ┌───▼────┐ ┌──▼──────┐
       │Open WebUI│ │Anything│ │ Custom  │
       │  (3000)  │ │  LLM   │ │  Apps   │
       └──────────┘ └────────┘ └─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;You've gone from "running a model in the terminal" to a &lt;strong&gt;production-ready AI server&lt;/strong&gt; that your team can use. &lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;&lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/06-function-calling/README.md" rel="noopener noreferrer"&gt;Chapter 6 → Function Calling &amp;amp; Tool Use&lt;/a&gt;&lt;/strong&gt;, we'll make your local LLM actually &lt;strong&gt;do things&lt;/strong&gt; — call APIs, interact with databases, and control other software.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? ⭐ &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Star the repo&lt;/a&gt; — it helps others find it and you'll get notified when new chapters drop.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ollama</category>
      <category>production</category>
      <category>devops</category>
    </item>
    <item>
      <title>Getting Started: Run Your First Local LLM in 5 Minutes</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sat, 23 May 2026 19:01:20 +0000</pubDate>
      <link>https://forem.com/lingdas1/getting-started-run-your-first-local-llm-in-5-minutes-2i1j</link>
      <guid>https://forem.com/lingdas1/getting-started-run-your-first-local-llm-in-5-minutes-2i1j</guid>
      <description>&lt;h1&gt;
  
  
  01 — Getting Started: Run Your First Local LLM (5 Minutes)
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;🟢 &lt;strong&gt;Beginner&lt;/strong&gt; — No experience needed. Just a computer and 5 minutes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Is a Local LLM? (Plain English)
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;LLM&lt;/strong&gt; (Large Language Model) is the brain behind ChatGPT, Claude, and Gemini.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;local LLM&lt;/strong&gt; runs that brain on &lt;strong&gt;your own computer&lt;/strong&gt; — not on someone else's server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does that matter?&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cloud AI (ChatGPT, Claude)&lt;/th&gt;
&lt;th&gt;Local AI (Ollama + models)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;$20–$200/month subscription&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0&lt;/strong&gt; in subscription fees (you pay only for hardware and electricity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your data is sent to their servers&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Private&lt;/strong&gt; — everything stays on your machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requires internet&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Works offline&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Censored, filtered, rate-limited&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;No limits&lt;/strong&gt; — you control everything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-size-fits-all model&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Choose any model&lt;/strong&gt; for any task&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Think of it this way:&lt;/strong&gt; Cloud AI is like renting a car. Local AI is like owning a bicycle. The bicycle is slower, but it's yours, it's free, and nobody can take it away from you.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Minimum requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A computer (Windows, macOS, or Linux)&lt;/li&gt;
&lt;li&gt;At least 8 GB of RAM (16 GB recommended)&lt;/li&gt;
&lt;li&gt;A few GB of free disk space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Nice to have (but not required):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A GPU with 4+ GB VRAM (models run faster, but CPU is fine to start)&lt;/li&gt;
&lt;/ul&gt;





&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;

&lt;p&gt;Ollama is the easiest way to run local LLMs. Think of it as the "App Store for AI models."&lt;/p&gt;

&lt;h3&gt;
  
  
  macOS
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The install.sh script is Linux-only; on macOS use Homebrew (or the app from ollama.com/download)&lt;/span&gt;
brew install ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Linux
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Windows
&lt;/h3&gt;

&lt;p&gt;Download the installer from &lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;ollama.com/download&lt;/a&gt; and run it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verify Installation
&lt;/h3&gt;

&lt;p&gt;Open a new terminal and type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama version is 0.6.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🔥 &lt;strong&gt;Pro tip:&lt;/strong&gt; If you get "command not found" on Linux/macOS, restart your terminal or run: &lt;code&gt;export PATH=$PATH:/usr/local/bin&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 2: Pull Your First Model
&lt;/h2&gt;

&lt;p&gt;Now for the fun part — downloading an actual AI brain to run on your computer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This downloads a 4.7 GB model. On a fast connection (roughly 125–300 Mbps), it takes 2–5 minutes; slower links take proportionally longer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;While it downloads, here's what's happening:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
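&lt;li&gt;Ollama is downloading the model weights as a &lt;strong&gt;GGUF file&lt;/strong&gt; (a compact, usually quantized model format)&lt;/li&gt;
&lt;li&gt;It's verifying the download and storing it in your local model library&lt;/li&gt;
&lt;li&gt;GPU detection and model loading happen later, the first time you run the model&lt;/li&gt;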
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What if the download is too big?&lt;/strong&gt; Try a smaller model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For 8 GB RAM laptops — works on almost anything&lt;/span&gt;
ollama pull qwen2.5:1.5b

&lt;span class="c"&gt;# For 4 GB RAM or very old computers&lt;/span&gt;
ollama pull qwen2.5:0.5b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
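
&lt;p&gt;To judge whether a model is "too big" for your connection, you can estimate the download time from the model size and your line speed. This is a rough sketch: the helper name &lt;code&gt;download_minutes&lt;/code&gt; is made up for this example, it assumes decimal units (1 GB = 8,000 megabits) and a link that sustains its rated speed, and the three speeds in the loop are arbitrary sample values.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# download_minutes SIZE_GB SPEED_MBPS
# 1 GB = 8,000 megabits (decimal units), so minutes = GB x 8000 / Mbps / 60.
download_minutes() {
  awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbps / 60 }'
}

# Sample speeds; replace them with the result of your own speed test
for speed in 25 100 300; do
  echo "qwen2.5:7b (4.7 GB) at ${speed} Mbps: $(download_minutes 4.7 "$speed") min"
done
```

&lt;p&gt;Real downloads are usually a little slower than this estimate, so treat the result as a lower bound.&lt;/p&gt;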






&lt;h2&gt;
  
  
  Step 3: Chat With Your Model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen2.5:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see a prompt like &lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;. Type something:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Write a haiku about a cat sitting on a computer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model will think for a moment and then respond. &lt;strong&gt;Congratulations — you just ran an AI on your own hardware!&lt;/strong&gt; 🎉&lt;/p&gt;

&lt;h3&gt;
  
  
  Try These First Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Write a Python function to calculate fibonacci

&amp;gt;&amp;gt;&amp;gt; Explain quantum computing like I'm 10

&amp;gt;&amp;gt;&amp;gt; What's the meaning of life?

&amp;gt;&amp;gt;&amp;gt; /? -- show all available commands

&amp;gt;&amp;gt;&amp;gt; /exit -- quit the chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Expect it to be slower than ChatGPT.&lt;/strong&gt; That's normal! Local models run at 15–40 tokens per second on a GPU, or 2–6 tok/s on CPU. On a GPU, that's still faster than most people read.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Step 4: Choose the Right Model for Your Hardware
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Not sure which model to pick? Use this decision tree:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your GPU VRAM?
├── No GPU (CPU only)
│   ├── 32 GB RAM → qwen2.5:7b (slow but works)
│   ├── 16 GB RAM → qwen2.5:1.5b
│   └── 8 GB RAM  → qwen2.5:0.5b
├── 4–6 GB VRAM   → qwen2.5:7b
├── 8–12 GB VRAM  → deepseek-r1:14b (🟢 BEST for most people)
├── 12–16 GB VRAM → deepseek-r1:32b
├── 24 GB VRAM    → qwen3.6:27b or deepseek-r1:32b (Q4)
└── 36+ GB VRAM   → deepseek-r1:70b or qwen2.5:72b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
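
&lt;p&gt;The decision tree above can also be written as a small shell function. This is only a sketch of the same table: the name &lt;code&gt;pick_model&lt;/code&gt; is made up for this example, inputs are whole gigabytes, and the boundary choices are assumptions where the tree leaves gaps or overlaps (12 GB is treated as the 8–12 GB tier, 17–23 GB falls back to the 12–16 GB tier, 25–35 GB to the 24 GB tier, and less than 4 GB of VRAM is treated as CPU-only).&lt;/p&gt;

```shell
#!/usr/bin/env bash
# pick_model VRAM_GB RAM_GB   (whole numbers; pass 0 VRAM for CPU-only machines)
# Mirrors the decision tree; tier boundaries are assumptions, see the note above.
pick_model() {
  local vram="$1" ram="$2"
  if   [ "$vram" -ge 36 ]; then echo "deepseek-r1:70b"
  elif [ "$vram" -ge 24 ]; then echo "qwen3.6:27b"
  elif [ "$vram" -ge 13 ]; then echo "deepseek-r1:32b"
  elif [ "$vram" -ge 8 ];  then echo "deepseek-r1:14b"
  elif [ "$vram" -ge 4 ];  then echo "qwen2.5:7b"
  elif [ "$ram" -ge 32 ];  then echo "qwen2.5:7b"
  elif [ "$ram" -ge 16 ];  then echo "qwen2.5:1.5b"
  else                          echo "qwen2.5:0.5b"
  fi
}

# Example: a laptop with an 8 GB GPU and 16 GB of RAM
echo "Suggested model: $(pick_model 8 16)"
```

&lt;p&gt;You could pass the result straight to Ollama, for example &lt;code&gt;ollama pull "$(pick_model 8 16)"&lt;/code&gt;.&lt;/p&gt;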



&lt;h3&gt;
  
  
  Model Comparison Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Ollama Command&lt;/th&gt;
&lt;th&gt;Size (Disk)&lt;/th&gt;
&lt;th&gt;Min RAM&lt;/th&gt;
&lt;th&gt;Min VRAM&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:0.5B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen2.5:0.5b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.5 GB&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Basic text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen2.5:1.5b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.1 GB&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Simple tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen 2.5:7B&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;ollama pull qwen2.5:7b&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;🟢 Good start&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:14B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen2.5:14b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;9.0 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:14B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull deepseek-r1:14b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8.2 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;🏆 Best value&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:32B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull deepseek-r1:32b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;18.7 GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Near o1-mini level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen3.6:27b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15 GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Cutting-edge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4:8B&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull llama4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4.9 GB&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;Good general&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;My recommendation for first-timers:&lt;/strong&gt; Start with &lt;code&gt;qwen2.5:7b&lt;/code&gt;. It runs on almost anything, and it's good enough to be genuinely useful.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What to Do After Your First Chat
&lt;/h2&gt;

&lt;p&gt;You've run your first local LLM. Now what?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next steps in order:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Guide&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Customize your model&lt;/strong&gt; with a Modelfile&lt;/td&gt;
&lt;td&gt;Control temperature, context length, and behavior&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/gguf-modelfile.md" rel="noopener noreferrer"&gt;GGUF &amp;amp; Modelfile Guide&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Install Open WebUI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Get a ChatGPT-like web interface instead of the terminal&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md" rel="noopener noreferrer"&gt;Open WebUI Setup&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Benchmark your hardware&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;See what speeds your setup can achieve&lt;/td&gt;
&lt;td&gt;Script: &lt;code&gt;./scripts/ollama-benchmark.sh&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Add document search (RAG)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Let your LLM answer questions about your own files&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/anythingllm-rag.md" rel="noopener noreferrer"&gt;RAG Guide&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Try a reasoning model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Switch to DeepSeek-R1 for harder problems&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/03-models/deepseek-r1.md" rel="noopener noreferrer"&gt;DeepSeek-R1 Guide&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Common First-Timer Problems (And Fixes)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"ollama: command not found"&lt;/td&gt;
&lt;td&gt;Ollama not in PATH&lt;/td&gt;
&lt;td&gt;Restart terminal, or run: &lt;code&gt;export PATH=$PATH:/usr/local/bin&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Download is very slow&lt;/td&gt;
&lt;td&gt;Big file on slow internet&lt;/td&gt;
&lt;td&gt;Try &lt;code&gt;ollama pull qwen2.5:1.5b&lt;/code&gt; instead (much smaller)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model responds very slowly&lt;/td&gt;
&lt;td&gt;Running on CPU&lt;/td&gt;
&lt;td&gt;This is normal! See the speed expectations at the end of Step 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model responds in Chinese&lt;/td&gt;
&lt;td&gt;Qwen models are trained on a lot of Chinese text&lt;/td&gt;
&lt;td&gt;Add &lt;code&gt;SYSTEM "Always respond in English."&lt;/code&gt; to a Modelfile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"CUDA out of memory"&lt;/td&gt;
&lt;td&gt;Model too big for your GPU&lt;/td&gt;
&lt;td&gt;Use a smaller model or a lower-bit quantization (for example Q4 instead of Q8)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Connection refused"&lt;/td&gt;
&lt;td&gt;Ollama server not running&lt;/td&gt;
&lt;td&gt;Run &lt;code&gt;ollama serve&lt;/code&gt; in a separate terminal first&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Quick Reference: Common Ollama Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all downloaded models&lt;/span&gt;
ollama list

&lt;span class="c"&gt;# Show currently running models&lt;/span&gt;
ollama ps

&lt;span class="c"&gt;# Delete a model to free space&lt;/span&gt;
ollama &lt;span class="nb"&gt;rm &lt;/span&gt;qwen2.5:7b

&lt;span class="c"&gt;# Update a model to the latest version&lt;/span&gt;
ollama pull qwen2.5:7b

&lt;span class="c"&gt;# Run a model with a one-shot prompt (non-interactive)&lt;/span&gt;
ollama run qwen2.5:7b &lt;span class="s2"&gt;"Write a Python script to download images from a URL"&lt;/span&gt;

&lt;span class="c"&gt;# Use the API (OpenAI compatible)&lt;/span&gt;
curl http://localhost:11434/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Hello!"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Your First Week Plan
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Day&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day 1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Install Ollama + pull a model + chat with it&lt;/td&gt;
&lt;td&gt;5 minutes ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Try different models (small vs large)&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Customize with a Modelfile&lt;/td&gt;
&lt;td&gt;30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day 4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Install Open WebUI&lt;/td&gt;
&lt;td&gt;30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ask your LLM to write code or help with real work&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weekend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Try RAG — let your LLM read your documents&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;blockquote&gt;
&lt;p&gt;🎯 &lt;strong&gt;You've taken the first step.&lt;/strong&gt; Running a local LLM is like learning to ride a bike — wobbly at first, but once you get it, you'll wonder why you didn't start sooner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Found this helpful?&lt;/strong&gt; &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;⭐ Star the repo&lt;/a&gt; — it helps others find it too.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;— Ling, a medical student who accidentally fell into AI and wants to help you do the same.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ollama</category>
      <category>llm</category>
      <category>opensource</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Hardware Guide: What Do You Actually Need to Run Local LLMs?</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sat, 23 May 2026 18:57:27 +0000</pubDate>
      <link>https://forem.com/lingdas1/hardware-guide-what-do-you-actually-need-to-run-local-llms-1eik</link>
      <guid>https://forem.com/lingdas1/hardware-guide-what-do-you-actually-need-to-run-local-llms-1eik</guid>
      <description>&lt;h1&gt;
  
  
  02 — Hardware Guide: What Do You Actually Need?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;🟢 &lt;strong&gt;Beginner&lt;/strong&gt; — No matter what computer you have, there's a model that will run on it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Most Important Thing to Know
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;VRAM is the bottleneck, not compute.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A model running on a 5-year-old RTX 3060 at Q4 quantization gives you roughly &lt;strong&gt;96% of the quality&lt;/strong&gt; of the full-precision model on an A100, just slower. Output quality depends on the model and its quantization, not on the card it runs on. And "slower" for most use cases (chat, coding, document analysis) still means &lt;strong&gt;20-40 tokens per second&lt;/strong&gt;, which is faster than most people read.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Analogy:&lt;/strong&gt; Running AI locally is like cooking at home vs. going to a Michelin-star restaurant. The restaurant (cloud AI) is faster and fancier. But your home cooking (local AI) is free, private, and tastes just as good — just takes a bit longer.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Quick Decision Tree
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What computer do you have?
├── Gaming PC / Workstation with NVIDIA GPU
│   ├── 24-32 GB VRAM (RTX 3090/4090, RTX 5090) → deepseek-r1:32b or qwen3.6:27b
│   ├── 12-16 GB VRAM (RTX 3060 12GB, RTX 4070/5070, RTX 4080) → qwen2.5:14b or deepseek-r1:14b  🟢
│   └── 8 GB VRAM (RTX 4060) → qwen2.5:7b or deepseek-r1:7b
├── Mac
│   ├── 36 GB+ unified (M4 Max, M3 Max) → qwen3.6:27b or deepseek-r1:32b
│   └── 16 GB unified (M1/M2/M3) → qwen2.5:7b or phi4:14b
├── AMD GPU
│   ├── 16 GB+ VRAM (RX 7800 XT, RX 7900 XTX) → qwen2.5:14b
│   └── 8-12 GB VRAM (RX 7600/7700) → qwen2.5:7b
├── Intel Arc GPU → qwen2.5:7b (experimental support)
├── CPU only, 32 GB+ RAM → qwen2.5:7b (2-6 tok/s)
├── CPU only, 16 GB RAM → qwen2.5:1.5b (5-10 tok/s)
└── CPU only, 8 GB RAM → qwen2.5:0.5b (8-15 tok/s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  GPU Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Speed (tok/s)&lt;/th&gt;
&lt;th&gt;Used Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 3060&lt;/strong&gt; 12GB&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;Ampere&lt;/td&gt;
&lt;td&gt;Qwen 2.5:14B (Q4)&lt;/td&gt;
&lt;td&gt;25-35&lt;/td&gt;
&lt;td&gt;~$200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 4060&lt;/strong&gt; 8GB&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;Ada Lovelace&lt;/td&gt;
&lt;td&gt;Qwen 2.5:7B (Q4)&lt;/td&gt;
&lt;td&gt;35-50&lt;/td&gt;
&lt;td&gt;~$280&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 4060 Ti&lt;/strong&gt; 16GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Ada Lovelace&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:14B (Q4)&lt;/td&gt;
&lt;td&gt;30-45&lt;/td&gt;
&lt;td&gt;~$400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 4070&lt;/strong&gt; 12GB&lt;/td&gt;
&lt;td&gt;12 GB&lt;/td&gt;
&lt;td&gt;Ada Lovelace&lt;/td&gt;
&lt;td&gt;Qwen 2.5:14B (Q4)&lt;/td&gt;
&lt;td&gt;40-55&lt;/td&gt;
&lt;td&gt;~$500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 4070 Ti Super&lt;/strong&gt; 16GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Ada Lovelace&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:14B (Q4)&lt;/td&gt;
&lt;td&gt;35-50&lt;/td&gt;
&lt;td&gt;~$650&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 4080&lt;/strong&gt; 16GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Ada Lovelace&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:32B (Q3, tight fit)&lt;/td&gt;
&lt;td&gt;20-30&lt;/td&gt;
&lt;td&gt;~$800&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 4090&lt;/strong&gt; 24GB&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;Ada Lovelace&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:32B (Q3)/Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;25-35&lt;/td&gt;
&lt;td&gt;~$1,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 5090&lt;/strong&gt; 32GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;Blackwell&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:70B (Q3, tight fit)&lt;/td&gt;
&lt;td&gt;18-25&lt;/td&gt;
&lt;td&gt;~$2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 3090&lt;/strong&gt; 24GB&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;Ampere&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:32B (Q3)/Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;15-25&lt;/td&gt;
&lt;td&gt;~$700 🟢&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RTX 3080&lt;/strong&gt; 10/12GB&lt;/td&gt;
&lt;td&gt;10/12 GB&lt;/td&gt;
&lt;td&gt;Ampere&lt;/td&gt;
&lt;td&gt;Qwen 2.5:14B (Q4)&lt;/td&gt;
&lt;td&gt;20-30&lt;/td&gt;
&lt;td&gt;~$350&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RX 7900 XTX&lt;/strong&gt; 24GB&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;RDNA 3&lt;/td&gt;
&lt;td&gt;Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;20-30&lt;/td&gt;
&lt;td&gt;~$800&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RX 7800 XT&lt;/strong&gt; 16GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;RDNA 3&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:14B (Q4)&lt;/td&gt;
&lt;td&gt;25-35&lt;/td&gt;
&lt;td&gt;~$450&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Arc A770&lt;/strong&gt; 16GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Alchemist&lt;/td&gt;
&lt;td&gt;Qwen 2.5:14B (Q4)&lt;/td&gt;
&lt;td&gt;15-25&lt;/td&gt;
&lt;td&gt;~$250&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;🟢 &lt;strong&gt;Best value picks:&lt;/strong&gt; Used RTX 3090 ($700 for 24 GB VRAM) or used RTX 3060 12GB ($200 for a starter).&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Model × VRAM Compatibility Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Q8_0 (Near-Lossless)&lt;/th&gt;
&lt;th&gt;Q6_K&lt;/th&gt;
&lt;th&gt;Q4_K_M 🟢&lt;/th&gt;
&lt;th&gt;Q3_K_M&lt;/th&gt;
&lt;th&gt;Q2_K&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:0.5B&lt;/td&gt;
&lt;td&gt;0.7 GB&lt;/td&gt;
&lt;td&gt;0.5 GB&lt;/td&gt;
&lt;td&gt;0.4 GB&lt;/td&gt;
&lt;td&gt;0.3 GB&lt;/td&gt;
&lt;td&gt;0.2 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;1.9 GB&lt;/td&gt;
&lt;td&gt;1.5 GB&lt;/td&gt;
&lt;td&gt;1.1 GB&lt;/td&gt;
&lt;td&gt;0.9 GB&lt;/td&gt;
&lt;td&gt;0.7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:7B&lt;/td&gt;
&lt;td&gt;8.1 GB&lt;/td&gt;
&lt;td&gt;6.3 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.8 GB&lt;/td&gt;
&lt;td&gt;2.9 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:14B&lt;/td&gt;
&lt;td&gt;15.5 GB&lt;/td&gt;
&lt;td&gt;12.1 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.0 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7.2 GB&lt;/td&gt;
&lt;td&gt;5.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:7B&lt;/td&gt;
&lt;td&gt;8.1 GB&lt;/td&gt;
&lt;td&gt;6.3 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.8 GB&lt;/td&gt;
&lt;td&gt;2.9 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:14B&lt;/td&gt;
&lt;td&gt;14.7 GB&lt;/td&gt;
&lt;td&gt;11.2 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.2 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.4 GB&lt;/td&gt;
&lt;td&gt;4.9 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:32B&lt;/td&gt;
&lt;td&gt;33.6 GB&lt;/td&gt;
&lt;td&gt;25.4 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;18.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;14.5 GB&lt;/td&gt;
&lt;td&gt;10.8 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:70B&lt;/td&gt;
&lt;td&gt;72.0 GB&lt;/td&gt;
&lt;td&gt;55.0 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;40.0 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31.0 GB&lt;/td&gt;
&lt;td&gt;23.0 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;30.0 GB&lt;/td&gt;
&lt;td&gt;23.0 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15.0 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11.5 GB&lt;/td&gt;
&lt;td&gt;8.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4:8B&lt;/td&gt;
&lt;td&gt;9.0 GB&lt;/td&gt;
&lt;td&gt;7.0 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.9 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.8 GB&lt;/td&gt;
&lt;td&gt;2.8 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS:20B&lt;/td&gt;
&lt;td&gt;22.0 GB&lt;/td&gt;
&lt;td&gt;17.0 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.5 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.5 GB&lt;/td&gt;
&lt;td&gt;6.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How to read this table:&lt;/strong&gt; Each cell is the approximate size of the model's weights at that quantization level. Find a model's row, then compare its Q4_K_M size against your VRAM. For example, with 12 GB VRAM, Qwen 2.5:14B (Q4_K_M = 9.0 GB) fits and leaves about 3 GB for context.&lt;/p&gt;
&lt;/blockquote&gt;
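&lt;p&gt;The lookup described above can also be done in a few lines of Python. This is a minimal sketch, not part of Ollama: the sizes are copied from the Q4_K_M column of the matrix, the 2 GB headroom default follows the rule of thumb given later in this guide, and the function name &lt;code&gt;models_that_fit&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
from operator import le  # le(a, b) means "a is less than or equal to b"

# Q4_K_M weight sizes in GB, copied from the matrix above.
Q4_K_M_GB = {
    "Qwen 2.5:0.5B": 0.4,
    "Qwen 2.5:1.5B": 1.1,
    "Qwen 2.5:7B": 4.7,
    "Qwen 2.5:14B": 9.0,
    "DeepSeek-R1:7B": 4.7,
    "DeepSeek-R1:14B": 8.2,
    "DeepSeek-R1:32B": 18.7,
    "DeepSeek-R1:70B": 40.0,
    "Qwen 3.6:27B": 15.0,
    "Llama 4:8B": 4.9,
    "GPT-OSS:20B": 11.5,
}


def models_that_fit(vram_gb, headroom_gb=2.0):
    """Return models whose Q4_K_M weights leave headroom_gb free, largest first."""
    budget = vram_gb - headroom_gb
    fitting = [name for name, size in Q4_K_M_GB.items() if le(size, budget)]
    return sorted(fitting, key=Q4_K_M_GB.get, reverse=True)


# Example: a 12 GB card leaves a 10 GB budget for weights.
print(models_that_fit(12))
```

&lt;p&gt;With 12 GB the list starts with Qwen 2.5:14B and leaves out GPT-OSS:20B, which needs 11.5 GB for weights alone.&lt;/p&gt;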




&lt;h2&gt;
  
  
  Budget Builds
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The "Get Started" Build — $0 (Use What You Have)
&lt;/h3&gt;

&lt;p&gt;If you already have a computer, you can probably run something:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Your Current PC&lt;/th&gt;
&lt;th&gt;Best Free Option&lt;/th&gt;
&lt;th&gt;Can It Be Useful?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Any laptop with 8 GB RAM&lt;/td&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;✅ Basic Q&amp;amp;A, simple tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any laptop with 16 GB RAM&lt;/td&gt;
&lt;td&gt;Qwen 2.5:7B (CPU mode)&lt;/td&gt;
&lt;td&gt;✅ ✅ Writing, brainstorming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Old gaming PC with GTX 1060 6GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:7B (GPU accel)&lt;/td&gt;
&lt;td&gt;✅ ✅ ✅ Coding, summarization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MacBook M1 with 8 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;✅ Basic assistance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost: $0.&lt;/strong&gt; You already own it. Just install Ollama.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Best Bang for Buck" Build — ~$700
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Used RTX 3090 (24 GB VRAM)  → $700
Rest of PC (keep what you have)
Total: ~$700
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What you can run with this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DeepSeek-R1:32B (Q4_K_M) — o1-level reasoning&lt;/li&gt;
&lt;li&gt;Qwen 3.6:27B — latest cutting-edge&lt;/li&gt;
&lt;li&gt;Any 7B-14B model at full quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Subscriptions this could replace:&lt;/strong&gt; ChatGPT Pro ($200/mo) + Claude Pro ($20/mo) = $220/mo, so the card &lt;strong&gt;breaks even in a little over 3 months&lt;/strong&gt; ($700 / $220 ≈ 3.2), before electricity.&lt;/p&gt;
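&lt;p&gt;To rerun the break-even arithmetic with your own numbers, here is a small sketch. It uses only the prices quoted in this guide and deliberately ignores electricity and resale value, which are assumptions you may want to revisit; the function name is illustrative.&lt;/p&gt;

```python
import math


def months_to_break_even(hardware_cost, monthly_subscriptions):
    """Return (exact_months, first_whole_month_fully_paid_off)."""
    monthly_total = sum(monthly_subscriptions)
    exact = hardware_cost / monthly_total
    return exact, math.ceil(exact)


# Figures from this guide: used RTX 3090 at $700, subscriptions of $200 and $20 per month.
exact, whole = months_to_break_even(700, [200, 20])
print(f"Break-even after {exact:.1f} months (paid off during month {whole})")
```

&lt;p&gt;If you only pay for a $20/mo plan today, pass &lt;code&gt;[20]&lt;/code&gt; instead and the payback period stretches to 35 months.&lt;/p&gt;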

&lt;h3&gt;
  
  
  The "Serious" Build — ~$2,300
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;New RTX 5090 (32 GB VRAM)    → $2,000
64 GB DDR5 RAM                → $200
1 TB NVMe SSD                 → $80
Rest is your existing PC
Total: ~$2,300
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



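&lt;p&gt;&lt;strong&gt;What you can run:&lt;/strong&gt; 70B-class models such as DeepSeek-R1:70B or Qwen 2.5:72B at Q2_K (about 23 GB for the 70B, per the matrix above) fully on the GPU, higher quantizations with partial offload to system RAM, or several smaller models at once&lt;/p&gt;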




&lt;h2&gt;
  
  
  CPU-Only Guide
&lt;/h2&gt;

&lt;p&gt;No GPU? No problem. You can still run local LLMs — just slower.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to Expect
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Expected Speed&lt;/th&gt;
&lt;th&gt;Readable?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Modern i5/i7 (2020+)&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:7B&lt;/td&gt;
&lt;td&gt;2-6 tok/s&lt;/td&gt;
&lt;td&gt;✅ Yes, like reading speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modern i5/i7 (2020+)&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;5-10 tok/s&lt;/td&gt;
&lt;td&gt;✅ Comfortable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Older i5 (2017+)&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;3-6 tok/s&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Laptop (any)&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:0.5B&lt;/td&gt;
&lt;td&gt;8-15 tok/s&lt;/td&gt;
&lt;td&gt;✅ Fast enough&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;2-6 tok/s&lt;/strong&gt; means a 100-word paragraph (roughly 130 tokens) takes about 20-70 seconds to generate. It's slow by GPU standards but perfectly usable for getting answers.&lt;/p&gt;
&lt;/blockquote&gt;
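&lt;p&gt;To turn a tok/s figure into a waiting time, the estimate can be sketched like this. It assumes the common rule of thumb of about 0.75 English words per token; the real ratio depends on the model's tokenizer and on your text, so treat the results as rough.&lt;/p&gt;

```python
TOKENS_PER_WORD = 4 / 3  # assumption: about 0.75 English words per token


def seconds_to_generate(words, tokens_per_second):
    """Estimate the seconds needed to generate `words` words of English text."""
    tokens = words * TOKENS_PER_WORD
    return tokens / tokens_per_second


# Compare the CPU range from the table above with a typical GPU speed.
for speed in (2, 6, 30):
    secs = seconds_to_generate(100, speed)
    print(f"{speed} tok/s: about {secs:.0f} s for 100 words")
```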

&lt;h3&gt;
  
  
  Tips for CPU Users
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use smaller models:&lt;/strong&gt; Qwen 2.5:1.5B is surprisingly capable and runs well on any CPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Close other apps:&lt;/strong&gt; Free up RAM for the model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try a lower quantization (Q3_K_M or Q2_K):&lt;/strong&gt; Smaller and faster, at some cost in answer quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try llama.cpp directly:&lt;/strong&gt; Sometimes faster than Ollama on CPU&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  RAM &amp;amp; VRAM Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How much RAM do you need?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;th&gt;Minimum RAM&lt;/th&gt;
&lt;th&gt;Recommended RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU-only with small models (1.5B)&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU-only with medium models (7B)&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU offloading + OS + browser&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running multiple models&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;64 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production server&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;64 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  How VRAM is actually used
&lt;/h3&gt;

&lt;p&gt;When you run a model, VRAM is consumed by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Model weights&lt;/strong&gt; (the biggest chunk — see the matrix above)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KV Cache&lt;/strong&gt; (roughly 1 GB per 8K tokens of context; the exact figure varies by model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute buffers&lt;/strong&gt; (~0.5 GB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other apps&lt;/strong&gt; (your OS, browser, etc.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; Pick a model whose Q4_K_M size is at least 2 GB less than your total VRAM. The extra headroom handles the KV cache.&lt;/p&gt;
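&lt;p&gt;The breakdown above adds up as follows. This is a back-of-the-envelope sketch that uses only this guide's rough figures (1 GB of KV cache per 8K tokens, 0.5 GB of compute buffers) and takes 8K to mean 8192 tokens; the function names are illustrative, and real usage varies by model and runtime.&lt;/p&gt;

```python
KV_CACHE_GB_PER_8K = 1.0  # this guide's rough figure; varies by model
COMPUTE_BUFFER_GB = 0.5   # this guide's rough figure


def estimate_vram_gb(weights_gb, context_tokens):
    """Add weights, KV cache, and compute buffers into one VRAM estimate."""
    kv_cache_gb = KV_CACHE_GB_PER_8K * context_tokens / 8192
    return weights_gb + kv_cache_gb + COMPUTE_BUFFER_GB


def spare_vram_gb(weights_gb, context_tokens, total_vram_gb):
    """Positive: VRAM left over. Negative: the model will not fit entirely in VRAM."""
    return total_vram_gb - estimate_vram_gb(weights_gb, context_tokens)


# Qwen 2.5:14B at Q4_K_M (9.0 GB in the matrix above) on a 12 GB card with 8K context.
need = estimate_vram_gb(9.0, 8192)
spare = spare_vram_gb(9.0, 8192, 12)
print(f"needs about {need:.1f} GB, leaving {spare:.1f} GB spare")
```

&lt;p&gt;Doubling the context to 16K adds another 1 GB under these assumptions, which is why long conversations can push a model that "fits" over the edge.&lt;/p&gt;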




&lt;h2&gt;
  
  
  Mac Users: Special Considerations
&lt;/h2&gt;

&lt;p&gt;Apple Silicon Macs are surprisingly good for local LLMs because of &lt;strong&gt;unified memory&lt;/strong&gt;: the GPU can address most of the system RAM (macOS keeps a share back for the system).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mac Model&lt;/th&gt;
&lt;th&gt;Unified Memory&lt;/th&gt;
&lt;th&gt;Best Model&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;M1 MacBook Air&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:1.5B&lt;/td&gt;
&lt;td&gt;Surprising quality from small model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M1/M2 MacBook Pro&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;Qwen 2.5:7B&lt;/td&gt;
&lt;td&gt;Sweet spot for Mac users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M3 Pro/Max&lt;/td&gt;
&lt;td&gt;36 GB&lt;/td&gt;
&lt;td&gt;Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;Top-tier performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M4 Max&lt;/td&gt;
&lt;td&gt;48-128 GB&lt;/td&gt;
&lt;td&gt;DeepSeek-R1:70B&lt;/td&gt;
&lt;td&gt;Ultimate local AI machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mac Studio M3 Ultra&lt;/td&gt;
&lt;td&gt;96-512 GB&lt;/td&gt;
&lt;td&gt;Run anything&lt;/td&gt;
&lt;td&gt;Absolute beast&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; On Macs, MLX (Apple's ML framework) can sometimes run models faster than Ollama's default backend. Try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama with MLX support&lt;/span&gt;
ollama pull qwen2.5:7b
&lt;span class="c"&gt;# For MLX-native, try mlx-lm instead&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;mlx-lm
mlx_lm.generate &lt;span class="nt"&gt;--model&lt;/span&gt; qwen2.5:7b &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Hello"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  AMD &amp;amp; Intel GPU Users
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AMD (ROCm support)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama with ROCm&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | &lt;span class="nv"&gt;OLLAMA_ROCM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 sh

&lt;span class="c"&gt;# Verify GPU detection&lt;/span&gt;
ollama run qwen2.5:7b
&lt;span class="c"&gt;# Should show "GPU = 1" in startup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Known quirks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RX 6000 series works well&lt;/li&gt;
&lt;li&gt;RX 7000 series needs ROCm 6.0+&lt;/li&gt;
&lt;li&gt;Integrated AMD GPUs (like in laptops) are not supported&lt;/li&gt;
&lt;li&gt;Performance is about 80-90% of equivalent NVIDIA&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Intel Arc
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Intel Arc support was added in Ollama 0.22+&lt;/span&gt;
&lt;span class="c"&gt;# Check your version first&lt;/span&gt;
ollama &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# If 0.22+, just pull and run&lt;/span&gt;
ollama run qwen2.5:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Known quirks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arc A770 16GB is a surprisingly good budget option (~$250 used)&lt;/li&gt;
&lt;li&gt;Arc A580/A750 have limited support&lt;/li&gt;
&lt;li&gt;Expect 60-70% of NVIDIA performance&lt;/li&gt;
&lt;li&gt;Some models may fail on first load (retry usually works)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The "I Just Want to Buy Something" Recommendation
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Budget&lt;/th&gt;
&lt;th&gt;Buy This&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;$200&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Used RTX 3060 12GB&lt;/td&gt;
&lt;td&gt;Best cheap entry point&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;$700&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Used RTX 3090 24GB&lt;/td&gt;
&lt;td&gt;Best value for serious local AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;$2,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;New RTX 5090 32GB&lt;/td&gt;
&lt;td&gt;Best new card for AI (2026)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;$4,000+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mac Studio M3 Ultra&lt;/td&gt;
&lt;td&gt;If you also do video/audio work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Quick Reference Card
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your GPU (Linux)&lt;/span&gt;
nvidia-smi
&lt;span class="c"&gt;# Look for: "Memory-Usage: 4096MiB / 12288MiB" — the second number is your VRAM&lt;/span&gt;

&lt;span class="c"&gt;# Check your GPU (macOS)&lt;/span&gt;
system_profiler SPDisplaysDataType | &lt;span class="nb"&gt;grep &lt;/span&gt;VRAM

&lt;span class="c"&gt;# Check your RAM (Linux)&lt;/span&gt;
free &lt;span class="nt"&gt;-h&lt;/span&gt;

&lt;span class="c"&gt;# Check your RAM (macOS)&lt;/span&gt;
system_profiler SPHardwareDataType | &lt;span class="nb"&gt;grep &lt;/span&gt;Memory

&lt;span class="c"&gt;# See if Ollama detected your GPU&lt;/span&gt;
ollama run qwen2.5:7b &lt;span class="nt"&gt;--verbose&lt;/span&gt; 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; gpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Don't overthink hardware. Download Ollama, try &lt;code&gt;qwen2.5:1.5b&lt;/code&gt; or &lt;code&gt;qwen2.5:7b&lt;/code&gt;, and see how it feels. You can always upgrade later.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Local LLM Guide&lt;/a&gt; — the definitive resource for running AI on your own hardware.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>hardware</category>
      <category>llm</category>
      <category>opensource</category>
      <category>guide</category>
    </item>
    <item>
      <title>Open WebUI: Your Local ChatGPT</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sat, 23 May 2026 18:55:33 +0000</pubDate>
      <link>https://forem.com/lingdas1/open-webui-your-local-chatgpt-29d8</link>
      <guid>https://forem.com/lingdas1/open-webui-your-local-chatgpt-29d8</guid>
      <description>&lt;h1&gt;
  
  
  Open WebUI: Your Local ChatGPT
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Transform your local LLM into a beautiful, full-featured web interface — like ChatGPT, but running entirely on your machine.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Is Open WebUI?
&lt;/h2&gt;

&lt;p&gt;Open WebUI is a self-hosted web interface for Ollama. It gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🖥️ A ChatGPT-like chat interface in your browser&lt;/li&gt;
&lt;li&gt;🔄 Switch between models mid-conversation&lt;/li&gt;
&lt;li&gt;📁 Upload documents and chat with them (RAG)&lt;/li&gt;
&lt;li&gt;🖼️ Image generation (via Automatic1111 / ComfyUI)&lt;/li&gt;
&lt;li&gt;🎤 Voice input / text-to-speech&lt;/li&gt;
&lt;li&gt;👥 Multi-user support (share with family or team)&lt;/li&gt;
&lt;li&gt;📱 Mobile-friendly (works on phone browsers)&lt;/li&gt;
&lt;li&gt;🔌 Plugins for images, web search, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best of all:&lt;/strong&gt; It connects to your local Ollama instance — no data ever leaves your machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Ollama installed and working (see &lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/01-getting-started/" rel="noopener noreferrer"&gt;Getting Started&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;✅ At least one model pulled (e.g., &lt;code&gt;qwen2.5:7b&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;✅ Docker installed (recommended) &lt;strong&gt;OR&lt;/strong&gt; Python 3.11+&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Option A: Install with Docker (Recommended — 2 Minutes)
&lt;/h2&gt;

&lt;p&gt;Docker is the easiest way. One command and you're done:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://host.docker.internal:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-p 3000:8080&lt;/code&gt; — makes it available at &lt;code&gt;http://localhost:3000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-v open-webui:/app/backend/data&lt;/code&gt; — keeps your chats saved even if you restart&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-e OLLAMA_BASE_URL&lt;/code&gt; — tells it where your Ollama is running&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--restart always&lt;/code&gt; — auto-starts when your computer boots&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Verify It's Running
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check logs — you should see "Application startup complete"&lt;/span&gt;
docker logs open-webui &lt;span class="nt"&gt;--tail&lt;/span&gt; 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;strong&gt;&lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;&lt;/strong&gt; in your browser.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;First time?&lt;/strong&gt; Create an account. Don't worry — it's local only. Your data stays on your machine.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Option B: Install with pip (No Docker)
&lt;/h2&gt;

&lt;p&gt;If you don't have Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;open-webui

&lt;span class="c"&gt;# Run&lt;/span&gt;
open-webui serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;strong&gt;&lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You'll See
&lt;/h2&gt;

&lt;p&gt;After logging in, Open WebUI looks and feels like ChatGPT:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2jcvpvjxwythh27yew8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2jcvpvjxwythh27yew8.png" alt="Open WebUI Interface" width="799" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key areas:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Chat panel&lt;/strong&gt; (left)&lt;/td&gt;
&lt;td&gt;Your conversation history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Model selector&lt;/strong&gt; (top)&lt;/td&gt;
&lt;td&gt;Switch between all your downloaded models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Chat input&lt;/strong&gt; (bottom)&lt;/td&gt;
&lt;td&gt;Type your message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Paperclip icon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Upload documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Settings gear&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Configure model parameters, RAG, voice&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Cool Things to Try
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Switch Models Mid-Chat
&lt;/h3&gt;

&lt;p&gt;In the top dropdown, you can switch models during a conversation. Each model sees the same chat history.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with &lt;code&gt;qwen2.5:7b&lt;/code&gt; for general chat&lt;/li&gt;
&lt;li&gt;Switch to &lt;code&gt;deepseek-r1:14b&lt;/code&gt; when you need hard reasoning&lt;/li&gt;
&lt;li&gt;Switch to &lt;code&gt;codellama&lt;/code&gt; for code tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Upload Documents (Built-in RAG)
&lt;/h3&gt;

&lt;p&gt;Click the paperclip icon and upload a PDF, Word doc, or text file. The model can then answer questions about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload a research paper and ask questions&lt;/li&gt;
&lt;li&gt;Upload your company's handbook&lt;/li&gt;
&lt;li&gt;Upload a textbook chapter for study help&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Use Voice Input
&lt;/h3&gt;

&lt;p&gt;Click the microphone icon to speak instead of type. This works in Chrome and Edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Customize the Model's Behavior
&lt;/h3&gt;

&lt;p&gt;In Settings → Model, you can adjust:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Temperature:&lt;/strong&gt; 0.2 (precise) to 1.0 (creative)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context length:&lt;/strong&gt; How much the model remembers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System prompt:&lt;/strong&gt; The model's persona&lt;/li&gt;
&lt;/ul&gt;
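&lt;p&gt;The same three settings also exist in Ollama's own HTTP API, which can help when you want to script a conversation instead of clicking through the UI. The sketch below only builds the JSON body for a POST to &lt;code&gt;/api/chat&lt;/code&gt;; the helper name, the example prompts, and the default values are assumptions for illustration, and how Open WebUI passes these settings internally is not shown here.&lt;/p&gt;

```python
import json


def build_chat_request(model, user_message, system_prompt,
                       temperature=0.2, context_length=8192):
    """Build the JSON body for a POST to Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},  # the model's persona
            {"role": "user", "content": user_message},
        ],
        "options": {
            "temperature": temperature,  # lower is more precise, higher more creative
            "num_ctx": context_length,   # context window size in tokens
        },
        "stream": False,  # ask for one complete reply instead of a token stream
    }


body = build_chat_request(
    "qwen2.5:7b",
    "Summarize RAG in one sentence.",
    "You are a concise tutor.",
)
print(json.dumps(body, indent=2))
```

&lt;p&gt;Sending this body to &lt;code&gt;http://localhost:11434/api/chat&lt;/code&gt; with any HTTP client returns the model's reply as JSON.&lt;/p&gt;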




&lt;h2&gt;
  
  
  Advanced: Connecting to Other Services
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Image Generation
&lt;/h3&gt;

&lt;p&gt;Open WebUI can integrate with local image generators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add Automatic1111 (Stable Diffusion)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 7860:7860 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; sd-models:/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
  asd/stable-diffusion-webui:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



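&lt;p&gt;Then configure in Open WebUI Settings → Image Generation. Automatic1111 must be started with its &lt;code&gt;--api&lt;/code&gt; flag for Open WebUI to connect to it.&lt;/p&gt;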

&lt;h3&gt;
  
  
  Web Search (Experimental)
&lt;/h3&gt;

&lt;p&gt;Enable web search in Settings → Web Search. Open WebUI will search the internet when answering questions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  With HTTPS
&lt;/h3&gt;

&lt;p&gt;For secure remote access (behind a VPN or tunnel):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using Caddy as a reverse proxy&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 443:443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://ollama:11434 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;WEBUI_SECRET_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-secret-here &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-User Setup
&lt;/h3&gt;

&lt;p&gt;Open WebUI supports multiple users out of the box. Each user:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gets their own chat history&lt;/li&gt;
&lt;li&gt;Can't see other users' chats&lt;/li&gt;
&lt;li&gt;Can choose from any model you've pulled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To add users: Go to Settings → Admin Panel → Users → Create User.&lt;/p&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Connection refused"&lt;/td&gt;
&lt;td&gt;Ollama not running&lt;/td&gt;
&lt;td&gt;Start Ollama first: &lt;code&gt;ollama serve&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blank page at localhost:3000&lt;/td&gt;
&lt;td&gt;Container not started&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docker start open-webui&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"No models available"&lt;/td&gt;
&lt;td&gt;No models pulled&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ollama pull qwen2.5:7b&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow document Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;Embedding model not loaded&lt;/td&gt;
&lt;td&gt;First doc upload takes extra time to load embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Port 3000 already in use&lt;/td&gt;
&lt;td&gt;Another service using it&lt;/td&gt;
&lt;td&gt;Change port: &lt;code&gt;-p 8080:8080&lt;/code&gt; and use &lt;code&gt;http://localhost:8080&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container won't start&lt;/td&gt;
&lt;td&gt;Docker not running&lt;/td&gt;
&lt;td&gt;Start Docker Desktop or Docker daemon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
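
&lt;p&gt;Several rows in the table come down to one question: is anything listening on the expected port? The sketch below is one way to check that from Python. It assumes the ports used in this guide (Ollama on 11434, Open WebUI on 3000) and that both run on the same machine; the function names are illustrative and not part of any Ollama or Open WebUI tooling.&lt;/p&gt;

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


# Ports as used in this guide; change them if you mapped different ones.
SERVICES = {"Ollama": 11434, "Open WebUI": 3000}


def check_services(host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

&lt;p&gt;A port that is not reachable points at the first rows of the table (service or container not started). A reachable port with a broken page suggests the problem is elsewhere, such as no models pulled.&lt;/p&gt;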




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Official docs:&lt;/strong&gt; &lt;a href="https://docs.openwebui.com" rel="noopener noreferrer"&gt;docs.openwebui.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/open-webui/open-webui" rel="noopener noreferrer"&gt;github.com/open-webui/open-webui&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container image (GitHub Container Registry):&lt;/strong&gt; &lt;a href="https://ghcr.io/open-webui/open-webui" rel="noopener noreferrer"&gt;ghcr.io/open-webui/open-webui&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Next step:&lt;/strong&gt; Now that you have a GUI, try setting up &lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/anythingllm-rag.md" rel="noopener noreferrer"&gt;Local RAG&lt;/a&gt; — let your LLM answer questions about your own documents.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Local LLM Guide&lt;/a&gt; — the definitive resource for running AI on your own hardware.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>openwebui</category>
      <category>llm</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Local RAG: Chat With Your Documents (Open Source, Private)</title>
      <dc:creator>Lingdas1</dc:creator>
      <pubDate>Sat, 23 May 2026 18:49:41 +0000</pubDate>
      <link>https://forem.com/lingdas1/local-rag-chat-with-your-documents-open-source-private-390o</link>
      <guid>https://forem.com/lingdas1/local-rag-chat-with-your-documents-open-source-private-390o</guid>
      <description>&lt;h1&gt;
  
  
  Local RAG: Chat With Your Documents
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Upload PDFs, code, research papers, or entire books — then ask your local LLM questions about them. No data ever leaves your machine.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Is RAG? (Plain English)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation) means your LLM can look up information from your own documents before answering.&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normal LLM:&lt;/strong&gt; Has a great memory, but only knows what it learned during training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG:&lt;/strong&gt; The LLM gets a "cheat sheet" — your documents — that it can read before answering&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Analogy:&lt;/strong&gt; Without RAG, the LLM is like a student taking a closed-book exam. With RAG, they get an open-book exam — and you get to write the book.&lt;/p&gt;
&lt;/blockquote&gt;
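
&lt;p&gt;The open-book idea can be sketched in a few lines of Python. This is a toy, not how any of the tools below work internally: real RAG systems rank passages by embedding similarity, while this sketch simply counts shared words. The notes and function names are hypothetical.&lt;/p&gt;

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, passage):
    """Count how many question words also appear in the passage."""
    q = Counter(tokenize(question))
    p = Counter(tokenize(passage))
    return sum(min(q[word], p[word]) for word in q)


def retrieve(question, passages, k=1):
    """Return the k passages that share the most words with the question."""
    ranked = sorted(passages, key=lambda text: overlap_score(question, text), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Place the retrieved passages in front of the question: the 'open book'."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Use only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical notes standing in for your documents.
notes = [
    "The gateway restarts every night at 02:00.",
    "Invoices are due within 30 days of delivery.",
    "The office cat is called Miso.",
]

print(build_prompt("When are invoices due?", notes))
```

&lt;p&gt;The prompt that reaches the LLM contains only the passage about invoices, so the model answers from your text instead of from memory.&lt;/p&gt;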

&lt;h3&gt;
  
  
  Real-World Uses
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;What You Upload&lt;/th&gt;
&lt;th&gt;What You Can Ask&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Research&lt;/td&gt;
&lt;td&gt;PDF papers, articles&lt;/td&gt;
&lt;td&gt;"What were the key findings in this study?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Studying&lt;/td&gt;
&lt;td&gt;Textbooks, lecture notes&lt;/td&gt;
&lt;td&gt;"Explain chapter 7 in simpler terms"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Work&lt;/td&gt;
&lt;td&gt;Company docs, reports&lt;/td&gt;
&lt;td&gt;"What's our Q3 strategy?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal&lt;/td&gt;
&lt;td&gt;Contracts, agreements&lt;/td&gt;
&lt;td&gt;"What are the termination clauses?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coding&lt;/td&gt;
&lt;td&gt;Codebase, documentation&lt;/td&gt;
&lt;td&gt;"How does the auth module work?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Personal&lt;/td&gt;
&lt;td&gt;Journals, notes, books&lt;/td&gt;
&lt;td&gt;"What did I write about in March?"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Option A: Built-in RAG in Open WebUI (Simplest)
&lt;/h2&gt;

&lt;p&gt;If you already have &lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md" rel="noopener noreferrer"&gt;Open WebUI installed&lt;/a&gt;, RAG is built-in.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Use It
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;strong&gt;&lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;&lt;/strong&gt; in your browser&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;paperclip icon&lt;/strong&gt; next to the chat input&lt;/li&gt;
&lt;li&gt;Upload a PDF, .txt, .docx, or .md file&lt;/li&gt;
&lt;li&gt;Wait for the "embedding" process to finish (usually 10-30 seconds)&lt;/li&gt;
&lt;li&gt;Ask questions about the document&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;That's it.&lt;/strong&gt; No configuration needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pro Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple documents:&lt;/strong&gt; You can upload several files at once. Open WebUI indexes them all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model choice:&lt;/strong&gt; Use &lt;code&gt;qwen3.6:27b&lt;/code&gt; or &lt;code&gt;deepseek-r1:14b&lt;/code&gt; for best RAG quality — they have larger context windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document size:&lt;/strong&gt; Open WebUI handles documents up to hundreds of pages. For very large documents, consider splitting them into smaller files (for example, one per chapter) before uploading.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Option B: AnythingLLM (More Powerful)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://anythingllm.com" rel="noopener noreferrer"&gt;AnythingLLM&lt;/a&gt; is a dedicated RAG application with more features than Open WebUI's built-in system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;With Docker (Recommended):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 3001:3001 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; anythingllm:/app/server/storage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;STORAGE_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/app/server/storage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; anythingllm &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
  mintplexlabs/anythingllm:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;strong&gt;&lt;a href="http://localhost:3001" rel="noopener noreferrer"&gt;http://localhost:3001&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Docker:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Download from &lt;a href="https://anythingllm.com" rel="noopener noreferrer"&gt;anythingllm.com&lt;/a&gt; and run the installer for your OS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open AnythingLLM&lt;/strong&gt; at &lt;code&gt;http://localhost:3001&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create an admin account&lt;/strong&gt; (local only — no data leaves your machine)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Go to Settings → LLM Provider&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select Ollama&lt;/strong&gt; from the dropdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose your model&lt;/strong&gt; (e.g., &lt;code&gt;qwen2.5:7b&lt;/code&gt; or &lt;code&gt;deepseek-r1:14b&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Click Save&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now set up embeddings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Go to Settings → Embedding Provider&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Select Ollama&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose an embedding model&lt;/strong&gt; that you have already pulled with Ollama (for example, &lt;code&gt;ollama pull nomic-embed-text&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Click Save&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Uploading Documents
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;"New Workspace"&lt;/strong&gt; and give it a name (e.g., "Research Papers")&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;upload icon&lt;/strong&gt; (or drag and drop files)&lt;/li&gt;
&lt;li&gt;Supported formats: PDF, DOCX, TXT, MD, CSV, JSON, code files&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;"Save and Embed"&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Wait for indexing (progress shows in the UI)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Chatting With Your Documents
&lt;/h3&gt;

&lt;p&gt;Once embedded, just type your question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What are the three main conclusions from these papers?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AnythingLLM searches your documents for relevant passages and feeds them to the LLM along with your question. The result is &lt;strong&gt;answers grounded in your documents, with sources&lt;/strong&gt;, rather than guesses from training data alone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔥 &lt;strong&gt;Pro tip:&lt;/strong&gt; AnythingLLM shows you which document each answer came from. Hover over the citation to see the exact source passage.&lt;/p&gt;
&lt;/blockquote&gt;
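
&lt;p&gt;To make the citation idea concrete, here is a minimal sketch of how retrieved passages can be numbered in the prompt and how markers like &lt;code&gt;[1]&lt;/code&gt; in an answer can be mapped back to file names. It is an illustration under assumptions, not AnythingLLM's actual implementation; the file names, passages, and sample answer are hypothetical.&lt;/p&gt;

```python
import re


def format_context(chunks):
    """Number each (source, text) chunk so the model can cite it as [1], [2], ..."""
    lines = [
        f"[{number}] ({source}) {text}"
        for number, (source, text) in enumerate(chunks, start=1)
    ]
    return "\n".join(lines)


def cited_sources(answer, chunks):
    """Map citation markers such as [2] in an answer back to source names."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    valid = range(1, len(chunks) + 1)  # ignore markers that match no chunk
    return [chunks[n - 1][0] for n in numbers if n in valid]


# Hypothetical retrieved chunks: (source file, passage text).
chunks = [
    ("paper_a.pdf", "Method A improved recall by 12 percent."),
    ("paper_b.pdf", "Method B reduced latency."),
]

# Hypothetical model answer that cites the first chunk.
answer = "Recall improved by 12 percent [1]."

print(format_context(chunks))
print(cited_sources(answer, chunks))
```

&lt;p&gt;Keeping the source next to each passage is what makes it possible to show where an answer came from.&lt;/p&gt;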




&lt;h2&gt;
  
  
  Option C: Manual RAG with LangChain (For Developers)
&lt;/h2&gt;

&lt;p&gt;For maximum control, build RAG with Python and LangChain. This is particularly useful if you want to automate document processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain-community langchain-ollama chromadb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Basic RAG Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OllamaEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DirectoryLoader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TextLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Load your documents
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DirectoryLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./my-docs/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loader_cls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TextLoader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loaded &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Split into chunks
&lt;/span&gt;&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Split into &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Create embeddings and vector store (uses a dedicated embedding model; run `ollama pull nomic-embed-text` first)
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;persist_directory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Create RAG chain
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5:7b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;qa_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_chain_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chain_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stuff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Ask questions
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Ask a question (or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;): &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qa_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Answer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Put your documents in a folder called "my-docs/"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; my-docs
&lt;span class="c"&gt;# Copy your .txt files there (the script only loads *.txt; PDFs need a PDF loader)&lt;/span&gt;

&lt;span class="c"&gt;# Run the script&lt;/span&gt;
python rag.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Choosing the Right RAG Setup
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Open WebUI RAG&lt;/th&gt;
&lt;th&gt;AnythingLLM&lt;/th&gt;
&lt;th&gt;LangChain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 click&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;td&gt;30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Advanced&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PDF, DOCX, TXT, MD&lt;/td&gt;
&lt;td&gt;PDF, DOCX, TXT, MD, CSV, JSON, code&lt;/td&gt;
&lt;td&gt;Anything with a loader&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-document&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Citations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (manual)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quick personal use&lt;/td&gt;
&lt;td&gt;Serious knowledge work&lt;/td&gt;
&lt;td&gt;Automation &amp;amp; production&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;My recommendation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start&lt;/strong&gt; with Open WebUI's built-in RAG (fastest)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move to&lt;/strong&gt; AnythingLLM when you need citations and multiple workspaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use&lt;/strong&gt; LangChain when you need to automate document processing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Best Practices for Better RAG Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use the Right Model
&lt;/h3&gt;

&lt;p&gt;RAG works best with models that have &lt;strong&gt;large context windows&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Why It's Good for RAG&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.6:27B&lt;/td&gt;
&lt;td&gt;262K&lt;/td&gt;
&lt;td&gt;Can process entire chapters at once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 2.5:14B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Excellent balance of quality and speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:14B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Best for reasoning about documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1:32B&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;Best overall RAG quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2. Write Good Questions
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;❌ Bad Question&lt;/th&gt;
&lt;th&gt;✅ Good Question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Tell me about it"&lt;/td&gt;
&lt;td&gt;"Summarize the methodology used in section 3"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"What's in this?"&lt;/td&gt;
&lt;td&gt;"What are the three main arguments presented in chapter 2?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Is this useful?"&lt;/td&gt;
&lt;td&gt;"What evidence does the author provide for their claim on page 15?"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Optimize Chunk Size
&lt;/h3&gt;

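&lt;p&gt;The chunk size sets how long each retrieved passage is. The LLM sees the top few chunks (3 in the LangChain script above) together with your question, so larger chunks give more surrounding context per match:&lt;/p&gt;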

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chunk Size&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;500 chars&lt;/td&gt;
&lt;td&gt;Short lookup questions ("What is X?")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000 chars&lt;/td&gt;
&lt;td&gt;General Q&amp;amp;A 🟢 Default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2000 chars&lt;/td&gt;
&lt;td&gt;Summarization tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4000+ chars&lt;/td&gt;
&lt;td&gt;Long-context analysis (Qwen 3.6 recommended)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
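
&lt;p&gt;To get a feel for these numbers, the sketch below splits text into fixed-size character chunks with overlap and counts the result. It is a simplified stand-in: LangChain's &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; prefers to break on paragraph and sentence boundaries, so its chunk counts will differ. The function name and the 5000-character sample text are hypothetical.&lt;/p&gt;

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap not in range(0, chunk_size):
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous chunk's overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Hypothetical 5000-character document; overlap is 20 percent of the chunk size.
sample = "x" * 5000
for size in (500, 1000, 2000):
    count = len(split_text(sample, chunk_size=size, chunk_overlap=size // 5))
    print(f"chunk_size={size}: {count} chunks")
```

&lt;p&gt;Smaller chunks produce more, narrower passages, which suits lookup questions. Larger chunks produce fewer, broader passages, which suits summarization but uses more of the model's context window per retrieved chunk.&lt;/p&gt;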




&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"I don't know" to document questions&lt;/td&gt;
&lt;td&gt;Documents not embedded, or retrieved chunks don't match the question&lt;/td&gt;
&lt;td&gt;Re-save and re-embed the documents in the workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrong answers despite having docs&lt;/td&gt;
&lt;td&gt;Chunk size too small&lt;/td&gt;
&lt;td&gt;Increase chunk_size to 2000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very slow document processing&lt;/td&gt;
&lt;td&gt;Large files on CPU&lt;/td&gt;
&lt;td&gt;Be patient — first embed takes longest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Model not responding"&lt;/td&gt;
&lt;td&gt;Context overflow&lt;/td&gt;
&lt;td&gt;Use a model with larger context (Qwen 3.6)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can't upload PDFs&lt;/td&gt;
&lt;td&gt;PDF is scanned/image-based&lt;/td&gt;
&lt;td&gt;Use OCR first (tools like marker-pdf)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set up Open WebUI first&lt;/strong&gt; (it includes RAG out of the box) → &lt;a href="https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md" rel="noopener noreferrer"&gt;Open WebUI Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try it with Chinese models&lt;/strong&gt; → Qwen 3.6 is excellent for RAG due to its 262K context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Combine RAG with Function Calling&lt;/strong&gt; → &lt;a href="https://github.com/Lingdas1/local-llm-guide/tree/main/06-function-calling/" rel="noopener noreferrer"&gt;Chapter 06: Function Calling&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy in production&lt;/strong&gt; → &lt;a href="https://github.com/Lingdas1/local-llm-guide/tree/main/05-production/" rel="noopener noreferrer"&gt;Chapter 05: Production&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://github.com/Lingdas1/local-llm-guide" rel="noopener noreferrer"&gt;Local LLM Guide&lt;/a&gt; — the definitive resource for running AI on your own hardware.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>rag</category>
      <category>llm</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
