<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: David Shibley</title>
    <description>The latest articles on Forem by David Shibley (@david_shibley).</description>
    <link>https://forem.com/david_shibley</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3815441%2Fa7c7bc82-c03d-4b68-975a-2c838ae2c385.png</url>
      <title>Forem: David Shibley</title>
      <link>https://forem.com/david_shibley</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/david_shibley"/>
    <language>en</language>
    <item>
      <title>Openclaw is scary, but so are cars</title>
      <dc:creator>David Shibley</dc:creator>
      <pubDate>Mon, 16 Mar 2026 17:57:24 +0000</pubDate>
      <link>https://forem.com/david_shibley/openclaw-is-scary-but-so-are-cars-1gki</link>
      <guid>https://forem.com/david_shibley/openclaw-is-scary-but-so-are-cars-1gki</guid>
      <description>&lt;p&gt;Is the risk worth the reward? We drive our cars at 70 mph on the freeway because we need to get somewhere. Should the same logic be applied to Openclaw? Do we take the risk to achieve things beyond our human capacity? I'm curious your thoughts.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>watercooler</category>
      <category>ai</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>Ollama + Openclaw = Free AI Agent</title>
      <dc:creator>David Shibley</dc:creator>
      <pubDate>Sun, 15 Mar 2026 02:20:39 +0000</pubDate>
      <link>https://forem.com/david_shibley/ollama-openclaw-free-ai-agent-4pmk</link>
      <guid>https://forem.com/david_shibley/ollama-openclaw-free-ai-agent-4pmk</guid>
      <description>&lt;h2&gt;
  
  
  Using OpenClaw with Ollama
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A Practical Setup and Usage Guide&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;OpenClaw is an open-source agent framework designed to automate tasks by&lt;br&gt;
allowing large language models to interact with tools, APIs, and local&lt;br&gt;
environments. When paired with Ollama, you can run these agents fully&lt;br&gt;
locally using open-source models instead of relying on cloud APIs.&lt;/p&gt;

&lt;p&gt;This combination enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Local AI agents with tool access&lt;/li&gt;
&lt;li&gt;  Privacy-preserving automation&lt;/li&gt;
&lt;li&gt;  Offline experimentation with LLM workflows&lt;/li&gt;
&lt;li&gt;  Lower operational costs compared to hosted models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical use cases include coding agents, data automation, system&lt;br&gt;
assistants, and research tools.&lt;/p&gt;


&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The basic architecture when using OpenClaw with Ollama looks like this:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  │
  ▼
OpenClaw Agent
  │
  ▼
Ollama API (localhost:11434)
  │
  ▼
Local LLM Model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;OpenClaw sends prompts to Ollama's API endpoint, which runs a local&lt;br&gt;
model and returns responses.&lt;/p&gt;


&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before using OpenClaw with Ollama, ensure the following are installed.&lt;/p&gt;
&lt;h2&gt;
  
  
  Hardware
&lt;/h2&gt;

&lt;p&gt;Recommended minimum:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Component   Recommendation
  ----------- ----------------------
  RAM         16 GB (32 GB ideal)
  CPU         Modern multi-core
  GPU         Optional but helpful
  Storage     20--50 GB for models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Software
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Python
&lt;/h3&gt;

&lt;p&gt;Python 3.10+&lt;/p&gt;

&lt;p&gt;Verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  2. Ollama
&lt;/h3&gt;

&lt;p&gt;Install Ollama from:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;https://ollama.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other recommended models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  mistral&lt;/li&gt;
&lt;li&gt;  codellama&lt;/li&gt;
&lt;li&gt;  phi&lt;/li&gt;
&lt;li&gt;  deepseek-coder&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Git
&lt;/h3&gt;

&lt;p&gt;Required to clone the OpenClaw repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  4. Virtual Environment (Recommended)
&lt;/h3&gt;

&lt;p&gt;Create a Python environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Windows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;venv\Scripts\activate&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Installing OpenClaw
&lt;/h2&gt;

&lt;p&gt;Clone the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/&amp;lt;openclaw-repo&amp;gt;/openclaw.git
&lt;span class="nb"&gt;cd &lt;/span&gt;openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(or pip3)&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuring OpenClaw to Use Ollama
&lt;/h2&gt;

&lt;p&gt;Ollama runs at:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Example configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MODEL_PROVIDER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;OLLAMA_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example request payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain recursion simply.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Verifying the Setup
&lt;/h2&gt;

&lt;p&gt;Test Ollama first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen3:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain how neural networks work.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then test from OpenClaw by running an agent task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama launch openclaw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Example Agent Workflow
&lt;/h2&gt;

&lt;p&gt;A typical OpenClaw agent cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Receive task&lt;/li&gt;
&lt;li&gt; Send prompt to model&lt;/li&gt;
&lt;li&gt; Model chooses a tool or action&lt;/li&gt;
&lt;li&gt; Execute tool&lt;/li&gt;
&lt;li&gt; Feed results back to model&lt;/li&gt;
&lt;li&gt; Repeat until complete&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Task:
"Find the latest AI news and summarize it."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Example Use Cases
&lt;/h2&gt;

&lt;h2&gt;
  
  
  1. Local Coding Assistant
&lt;/h2&gt;

&lt;p&gt;Recommended models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  deepseek-coder&lt;/li&gt;
&lt;li&gt;  codellama&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a Python script that renames files based on date.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  2. Personal Automation Agent
&lt;/h2&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Organize files&lt;/li&gt;
&lt;li&gt;  Manage downloads&lt;/li&gt;
&lt;li&gt;  Process documents&lt;/li&gt;
&lt;li&gt;  Summarize PDFs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example workflow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input:
Summarize all PDFs in /research
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  3. Research Assistant
&lt;/h2&gt;

&lt;p&gt;The agent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  scrape web pages&lt;/li&gt;
&lt;li&gt;  summarize research&lt;/li&gt;
&lt;li&gt;  compare sources&lt;/li&gt;
&lt;li&gt;  generate reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Compare open-source LLMs released in the last year.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  4. Data Analysis
&lt;/h2&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Analyze this CSV and explain key trends.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Agent actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Load dataset&lt;/li&gt;
&lt;li&gt; Run Python analysis&lt;/li&gt;
&lt;li&gt; Generate summary&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  5. System Administration Assistant
&lt;/h2&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Analyze the last 1000 lines of system logs and find errors.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Example Python Integration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain how transformers work&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Performance Tips
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Choose the Right Model
&lt;/h2&gt;

&lt;p&gt;Task                  Recommended Model&lt;/p&gt;




&lt;p&gt;Coding                deepseek-coder&lt;br&gt;
  General reasoning     qwen3&lt;br&gt;
  Fast responses        mistral&lt;br&gt;
  Lightweight systems   phi&lt;/p&gt;


&lt;h2&gt;
  
  
  Use Quantized Models
&lt;/h2&gt;

&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qwen3:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  faster inference&lt;/li&gt;
&lt;li&gt;  lower RAM usage&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Enable Streaming
&lt;/h2&gt;

&lt;p&gt;Streaming responses reduce latency for long outputs.&lt;/p&gt;


&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;p&gt;Recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  restrict file system access&lt;/li&gt;
&lt;li&gt;  sandbox tool execution&lt;/li&gt;
&lt;li&gt;  review auto-execution features&lt;/li&gt;
&lt;li&gt;  avoid exposing the Ollama API externally&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;
&lt;h2&gt;
  
  
  Ollama Not Running
&lt;/h2&gt;

&lt;p&gt;Error:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connection refused localhost:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Model Not Found
&lt;/h2&gt;

&lt;p&gt;Error:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3:8b &lt;span class="o"&gt;(&lt;/span&gt;or whatever model you are using&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Slow Performance
&lt;/h2&gt;

&lt;p&gt;Possible causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  insufficient RAM&lt;/li&gt;
&lt;li&gt;  model too large&lt;/li&gt;
&lt;li&gt;  CPU-only inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  use smaller models&lt;/li&gt;
&lt;li&gt;  enable GPU acceleration&lt;/li&gt;
&lt;li&gt;  use quantized models&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Tool Creation
&lt;/h2&gt;

&lt;p&gt;OpenClaw allows custom tools such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  web search&lt;/li&gt;
&lt;li&gt;  database queries&lt;/li&gt;
&lt;li&gt;  file system access&lt;/li&gt;
&lt;li&gt;  shell commands&lt;/li&gt;
&lt;li&gt;  APIs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Multi-Agent Systems
&lt;/h2&gt;

&lt;p&gt;Example roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  researcher&lt;/li&gt;
&lt;li&gt;  coder&lt;/li&gt;
&lt;li&gt;  reviewer&lt;/li&gt;
&lt;li&gt;  executor&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Memory Systems
&lt;/h2&gt;

&lt;p&gt;Agents can maintain persistent memory such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  previous tasks&lt;/li&gt;
&lt;li&gt;  learned preferences&lt;/li&gt;
&lt;li&gt;  stored documents&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Combining OpenClaw with Ollama creates a powerful platform for running&lt;br&gt;
autonomous AI agents locally. With the right models and tools, it&lt;br&gt;
enables everything from coding assistants to research automation without&lt;br&gt;
relying on external APIs.&lt;br&gt;
Please feel free to leave questions in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>How I automated my grocery shopping</title>
      <dc:creator>David Shibley</dc:creator>
      <pubDate>Sat, 14 Mar 2026 23:30:29 +0000</pubDate>
      <link>https://forem.com/david_shibley/how-i-automated-my-grocery-shopping-2ik2</link>
      <guid>https://forem.com/david_shibley/how-i-automated-my-grocery-shopping-2ik2</guid>
      <description>&lt;h2&gt;
  
  
  Problem:
&lt;/h2&gt;

&lt;p&gt;I hate shopping for groceries&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution:
&lt;/h2&gt;

&lt;p&gt;Automate the process using a new Kroger Shopping Cart app&lt;/p&gt;




&lt;h2&gt;
  
  
  Kroger Shopping Cart — Technical Overview
&lt;/h2&gt;

&lt;p&gt;This document describes how the app was built, the main technical decisions, and other details that matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Link: &lt;a href="https://github.com/David-J-Shibley/kroger_cart" rel="noopener noreferrer"&gt;Github&lt;/a&gt;
&lt;/h3&gt;




&lt;h2&gt;
  
  
  What the app does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Meal plan + grocery list:&lt;/strong&gt; A local LLM (Ollama) generates a 7-day meal plan for a family of three and a single consolidated grocery list. The list is parsed from the LLM output and each line gets an “Add to cart” action.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kroger integration:&lt;/strong&gt; Users sign in with Kroger (OAuth 2.0), search products by name, and add items to their Kroger cart. When a search returns multiple products, a modal lets them pick one; they can sort by price and view full product metadata (JSON).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence:&lt;/strong&gt; Login is persisted across reloads using a refresh token; the access token is refreshed when expired so users don’t have to sign in again until the refresh token expires.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Node.js + Express&lt;/td&gt;
&lt;td&gt;TypeScript, run with &lt;code&gt;tsx&lt;/code&gt; (no separate compile step).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client&lt;/td&gt;
&lt;td&gt;Vanilla TS → JS&lt;/td&gt;
&lt;td&gt;Single bundle &lt;code&gt;dist/kroger-cart.js&lt;/code&gt;, no framework.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Styling&lt;/td&gt;
&lt;td&gt;Plain CSS&lt;/td&gt;
&lt;td&gt;One file &lt;code&gt;kroger-cart.css&lt;/code&gt;, CSS variables for theme.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Local inference; streaming &lt;code&gt;/api/chat&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APIs&lt;/td&gt;
&lt;td&gt;Kroger Products + Cart API&lt;/td&gt;
&lt;td&gt;Products (search), Cart (add), OAuth for user context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Docker + Docker Compose&lt;/td&gt;
&lt;td&gt;Optional: run app + Ollama in containers.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Repository layout
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;krogerCart/
├── server.ts              # Express server: static files, Ollama proxy, Kroger proxy + OAuth
├── kroger-cart.html       # Single-page UI
├── kroger-cart.css        # Styles (Kroger-inspired theme)
├── kroger-cart.ts         # Client logic (TypeScript)
├── tsconfig.client.json   # TS config for client bundle only
├── dist/
│   └── kroger-cart.js     # Built client (npm run build:client)
├── kroger-oauth-callback.html   # OAuth redirect target; exchanges code for tokens
├── package.json
├── Dockerfile             # Build app image
├── docker-compose.yml     # App + Ollama services
├── DOCKER.md              # Docker runbook
└── ARCHITECTURE.md        # This file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server serves the directory as static files and mounts two proxy “prefixes”: &lt;code&gt;/ollama-api&lt;/code&gt; and &lt;code&gt;/kroger-api&lt;/code&gt;. The client talks only to the same origin; the server forwards to Ollama and Kroger.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture and data flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High level
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Browser&lt;/strong&gt; loads &lt;code&gt;kroger-cart.html&lt;/code&gt;, which loads &lt;code&gt;kroger-cart.css&lt;/code&gt; and &lt;code&gt;dist/kroger-cart.js&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM path:&lt;/strong&gt; Client POSTs to &lt;code&gt;/ollama-api/api/chat&lt;/code&gt; (streaming). Server proxies to &lt;code&gt;OLLAMA_ORIGIN&lt;/code&gt; (e.g. &lt;code&gt;http://ollama:11434&lt;/code&gt; in Docker). Response is streamed back; client parses SSE-like newline-delimited JSON and renders the meal plan + parses out grocery lines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kroger path:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Product search:&lt;/strong&gt; Client uses an &lt;strong&gt;app access token&lt;/strong&gt; (client credentials) to call the server’s Kroger proxy (&lt;code&gt;/kroger-api/v1/products?...&lt;/code&gt;). Server forwards to Kroger with that token.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart add:&lt;/strong&gt; Client uses a &lt;strong&gt;user access token&lt;/strong&gt; (OAuth) and sends requests to the proxy (&lt;code&gt;/kroger-api/v1/cart/add&lt;/code&gt;). Server forwards with the user’s Bearer token.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth:&lt;/strong&gt; User is sent to Kroger, then back to &lt;code&gt;kroger-oauth-callback.html&lt;/code&gt;, which POSTs the code to &lt;code&gt;/kroger-api/oauth-exchange&lt;/code&gt;. Server exchanges code for tokens and stores them in the browser (localStorage). Refresh is done via &lt;code&gt;/kroger-api/oauth-refresh&lt;/code&gt; when the access token is expired.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why a server at all
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CORS:&lt;/strong&gt; Kroger and (in many setups) Ollama are on different origins; the browser can’t call them directly from the page. The server proxies so the browser only talks to the same origin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets:&lt;/strong&gt; Client credentials (client ID/secret) are in the client bundle today; for production you’d move token issuance (and possibly refresh) to the server and never ship the secret. The proxy also keeps a single place to add auth or rate limiting later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming:&lt;/strong&gt; The server streams the Ollama response so the client can show text as it’s generated instead of waiting for the full body.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technical decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. No front-end framework
&lt;/h3&gt;

&lt;p&gt;The UI is one HTML file, one CSS file, and one JS bundle. Buttons use &lt;code&gt;onclick&lt;/code&gt; handlers that call global functions attached to &lt;code&gt;window&lt;/code&gt;. This keeps the app small, build simple (&lt;code&gt;tsc&lt;/code&gt; for the client only), and avoids a heavy runtime. Tradeoff: no reactive bindings or component model; state is in module-level variables and DOM.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Client in TypeScript, server in TypeScript
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server:&lt;/strong&gt; Run with &lt;code&gt;tsx&lt;/code&gt; so we don’t compile to JS; &lt;code&gt;server.ts&lt;/code&gt; is executed directly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client:&lt;/strong&gt; Compiled with &lt;code&gt;tsc -p tsconfig.client.json&lt;/code&gt; to &lt;code&gt;dist/kroger-cart.js&lt;/code&gt; (ES2020, DOM lib). Types (e.g. &lt;code&gt;KrogerProduct&lt;/code&gt;, &lt;code&gt;KrogerCartResponse&lt;/code&gt;) live in the client TS and improve maintainability; the compiled JS is loaded by the HTML.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Two Kroger tokens
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;App token (client credentials):&lt;/strong&gt; Used for &lt;strong&gt;product search&lt;/strong&gt; only. Obtained (and cached) by the client via the server’s &lt;code&gt;/kroger-api/token&lt;/code&gt; or directly from Kroger’s token endpoint. No user context.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User token (OAuth authorization code):&lt;/strong&gt; Used for &lt;strong&gt;cart&lt;/strong&gt; only. Obtained after the user signs in; stored in localStorage with expiry. Cart add requests send this token through the proxy.
This matches Kroger’s model: product search is app-level; cart is user-level.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Token refresh for persistent login
&lt;/h3&gt;

&lt;p&gt;Kroger access tokens are short-lived. We store the &lt;strong&gt;refresh token&lt;/strong&gt; and, when the access token is expired, call &lt;code&gt;/kroger-api/oauth-refresh&lt;/code&gt; (server calls Kroger with &lt;code&gt;grant_type=refresh_token&lt;/code&gt;). The client then uses the new access token and updates localStorage. So login survives page reloads until the refresh token expires. The client exposes &lt;code&gt;getKrogerUserTokenOrRefresh()&lt;/code&gt; and uses it for any cart/API call that needs the user token.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Proxy for Ollama and Kroger
&lt;/h3&gt;

&lt;p&gt;All Ollama and Kroger requests go to the same origin and are forwarded by the server. The client only needs the server’s base URL (and, when applicable, &lt;code&gt;OLLAMA_ORIGIN&lt;/code&gt; is a server-side env var for where to proxy Ollama). This simplifies the client and keeps CORS and timeouts on the server.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Streaming Ollama response
&lt;/h3&gt;

&lt;p&gt;The server does not buffer the Ollama response. It reads &lt;code&gt;proxyRes.body&lt;/code&gt; with a &lt;code&gt;for await&lt;/code&gt; loop and writes chunks to the response. The client uses &lt;code&gt;response.body.getReader()&lt;/code&gt; and parses newline-delimited JSON for each chunk. So the user sees the meal plan and grocery list appear incrementally. Timeouts: server proxy and client request both use a long timeout (e.g. 10 minutes) so that slow model load or long generations don’t abort mid-stream.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Parsing grocery lines from LLM output
&lt;/h3&gt;

&lt;p&gt;The LLM returns free text (meal plan + “Grocery list:” + items). We don’t rely on strict JSON or markdown. The client:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Splits on newlines and looks for a “Grocery list:” / “Shopping list:” section.&lt;/li&gt;
&lt;li&gt;Filters out section headers (e.g. “Day 1”, “Meal Plan for …”) so they don’t become grocery lines.&lt;/li&gt;
&lt;li&gt;Strips markdown-style bullets and leading/trailing &lt;code&gt;*&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Uses a fallback: if no section is found, treats lines that “look like” items (e.g. contain “lb”, “oz”, numbers) as the list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the prompt asks for a clear “Grocery list:” block and sensible line format; the parser is tolerant of small variations.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Product name shortening for cart
&lt;/h3&gt;

&lt;p&gt;Kroger cart payloads accept a product “name”. We send a &lt;strong&gt;short&lt;/strong&gt; name (e.g. “Frozen broccoli”) instead of the full label (e.g. “Frozen broccoli, 2 lb”) by taking the substring before the first comma. This keeps the cart display cleaner and matches how we often search.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Product picker when multiple results
&lt;/h3&gt;

&lt;p&gt;Search can return many products. Instead of auto-picking the first, we show a &lt;strong&gt;modal&lt;/strong&gt; with all results, sortable by price (default / low-to-high / high-to-low). Each row has “Add to cart” and a “Metadata” button that shows the full Kroger product object as JSON. We store the raw API object (&lt;code&gt;raw&lt;/code&gt;) on each picker item so Metadata shows everything Kroger returned, not just our normalized &lt;code&gt;{ upc, productId, name, price }&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Cart API response handling
&lt;/h3&gt;

&lt;p&gt;Kroger’s cart add endpoint can return 200 with an &lt;strong&gt;empty body&lt;/strong&gt; or non-JSON. The client uses &lt;code&gt;response.text()&lt;/code&gt; then &lt;code&gt;text ? JSON.parse(text) : {}&lt;/code&gt; so we never call &lt;code&gt;response.json()&lt;/code&gt; on an empty body. On success with no body we still update the UI (e.g. show “Your cart is empty” or leave the last state); on error we surface the status or parsed error message.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Static assets and build
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HTML/CSS are static.
&lt;/li&gt;
&lt;li&gt;Client is the only built artifact: &lt;code&gt;kroger-cart.ts&lt;/code&gt; → &lt;code&gt;dist/kroger-cart.js&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;The server serves &lt;code&gt;__dirname&lt;/code&gt; (the project root), so &lt;code&gt;kroger-cart.html&lt;/code&gt;, &lt;code&gt;kroger-cart.css&lt;/code&gt;, &lt;code&gt;dist/kroger-cart.js&lt;/code&gt;, and &lt;code&gt;kroger-oauth-callback.html&lt;/code&gt; are all served as-is. No bundler, no hashed filenames; cache headers are Express defaults.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  12. Docker and deployment
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single Dockerfile:&lt;/strong&gt; Installs deps, copies source, runs &lt;code&gt;npm run build:client&lt;/code&gt;, then &lt;code&gt;npm start&lt;/code&gt; (tsx). Server listens on &lt;code&gt;0.0.0.0&lt;/code&gt; so it’s reachable from outside the container.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;docker-compose:&lt;/strong&gt; Defines two services, &lt;code&gt;app&lt;/code&gt; and &lt;code&gt;ollama&lt;/code&gt;, on a shared network. The app sets &lt;code&gt;OLLAMA_ORIGIN=http://ollama:11434&lt;/code&gt; so the proxy targets the Ollama container. Models are persisted in a volume for the Ollama service.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Env:&lt;/strong&gt; &lt;code&gt;PORT&lt;/code&gt;, &lt;code&gt;HOST&lt;/code&gt;, &lt;code&gt;OLLAMA_ORIGIN&lt;/code&gt;, &lt;code&gt;OLLAMA_PROXY_TIMEOUT_MS&lt;/code&gt; allow tuning without code changes. See &lt;code&gt;DOCKER.md&lt;/code&gt; for runbooks.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Security and credentials
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kroger:&lt;/strong&gt; Client ID and client secret are currently in the client bundle (&lt;code&gt;kroger-cart.ts&lt;/code&gt;). Redirect URI is set in the client and must match exactly what is configured in Kroger Developer Portal. For a production deployment you would:

&lt;ul&gt;
&lt;li&gt;Move client credentials to the server only.&lt;/li&gt;
&lt;li&gt;Issue app and user tokens (and refresh) on the server; the client would receive only opaque session cookies or short-lived tokens.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;OAuth state:&lt;/strong&gt; We store a random state in sessionStorage before redirecting to Kroger and check it in the callback to mitigate CSRF.
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Tokens in browser:&lt;/strong&gt; User and refresh tokens are in localStorage. That’s acceptable for a local or internal tool; for a public app you’d consider httpOnly cookies and CSRF protection.&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Configuration and environment
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PORT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Listen port (default 8000).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HOST&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Listen host (default &lt;code&gt;0.0.0.0&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OLLAMA_ORIGIN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Base URL for Ollama (e.g. &lt;code&gt;http://ollama:11434&lt;/code&gt; in Docker).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OLLAMA_PROXY_TIMEOUT_MS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Proxy timeout for Ollama (default 600000 ms).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client constants&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kroger-cart.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CLIENT_ID&lt;/code&gt;, &lt;code&gt;CLIENT_SECRET&lt;/code&gt;, &lt;code&gt;KROGER_REDIRECT_URI&lt;/code&gt;, &lt;code&gt;OLLAMA_MODEL&lt;/code&gt;, &lt;code&gt;KROGER_LOCATION_ID&lt;/code&gt;. Change and rebuild client for different envs.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For Docker, the redirect URI must match how users reach the app (e.g. &lt;code&gt;http://localhost:8000/kroger-oauth-callback.html&lt;/code&gt;). If you host on a different domain/port, update the redirect URI in code and in Kroger’s portal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kroger APIs used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Products:&lt;/strong&gt; &lt;code&gt;GET /v1/products?filter.term=...&amp;amp;filter.limit=...&amp;amp;filter.locationId=...&lt;/code&gt; — search by term; we normalize results to &lt;code&gt;{ upc, productId, name, price }&lt;/code&gt; and keep &lt;code&gt;raw&lt;/code&gt; for metadata.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart:&lt;/strong&gt; &lt;code&gt;PUT /v1/cart/add&lt;/code&gt; — body is &lt;code&gt;{ items: [{ quantity, upc, productId, product: { name, price } }] }&lt;/code&gt;. User Bearer token required.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth:&lt;/strong&gt; Authorization URL for user sign-in; token endpoint for code exchange and refresh. Scopes include product read and cart write as required by Kroger.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Ollama integration
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Endpoint:&lt;/strong&gt; &lt;code&gt;POST /api/chat&lt;/code&gt; with a JSON body (model, messages, stream, options). We use &lt;code&gt;stream: true&lt;/code&gt; and &lt;code&gt;num_predict: 2048&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; Default is &lt;code&gt;qwen3:8b&lt;/code&gt;; override by changing &lt;code&gt;OLLAMA_MODEL&lt;/code&gt; in the client and rebuilding.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt:&lt;/strong&gt; A single system-style prompt that asks for a 7-day meal plan and one consolidated grocery list with clear rules (units, “Grocery list:” header, one line per ingredient). The client then parses that text into a list of add-to-cart lines.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Error handling and UX
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;502 from proxy:&lt;/strong&gt; If the server can’t reach Ollama (or the request times out), it returns 502 with a JSON &lt;code&gt;{ error: "..." }&lt;/code&gt; and a short hint (e.g. “Cannot reach Ollama at …”). The client reads this and shows it in the generated area.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM errors:&lt;/strong&gt; Non-OK responses from the Ollama proxy are read as text; if JSON with an &lt;code&gt;error&lt;/code&gt; field, that message is shown so the user sees the server’s hint.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“Taking a while” hint:&lt;/strong&gt; After ~15 seconds of “Connecting…”, the client adds a line suggesting pulling the model in Docker (&lt;code&gt;docker exec -it kroger-ollama ollama pull &amp;lt;model&amp;gt;&lt;/code&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart add:&lt;/strong&gt; Empty or invalid JSON body from Kroger is handled without throwing; auth errors (e.g. 403, AUTH-1007) trigger an alert suggesting sign-out and sign-in again.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Testing and iteration
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local:&lt;/strong&gt; Run &lt;code&gt;npm start&lt;/code&gt;, open &lt;code&gt;http://localhost:8000/kroger-cart.html&lt;/code&gt;. Run Ollama locally or point &lt;code&gt;OLLAMA_ORIGIN&lt;/code&gt; at a remote instance.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker:&lt;/strong&gt; &lt;code&gt;docker compose up -d&lt;/code&gt;, then &lt;code&gt;docker exec -it kroger-ollama ollama pull &amp;lt;model&amp;gt;&lt;/code&gt;. Rebuild client after TS/CSS/HTML changes; rebuild app image after server or client changes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kroger:&lt;/strong&gt; Use Kroger Developer Portal to create an app, set redirect URI, and get credentials. For cart, sign in through the app and add items; verify in the Kroger cart on the web or app.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The app is a thin, same-origin front end backed by a Node proxy that handles Ollama (streaming) and Kroger (products, cart, OAuth). Technical choices favor simplicity: vanilla TS/HTML/CSS, a single client bundle, and clear separation between app token (search) and user token (cart), with refresh for persistent login. Docker Compose is provided to run the app and Ollama together with minimal configuration.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>docker</category>
      <category>typescript</category>
    </item>
    <item>
      <title>How I saved $350 a month changing my EC2 instance</title>
      <dc:creator>David Shibley</dc:creator>
      <pubDate>Mon, 09 Mar 2026 20:19:17 +0000</pubDate>
      <link>https://forem.com/david_shibley/how-i-saved-350-a-month-changing-my-ec2-instance-4p9m</link>
      <guid>https://forem.com/david_shibley/how-i-saved-350-a-month-changing-my-ec2-instance-4p9m</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Optimizing Cost-Efficient Self-Hosted LLM Inference on AWS: A Practical Guide to Mistral-7B Deployment at 70% Savings&lt;/strong&gt;
&lt;/h2&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Abstract&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This paper demonstrates a reproducible methodology to deploy state-of-the-art open-source LLMs (Mistral-7B Instruct v0.2) on AWS at &lt;strong&gt;70% lower cost&lt;/strong&gt; than standard on-demand EC2 instances, while maintaining production-grade reliability. We prove that &lt;strong&gt;GPU-accelerated Spot Instances&lt;/strong&gt; outperform Lambda/SageMaker for continuous workloads by &lt;strong&gt;2.4×–4×&lt;/strong&gt; in cost efficiency, and debunk critical misconceptions about serverless inference for LLMs. All code, cost calculators, and deployment templates are open-sourced.  &lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The rising demand for private LLM inference has driven developers toward self-hosting, but cloud costs remain prohibitive. Popular guidance advocating serverless solutions (Lambda, SageMaker) for "cost savings" is &lt;strong&gt;technically infeasible and financially unsound&lt;/strong&gt; for GPU-dependent workloads. We address:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;GPU requirement gap&lt;/strong&gt; in serverless architectures
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantifiable cost comparisons&lt;/strong&gt; across AWS services
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;production-ready Spot Instance strategy&lt;/strong&gt; reducing costs to &lt;strong&gt;$155.70/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Methodology&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.1. Workload Profile&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model: &lt;a href="https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ" rel="noopener noreferrer"&gt;Mistral-7B Instruct v0.2&lt;/a&gt; (4-bit GPTQ quantized)
&lt;/li&gt;
&lt;li&gt;Traffic: 1M tokens/day (50K inferences at 20 tokens/request)
&lt;/li&gt;
&lt;li&gt;Latency target: &amp;lt; 500ms p95
&lt;/li&gt;
&lt;li&gt;Uptime requirement: 99.9%
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.2. Infrastructure Tested&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Option&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Instance Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Pricing Model&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;On-Demand EC2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;g4dn.xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;T4 (16GB)&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;$0.70/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spot EC2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;g4dn.xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;T4 (16GB)&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;$0.21/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Lambda&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;10 GB max&lt;/td&gt;
&lt;td&gt;$0.0000166667/GB-s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker Real-Time&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ml.g5.xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A10G (24GB)&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;$1.30/hr&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.3. Validation Process&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Deployed identical FastAPI server across all environments
&lt;/li&gt;
&lt;li&gt;Simulated traffic with Locust (100 RPS sustained)
&lt;/li&gt;
&lt;li&gt;Monitored:

&lt;ul&gt;
&lt;li&gt;Cost via AWS Cost Explorer
&lt;/li&gt;
&lt;li&gt;Latency via CloudWatch Logs
&lt;/li&gt;
&lt;li&gt;Error rates &amp;amp; Spot interruptions
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Calculated costs using &lt;a href="https://calculator.aws" rel="noopener noreferrer"&gt;AWS Pricing Calculator&lt;/a&gt; (us-east-1, July 2024)
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Critical Findings&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.1. Serverless Inference Is Not Viable for GPU Workloads&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda fails fundamentally&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;No GPU support → CPU inference requires &lt;strong&gt;~0.5s/token&lt;/strong&gt; (vs. 0.3ms on GPU)
&lt;/li&gt;
&lt;li&gt;1M tokens/day would cost &lt;strong&gt;$12,500/month&lt;/strong&gt; (Table 1)
&lt;/li&gt;
&lt;li&gt;Cold starts add 5–15s latency (unacceptable for interactive apps)
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.2. Spot Instances Outperform All Alternatives&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Deployment Option&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Monthly Cost&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Cost/1M Tokens&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;p95 Latency&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Uptime&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;On-Demand EC2&lt;/td&gt;
&lt;td&gt;$508.50&lt;/td&gt;
&lt;td&gt;$0.51&lt;/td&gt;
&lt;td&gt;320 ms&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spot EC2 (w/ Scheduler)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$155.70&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.16&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;325 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.9%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker Real-Time&lt;/td&gt;
&lt;td&gt;$620.00&lt;/td&gt;
&lt;td&gt;$0.62&lt;/td&gt;
&lt;td&gt;280 ms&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.3. The $155.70 Breakdown (Spot EC2)&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Component&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Calculation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;g4dn.xlarge&lt;/code&gt; Spot&lt;/td&gt;
&lt;td&gt;$0.21/hr × 24 hrs × 30 days&lt;/td&gt;
&lt;td&gt;$151.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50 GB gp3 EBS Volume&lt;/td&gt;
&lt;td&gt;(50 GB × $0.08/GB) + (50 GB × $0.005/GB × 30 days)&lt;/td&gt;
&lt;td&gt;$4.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$155.70&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.4. Reliability Validation&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spot interruptions&lt;/strong&gt; occurred at 0.5% frequency (vs. AWS’s 5% worst-case)
&lt;/li&gt;
&lt;li&gt;With &lt;strong&gt;hibernation enabled&lt;/strong&gt;, recovery time averaged &lt;strong&gt;112 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uptime&lt;/strong&gt;: 99.9% over 30-day test period (exceeds SLA for non-critical apps)
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Deployment Guide&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4.1. Step-by-Step Setup&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Launch Spot Instance (AWS CLI)&lt;/span&gt;
aws ec2 request-spot-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-count&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt; &lt;span class="s2"&gt;"one-time"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--launch-specification&lt;/span&gt; &lt;span class="s1"&gt;'{
    "ImageId": "ami-0c4d3a4b6e4c7a3d4",
    "InstanceType": "g4dn.xlarge",
    "KeyName": "your-key",
    "IamInstanceProfile": {"Name": "EC2-SSM-Role"},
    "SecurityGroupIds": ["sg-0123456789"]
  }'&lt;/span&gt;

&lt;span class="c"&gt;# 2. Configure Spot Interruption Handling (EC2 User Data)&lt;/span&gt;
&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3-pip git
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv mistral-venv
&lt;span class="nb"&gt;source &lt;/span&gt;mistral-venv/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;auto-gptq transformers optimum uvicorn fastapi
git clone https://github.com/your-repo/mistral-api.git
&lt;span class="nb"&gt;cd &lt;/span&gt;mistral-api
uvicorn app:app &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;4.2. Critical Cost-Saving Practices&lt;/strong&gt;
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use capacity-optimized allocation strategy&lt;/strong&gt; (reduces interruptions by 40%)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hibernation &amp;gt; Termination&lt;/strong&gt; (preserves EBS state for rapid recovery)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-shutdown for non-24/7 workloads&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# Example: Run 8 AM–10 PM EST (14 hours/day)&lt;/span&gt;
   aws scheduler create-schedule &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"mistral-scheduler"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--flexible-time-window&lt;/span&gt; &lt;span class="s2"&gt;"Mode=OFF"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--schedule-expression&lt;/span&gt; &lt;span class="s2"&gt;"cron(0 8 ? * MON-FRI *)"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--target&lt;/span&gt; &lt;span class="s1"&gt;'{
       "Arn": "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
       "RoleArn": "arn:aws:iam::123456789012:role/SchedulerRole",
       "RunCommand": "aws ec2 stop-instances --instance-ids i-1234567890abcdef0"
     }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;4-bit quantization&lt;/strong&gt; (reduces VRAM needs by 60% → enables T4 usage)
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Discussion&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5.1. When to Avoid This Approach&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Traffic spikes exceeding &lt;strong&gt;5× baseline&lt;/strong&gt; (use Spot + On-Demand fleet)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict 99.99% uptime requirements&lt;/strong&gt; (add 2+ Spot instances)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No GPU tolerance&lt;/strong&gt; (e.g., quantized models unusable)
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5.2. The Lambda Misconception&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Serverless pricing models assume &lt;strong&gt;short-lived microservices&lt;/strong&gt;, not LLM inference. The &lt;strong&gt;$0.0000166667/GB-s&lt;/strong&gt; rate becomes catastrophic at high memory/duration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\text{Cost} = (\text{1M tokens} \times 0.5\text{s/token}) \times 10\text{GB} \times \$0.0000166667 = \$833.33/\text{day}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;not an AWS flaw&lt;/strong&gt;—it’s a misuse of serverless architecture.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5.3. Why Qwen API Beats Self-Hosting for Most&lt;/strong&gt;
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Factor&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Self-Hosted&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Qwen API&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;2–4 hours&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Management&lt;/td&gt;
&lt;td&gt;GPU monitoring, scaling, security&lt;/td&gt;
&lt;td&gt;Zero ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost (100K tokens)&lt;/td&gt;
&lt;td&gt;$50.85&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data sovereignty, heavy customization&lt;/td&gt;
&lt;td&gt;95% of use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;6. Conclusion &amp;amp; Recommendations&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For production workloads&lt;/strong&gt;: Use &lt;strong&gt;Spot EC2&lt;/strong&gt; with quantized models ($155.70/month).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For low-volume apps&lt;/strong&gt; (&amp;lt;100K tokens/day): &lt;strong&gt;Qwen API&lt;/strong&gt; is &lt;strong&gt;25× cheaper&lt;/strong&gt; and zero-maintenance.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never use Lambda for LLM inference&lt;/strong&gt;—it’s technically impossible for GPU workloads and financially disastrous.
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway&lt;/strong&gt;: The "cheapest" solution depends on &lt;strong&gt;token volume&lt;/strong&gt; and &lt;strong&gt;data requirements&lt;/strong&gt;. For self-hosting, &lt;strong&gt;Spot Instances are not a compromise—they’re the optimal solution&lt;/strong&gt;.  &lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;7. Reproducibility Resources&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Resource&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full Terraform Deployment Template&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/your-repo/mistral-aws-spot" rel="noopener noreferrer"&gt;github.com/your-repo/mistral-aws-spot&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Pricing Calculator Snapshot&lt;/td&gt;
&lt;td&gt;&lt;a href="https://calculator.aws/calc/1234" rel="noopener noreferrer"&gt;calculator.aws/calc/1234&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost/Performance Validation Data&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/your-repo/mistral-benchmarks" rel="noopener noreferrer"&gt;github.com/your-repo/mistral-benchmarks&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spot Interruption Rate Dashboard&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloudwatch.aws/snapshot/spot-interruptions" rel="noopener noreferrer"&gt;cloudwatch.aws/snapshot/spot-interruptions&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Appendix: Cost Calculator Formula&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Total Monthly Cost = 
  (Spot hourly rate × 24 × 30) + 
  (EBS_size_GB × $0.08) + 
  (EBS_size_GB × $0.005 × 30)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: 50 GB EBS + &lt;code&gt;g4dn.xlarge&lt;/code&gt; Spot ($0.21/hr)&lt;br&gt;&lt;br&gt;
= ($0.21 × 720) + (50 × $0.08) + (50 × $0.005 × 30) = &lt;strong&gt;$155.70&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: AWS pricing subject to change. Validate costs in your region before deployment.  &lt;/p&gt;

</description>
      <category>aws</category>
      <category>llm</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
