<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Liam Steiner</title>
    <description>The latest articles on Forem by Liam Steiner (@sliamh11).</description>
    <link>https://forem.com/sliamh11</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869443%2F6c3565e3-0bdb-451b-9e52-a4daae110aae.png</url>
      <title>Forem: Liam Steiner</title>
      <link>https://forem.com/sliamh11</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sliamh11"/>
    <language>en</language>
    <item>
      <title>I added a local eval loop to my personal AI assistant — here's what 800 scored interactions taught me</title>
      <dc:creator>Liam Steiner</dc:creator>
      <pubDate>Tue, 14 Apr 2026 07:22:50 +0000</pubDate>
      <link>https://forem.com/sliamh11/i-added-a-local-eval-loop-to-my-personal-ai-assistant-heres-what-800-scored-interactions-taught-1knm</link>
      <guid>https://forem.com/sliamh11/i-added-a-local-eval-loop-to-my-personal-ai-assistant-heres-what-800-scored-interactions-taught-1knm</guid>
      <description>&lt;p&gt;I'd been using my self-hosted assistant daily for a few months. Long enough to have a sense that some interactions were useful and some weren't. Not long enough to do anything about it.&lt;/p&gt;

&lt;p&gt;The problem: no feedback mechanism. I could tell a bad response when I saw it, but there was no signal that accumulated.&lt;br&gt;
So I added one.&lt;/p&gt;

&lt;p&gt;Every interaction now gets scored by a local Ollama model on accuracy, relevance, and appropriate confidence — fast enough to not be annoying. &lt;br&gt;
Interactions below a threshold trigger a reflection prompt: the model looks at the interaction and generates a short analysis of what went wrong. &lt;br&gt;
Those reflections feed into DSPy to optimize the underlying system prompts; the optimization runs periodically, once there's enough new data.&lt;/p&gt;
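
&lt;p&gt;To make the shape concrete, here's a minimal sketch of that loop against Ollama's local chat API. The function names, the 1-10 rubric, and the threshold are illustrative assumptions, not the code in the repo:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import requests

# Assumptions: Ollama is running locally and a small judge model is pulled.
OLLAMA_URL = "http://localhost:11434/api/chat"
THRESHOLD = 6.0   # below this average score, ask the model to reflect

def ask_local(prompt, force_json=False, model="llama3.1"):
    """One non-streaming call to the local Ollama judge."""
    body = {"model": model, "stream": False,
            "messages": [{"role": "user", "content": prompt}]}
    if force_json:
        body["format"] = "json"   # Ollama constrains the reply to valid JSON
    resp = requests.post(OLLAMA_URL, json=body, timeout=120)
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def score_and_reflect(user_msg, assistant_msg):
    """Score one interaction; return (scores, reflection or None)."""
    transcript = f"User: {user_msg}\nAssistant: {assistant_msg}"
    scores = json.loads(ask_local(
        "Rate the assistant reply from 1 to 10 for accuracy, relevance and "
        "appropriate confidence. Answer only as JSON, e.g. "
        '{"accuracy": 7, "relevance": 8, "confidence": 5}.\n\n' + transcript,
        force_json=True))
    reflection = None
    if sum(scores.values()) / len(scores) &amp;lt; THRESHOLD:
        reflection = ask_local(
            "The reply below scored poorly. In two or three sentences, "
            "explain what went wrong.\n\n" + transcript)
    return scores, reflection
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In a setup like this, most interactions only pay for the one extra scoring call; the reflection and the DSPy pass stay off the hot path.&lt;/p&gt;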

&lt;p&gt;After around 800 scored interactions, patterns started coming through. &lt;br&gt;
The most consistent one: the assistant was overconfident on estimates. Timelines, complexity, quantities. &lt;br&gt;
Systematically biased toward underestimating. &lt;br&gt;
Not something I'd have caught session by session.&lt;/p&gt;

&lt;p&gt;Shorter, more direct answers also consistently scored better than thorough ones. Useful to know.&lt;/p&gt;

&lt;p&gt;Honest caveats: the Ollama scoring model is imperfect, and DSPy convergence is slow on a single-user dataset. &lt;/p&gt;

&lt;p&gt;This is genuinely more experiment than finished feature.&lt;br&gt;
But having a feedback loop at all changes how you think about the system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sliamh11/Deus" rel="noopener noreferrer"&gt;https://github.com/sliamh11/Deus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>promptengineering</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>I gave my self-hosted AI shell access — then immediately sandboxed every conversation</title>
      <dc:creator>Liam Steiner</dc:creator>
      <pubDate>Mon, 13 Apr 2026 08:20:37 +0000</pubDate>
      <link>https://forem.com/sliamh11/i-gave-my-self-hosted-ai-shell-access-then-immediately-sandboxed-every-conversation-494j</link>
      <guid>https://forem.com/sliamh11/i-gave-my-self-hosted-ai-shell-access-then-immediately-sandboxed-every-conversation-494j</guid>
      <description>&lt;p&gt;I wanted my assistant to be able to actually do things. Run scripts, read files, execute code.&lt;/p&gt;

&lt;p&gt;The moment I wired that up, something felt off. Not dramatically — just the basic instinct that something with shell access and persistent memory probably shouldn't have unrestricted reach. &lt;br&gt;
And if I'm running multiple conversation contexts, I don't want them touching each other.&lt;/p&gt;

&lt;p&gt;So I added container isolation. &lt;br&gt;
Every conversation in Deus now runs in its own container — Docker on Linux, Apple Container on macOS.&lt;/p&gt;

&lt;p&gt;Each gets an isolated filesystem and isolated memory. When the session ends, the container goes with it.&lt;/p&gt;
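
&lt;p&gt;For the Docker path, here's a minimal sketch of what that lifecycle can look like. The container name, image, and resource flags are illustrative assumptions (and Apple Container has its own CLI), but the shape is the same: start a throwaway container when a conversation starts, exec the agent's commands into it, stop it when the session ends.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess
import uuid

def start_session(image="python:3.12-slim"):
    """Start one ephemeral container for one conversation."""
    name = f"deus-session-{uuid.uuid4().hex[:8]}"
    subprocess.run([
        "docker", "run", "-d", "--rm",        # --rm: the filesystem dies with it
        "--name", name,
        "--memory", "512m", "--cpus", "1",    # cap the blast radius
        image, "sleep", "infinity",           # keep it alive until teardown
    ], check=True)
    return name

def run_in_session(name, command):
    """Run a shell command inside the session's container, never on the host."""
    done = subprocess.run(["docker", "exec", name, "sh", "-c", command],
                          capture_output=True, text=True)
    return done.stdout, done.stderr

def end_session(name):
    """Stop the container; its filesystem and memory go with it."""
    subprocess.run(["docker", "stop", name], check=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Adding --network none to the run command is the same idea taken one step further, for conversations that don't need network access.&lt;/p&gt;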

&lt;p&gt;A few things this solves: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The host machine stays clean&lt;/li&gt;
&lt;li&gt;Contexts don't share state&lt;/li&gt;
&lt;li&gt;It made me more willing to give the agent permissions within the container (this one surprised me)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The blast radius is scoped. &lt;br&gt;
It's a better mental model than trying to specify everything via prompt.&lt;/p&gt;

&lt;p&gt;Is this overkill for Q&amp;amp;A? Yes. &lt;br&gt;
Did it feel like the right call the moment shell access entered the picture? Also yes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sliamh11/Deus" rel="noopener noreferrer"&gt;https://github.com/sliamh11/Deus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>My AI study sessions kept starting from zero — so I built persistent memory into the assistant</title>
      <dc:creator>Liam Steiner</dc:creator>
      <pubDate>Thu, 09 Apr 2026 10:42:41 +0000</pubDate>
      <link>https://forem.com/sliamh11/my-ai-study-sessions-kept-starting-from-zero-so-i-built-persistent-memory-into-the-assistant-dlk</link>
      <guid>https://forem.com/sliamh11/my-ai-study-sessions-kept-starting-from-zero-so-i-built-persistent-memory-into-the-assistant-dlk</guid>
      <description>&lt;p&gt;I study physics in whatever gaps exist in my day. Twenty minutes before work, an hour on a good evening. Sessions are short and scattered.&lt;/p&gt;

&lt;p&gt;The problem wasn't the AI — it was the re-onboarding. Every session I'd spend the first ten minutes re-explaining where I was: what topic, what notation we'd agreed on, what I was stuck on last time. Half the session gone before asking anything useful.&lt;/p&gt;

&lt;p&gt;So I built a memory layer. Every conversation gets indexed into a local sqlite-vec database with Gemini embeddings. When a new session starts, it queries semantically for relevant past context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;q_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      SELECT e.path, e.date, e.tldr, e.topics, v.distance
      FROM embeddings v
      JOIN entries e ON e.id = v.rowid
      WHERE v.embedding MATCH ? AND k = ?
      ORDER BY v.distance
      &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q_vec&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
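
&lt;p&gt;For completeness, here's a sketch of the write side that the query above assumes. The schema, the vec0 virtual table, and the choice of Gemini's text-embedding-004 model (768 dimensions) are assumptions for illustration; the repo's actual layout may differ.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3
import struct

import sqlite_vec                      # pip install sqlite-vec
import google.generativeai as genai    # assumes genai.configure(api_key=...) was called

def embed(text):
    """Gemini embedding for one string (assumption: text-embedding-004, 768 dims)."""
    return genai.embed_content(model="models/text-embedding-004",
                               content=text)["embedding"]

def serialize(vec):
    """sqlite-vec expects a compact float32 blob."""
    return struct.pack(f"{len(vec)}f", *vec)

db = sqlite3.connect("memory.db")
db.enable_load_extension(True)
sqlite_vec.load(db)                    # load the sqlite-vec extension

db.executescript("""
CREATE TABLE IF NOT EXISTS entries(
  id INTEGER PRIMARY KEY, path TEXT, date TEXT, tldr TEXT, topics TEXT);
CREATE VIRTUAL TABLE IF NOT EXISTS embeddings USING vec0(embedding float[768]);
""")

def index_conversation(path, date, tldr, topics):
    """Store a conversation summary and its embedding under the same rowid."""
    cur = db.execute(
        "INSERT INTO entries(path, date, tldr, topics) VALUES (?, ?, ?, ?)",
        (path, date, tldr, topics))
    db.execute("INSERT INTO embeddings(rowid, embedding) VALUES (?, ?)",
               (cur.lastrowid, serialize(embed(tldr))))
    db.commit()
&lt;/code&gt;&lt;/pre&gt;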



&lt;p&gt;The difference in practice: I ask about the Euler-Lagrange equation and it already knows we spent three sessions on constrained systems last week, that I keep mixing up generalized coordinates, and that we're using Goldstein as the reference. I don't say any of that — it just knows.&lt;/p&gt;

&lt;p&gt;But it's not limited to study sessions. &lt;br&gt;
Last week I asked it what movie my roommate and I could watch together. It knew both our tastes from past conversations and gave a recommendation that actually made sense. &lt;br&gt;
That's when it clicked for me — it doesn't just remember topics, it remembers you.&lt;/p&gt;

&lt;p&gt;The memory store itself runs locally — no cloud database holding your history (Gemini is only called to generate the embeddings). &lt;br&gt;
Open source if you want to dig in: &lt;a href="https://github.com/sliamh11/Deus" rel="noopener noreferrer"&gt;github.com/sliamh11/Deus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>selfhosted</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
