<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Liam Steiner</title>
    <description>The latest articles on Forem by Liam Steiner (@sliamh11).</description>
    <link>https://forem.com/sliamh11</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869443%2F6c3565e3-0bdb-451b-9e52-a4daae110aae.png</url>
      <title>Forem: Liam Steiner</title>
      <link>https://forem.com/sliamh11</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sliamh11"/>
    <language>en</language>
    <item>
      <title>I added a local eval loop to my personal AI assistant — here's what 800 scored interactions taught me</title>
      <dc:creator>Liam Steiner</dc:creator>
      <pubDate>Tue, 14 Apr 2026 07:22:50 +0000</pubDate>
      <link>https://forem.com/sliamh11/i-added-a-local-eval-loop-to-my-personal-ai-assistant-heres-what-800-scored-interactions-taught-1knm</link>
      <guid>https://forem.com/sliamh11/i-added-a-local-eval-loop-to-my-personal-ai-assistant-heres-what-800-scored-interactions-taught-1knm</guid>
      <description>&lt;p&gt;I'd been using my self-hosted assistant daily for a few months. Long enough to have a sense that some interactions were useful and some weren't. Not long enough to do anything about it.&lt;/p&gt;

&lt;p&gt;The problem: no feedback mechanism. I could tell a bad response when I saw it, but there was no signal that accumulated.&lt;br&gt;
So I added one.&lt;/p&gt;

&lt;p&gt;Every interaction now gets scored by a local Ollama model on accuracy, relevance, and appropriate confidence — fast enough to not be annoying. &lt;br&gt;
Interactions below a threshold trigger a reflection prompt: the model looks at the interaction and generates a short analysis of what went wrong. &lt;br&gt;
Those reflections feed into DSPy to optimize the underlying system prompts; the optimization runs periodically, once there's enough new data.&lt;/p&gt;
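
&lt;p&gt;To make the shape concrete, here's a minimal sketch of that loop against Ollama's local chat API. The function names, the 1-10 rubric, and the threshold are illustrative assumptions, not the code in the repo:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json
import requests

# Assumptions: Ollama is running locally and a small judge model is pulled.
OLLAMA_URL = "http://localhost:11434/api/chat"
THRESHOLD = 6.0   # below this average score, ask the model to reflect

def ask_local(prompt, force_json=False, model="llama3.1"):
    """One non-streaming call to the local Ollama judge."""
    body = {"model": model, "stream": False,
            "messages": [{"role": "user", "content": prompt}]}
    if force_json:
        body["format"] = "json"   # Ollama constrains the reply to valid JSON
    resp = requests.post(OLLAMA_URL, json=body, timeout=120)
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def score_and_reflect(user_msg, assistant_msg):
    """Score one interaction; return (scores, reflection or None)."""
    transcript = f"User: {user_msg}\nAssistant: {assistant_msg}"
    scores = json.loads(ask_local(
        "Rate the assistant reply from 1 to 10 for accuracy, relevance and "
        "appropriate confidence. Answer only as JSON, e.g. "
        '{"accuracy": 7, "relevance": 8, "confidence": 5}.\n\n' + transcript,
        force_json=True))
    reflection = None
    if sum(scores.values()) / len(scores) &amp;lt; THRESHOLD:
        reflection = ask_local(
            "The reply below scored poorly. In two or three sentences, "
            "explain what went wrong.\n\n" + transcript)
    return scores, reflection
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In a setup like this, most interactions only pay for the one extra scoring call; the reflection and the DSPy pass stay off the hot path.&lt;/p&gt;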

&lt;p&gt;After around 800 scored interactions, patterns started coming through. &lt;br&gt;
The most consistent one: the assistant was overconfident on estimates. Timelines, complexity, quantities. &lt;br&gt;
Systematically biased toward underestimating. &lt;br&gt;
Not something I'd have caught session by session.&lt;/p&gt;

&lt;p&gt;Shorter, more direct answers also consistently scored better than thorough ones. Useful to know.&lt;/p&gt;

&lt;p&gt;Honest caveats: the Ollama scoring model is imperfect, and DSPy convergence is slow on a single-user dataset. &lt;/p&gt;

&lt;p&gt;This is genuinely more experiment than finished feature.&lt;br&gt;
But having a feedback loop at all changes how you think about the system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sliamh11/Deus" rel="noopener noreferrer"&gt;https://github.com/sliamh11/Deus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>promptengineering</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>I gave my self-hosted AI shell access — then immediately sandboxed every conversation</title>
      <dc:creator>Liam Steiner</dc:creator>
      <pubDate>Mon, 13 Apr 2026 08:20:37 +0000</pubDate>
      <link>https://forem.com/sliamh11/i-gave-my-self-hosted-ai-shell-access-then-immediately-sandboxed-every-conversation-494j</link>
      <guid>https://forem.com/sliamh11/i-gave-my-self-hosted-ai-shell-access-then-immediately-sandboxed-every-conversation-494j</guid>
      <description>&lt;p&gt;I wanted my assistant to be able to actually do things. Run scripts, read files, execute code.&lt;/p&gt;

&lt;p&gt;The moment I wired that up, something felt off. Not dramatically — just the basic instinct that something with shell access and persistent memory probably shouldn't have unrestricted reach. &lt;br&gt;
And if I'm running multiple conversation contexts, I don't want them touching each other.&lt;/p&gt;

&lt;p&gt;So I added container isolation. &lt;br&gt;
Every conversation in Deus now runs in its own container — Docker on Linux, Apple Container on macOS.&lt;/p&gt;

&lt;p&gt;Each gets an isolated filesystem and isolated memory. When the session ends, the container goes with it.&lt;/p&gt;
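
&lt;p&gt;For the Docker path, here's a minimal sketch of what that lifecycle can look like. The container name, image, and resource flags are illustrative assumptions (and Apple Container has its own CLI), but the shape is the same: start a throwaway container when a conversation starts, exec the agent's commands into it, stop it when the session ends.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess
import uuid

def start_session(image="python:3.12-slim"):
    """Start one ephemeral container for one conversation."""
    name = f"deus-session-{uuid.uuid4().hex[:8]}"
    subprocess.run([
        "docker", "run", "-d", "--rm",        # --rm: the filesystem dies with it
        "--name", name,
        "--memory", "512m", "--cpus", "1",    # cap the blast radius
        image, "sleep", "infinity",           # keep it alive until teardown
    ], check=True)
    return name

def run_in_session(name, command):
    """Run a shell command inside the session's container, never on the host."""
    done = subprocess.run(["docker", "exec", name, "sh", "-c", command],
                          capture_output=True, text=True)
    return done.stdout, done.stderr

def end_session(name):
    """Stop the container; its filesystem and memory go with it."""
    subprocess.run(["docker", "stop", name], check=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Adding --network none to the run command is the same idea taken one step further, for conversations that don't need network access.&lt;/p&gt;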

&lt;p&gt;A few things this solves: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The host machine stays clean&lt;/li&gt;
&lt;li&gt;Contexts don't share state&lt;/li&gt;
&lt;li&gt;It made me more willing to give the agent permissions within the container (this one surprised me)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The blast radius is scoped. &lt;br&gt;
It's a better mental model than trying to specify everything via prompt.&lt;/p&gt;

&lt;p&gt;Is this overkill for Q&amp;amp;A? Yes. &lt;br&gt;
Did it feel like the right call the moment shell access entered the picture? Also yes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/sliamh11/Deus" rel="noopener noreferrer"&gt;https://github.com/sliamh11/Deus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>security</category>
      <category>showdev</category>
    </item>
    <item>
      <title>My AI study sessions kept starting from zero — so I built persistent memory into the assistant</title>
      <dc:creator>Liam Steiner</dc:creator>
      <pubDate>Thu, 09 Apr 2026 10:42:41 +0000</pubDate>
      <link>https://forem.com/sliamh11/my-ai-study-sessions-kept-starting-from-zero-so-i-built-persistent-memory-into-the-assistant-dlk</link>
      <guid>https://forem.com/sliamh11/my-ai-study-sessions-kept-starting-from-zero-so-i-built-persistent-memory-into-the-assistant-dlk</guid>
      <description>&lt;p&gt;I study physics in whatever gaps exist in my day. Twenty minutes before work, an hour on a good evening. Sessions are short and scattered.&lt;/p&gt;

&lt;p&gt;The problem wasn't the AI — it was the re-onboarding. Every session I'd spend the first ten minutes re-explaining where I was: what topic, what notation we'd agreed on, what I was stuck on last time. Half the session gone before asking anything useful.&lt;/p&gt;

&lt;p&gt;So I built a memory layer. Every conversation gets indexed into a local sqlite-vec database with Gemini embeddings. When a new session starts, it queries semantically for relevant past context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;q_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      SELECT e.path, e.date, e.tldr, e.topics, v.distance
      FROM embeddings v
      JOIN entries e ON e.id = v.rowid
      WHERE v.embedding MATCH ? AND k = ?
      ORDER BY v.distance
      &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q_vec&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
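
&lt;p&gt;For completeness, here's a sketch of the write side that the query above assumes. The schema, the vec0 virtual table, and the choice of Gemini's text-embedding-004 model (768 dimensions) are assumptions for illustration; the repo's actual layout may differ.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3
import struct

import sqlite_vec                      # pip install sqlite-vec
import google.generativeai as genai    # assumes genai.configure(api_key=...) was called

def embed(text):
    """Gemini embedding for one string (assumption: text-embedding-004, 768 dims)."""
    return genai.embed_content(model="models/text-embedding-004",
                               content=text)["embedding"]

def serialize(vec):
    """sqlite-vec expects a compact float32 blob."""
    return struct.pack(f"{len(vec)}f", *vec)

db = sqlite3.connect("memory.db")
db.enable_load_extension(True)
sqlite_vec.load(db)                    # load the sqlite-vec extension

db.executescript("""
CREATE TABLE IF NOT EXISTS entries(
  id INTEGER PRIMARY KEY, path TEXT, date TEXT, tldr TEXT, topics TEXT);
CREATE VIRTUAL TABLE IF NOT EXISTS embeddings USING vec0(embedding float[768]);
""")

def index_conversation(path, date, tldr, topics):
    """Store a conversation summary and its embedding under the same rowid."""
    cur = db.execute(
        "INSERT INTO entries(path, date, tldr, topics) VALUES (?, ?, ?, ?)",
        (path, date, tldr, topics))
    db.execute("INSERT INTO embeddings(rowid, embedding) VALUES (?, ?)",
               (cur.lastrowid, serialize(embed(tldr))))
    db.commit()
&lt;/code&gt;&lt;/pre&gt;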



&lt;p&gt;The difference in practice: I ask about the Euler-Lagrange equation and it already knows we spent three sessions on constrained systems last week, that I keep mixing up generalized coordinates, and that we're using Goldstein as the reference. I don't say any of that — it just knows.&lt;/p&gt;

&lt;p&gt;But it's not limited to study sessions. &lt;br&gt;
Last week I asked it what movie my roommate and I could watch together. It knew both our tastes from past conversations and gave a recommendation that actually made sense. &lt;br&gt;
That's when it clicked for me — it doesn't just remember topics, it remembers you.&lt;/p&gt;

&lt;p&gt;The memory store itself runs locally — no cloud database holding your history (Gemini is only called to generate the embeddings). &lt;br&gt;
Open source if you want to dig in: &lt;a href="https://github.com/sliamh11/Deus" rel="noopener noreferrer"&gt;github.com/sliamh11/Deus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>selfhosted</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
