<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Octavian</title>
    <description>The latest articles on Forem by Octavian (@octaviannn).</description>
    <link>https://forem.com/octaviannn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F153515%2Fdf2e4f5b-4965-4a8a-8538-40550bdc5751.jpeg</url>
      <title>Forem: Octavian</title>
      <link>https://forem.com/octaviannn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/octaviannn"/>
    <language>en</language>
    <item>
      <title>Why most AI agent frameworks break in production (and what I’m doing differently)</title>
      <dc:creator>Octavian</dc:creator>
      <pubDate>Thu, 26 Mar 2026 07:06:06 +0000</pubDate>
      <link>https://forem.com/octaviannn/why-most-ai-agent-frameworks-break-in-production-and-what-im-doing-differently-3f5i</link>
      <guid>https://forem.com/octaviannn/why-most-ai-agent-frameworks-break-in-production-and-what-im-doing-differently-3f5i</guid>
      <description>&lt;p&gt;Over the past months I’ve been building a WhatsApp-first AI receptionist that handles booking and rescheduling directly into Google Calendar.&lt;/p&gt;

&lt;p&gt;One thing became obvious quickly:&lt;/p&gt;

&lt;p&gt;Most agent frameworks work well locally, but start breaking once you try to run them as a multi-tenant production service.&lt;/p&gt;

&lt;p&gt;Session state disappears. Memory becomes fragile. PII handling is unclear. Human escalation is missing.&lt;/p&gt;

&lt;p&gt;So I started building an opinionated framework called Glaivio, applying a convention-over-configuration approach similar to what Rails did for web apps — but for AI agents.&lt;/p&gt;

&lt;p&gt;The goal isn’t flexibility.&lt;/p&gt;

&lt;p&gt;The goal is predictable production behavior.&lt;/p&gt;

&lt;p&gt;Repo:&lt;br&gt;
&lt;a href="https://github.com/tavyy/glaivio-ai" rel="noopener noreferrer"&gt;https://github.com/tavyy/glaivio-ai&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The problem with most agent architectures today
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Typical agent demos assume:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;stateless execution&lt;/li&gt;
&lt;li&gt;single-user workflows&lt;/li&gt;
&lt;li&gt;local memory files&lt;/li&gt;
&lt;li&gt;no escalation path&lt;/li&gt;
&lt;li&gt;no privacy middleware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works for prototypes.&lt;/p&gt;

&lt;p&gt;It doesn’t work when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple customers interact simultaneously&lt;/li&gt;
&lt;li&gt;conversations persist across sessions&lt;/li&gt;
&lt;li&gt;messages contain personal data&lt;/li&gt;
&lt;li&gt;agents must hand off safely to humans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These problems appear immediately when deploying agents inside real businesses.&lt;/p&gt;
&lt;h3&gt;
  
  
  Design principle: state should be infrastructure, not an afterthought
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39il6xgn3cdupvibj4aj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39il6xgn3cdupvibj4aj.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most frameworks treat conversation history as optional memory.&lt;/p&gt;

&lt;p&gt;Glaivio treats it as required infrastructure.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temporary buffers&lt;/li&gt;
&lt;li&gt;flat JSON memory files&lt;/li&gt;
&lt;li&gt;or ephemeral runtime context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;conversation history is Postgres-backed by default.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;session continuity across restarts&lt;/li&gt;
&lt;li&gt;multi-tenant compatibility&lt;/li&gt;
&lt;li&gt;auditability&lt;/li&gt;
&lt;li&gt;production-ready persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents should not “forget” conversations because a container restarted.&lt;/p&gt;
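
&lt;p&gt;As a rough illustration of history as infrastructure, here is a minimal sketch of a tenant-scoped conversation store. It is not Glaivio’s actual storage layer: &lt;code&gt;ConversationStore&lt;/code&gt;, its methods, and the table layout are assumptions made up for this example, and &lt;code&gt;sqlite3&lt;/code&gt; stands in for Postgres so the snippet runs without a database server.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: not Glaivio's real storage code.
# sqlite3 stands in for Postgres so the example runs without a server.
import sqlite3


class ConversationStore:
    """Persists messages keyed by tenant and user so history survives restarts."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "tenant_id TEXT NOT NULL, user_id TEXT NOT NULL, "
            "role TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self.db.commit()

    def append(self, tenant_id, user_id, role, content):
        # Commit on every message so nothing lives only in process memory.
        self.db.execute(
            "INSERT INTO messages (tenant_id, user_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (tenant_id, user_id, role, content),
        )
        self.db.commit()

    def history(self, tenant_id, user_id):
        # Every read is filtered by tenant, so tenants never see each other's rows.
        rows = self.db.execute(
            "SELECT role, content FROM messages "
            "WHERE tenant_id = ? AND user_id = ? ORDER BY id",
            (tenant_id, user_id),
        )
        return [{"role": role, "content": content} for role, content in rows]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because each message is committed as it is appended, a new process opened on the same database file sees the same history, which is the restart behavior described above.&lt;/p&gt;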
&lt;h3&gt;
  
  
  Privacy middleware should exist before the LLM call
&lt;/h3&gt;

&lt;p&gt;Another issue I kept seeing:&lt;/p&gt;

&lt;p&gt;PII goes directly to model providers.&lt;/p&gt;

&lt;p&gt;For production systems this becomes a real blocker.&lt;/p&gt;

&lt;p&gt;Glaivio includes a middleware layer (work in progress) that automatically redacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;phone numbers&lt;/li&gt;
&lt;li&gt;emails&lt;/li&gt;
&lt;li&gt;NHS numbers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;before sending payloads to the LLM provider.&lt;/p&gt;

&lt;p&gt;The goal is to make privacy a default behavior instead of an integration burden.&lt;/p&gt;
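
&lt;p&gt;A redaction step of this kind might look like the sketch below. The patterns, placeholder tokens, and function names are assumptions for illustration, not Glaivio’s middleware. The phone pattern in particular is deliberately loose and will over-redact other long digit runs. NHS numbers are checked with the standard modulus 11 check digit before being replaced.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: patterns and names are illustrative only.
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# NHS numbers are 10 digits, commonly written in a 3-3-4 grouping.
NHS = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")
# Loose phone pattern: optional "+", then 10 to 15 digits with spaces or hyphens allowed.
PHONE = re.compile(r"\+?\d(?:[ -]?\d){9,14}")


def is_nhs_number(digits):
    """Modulus 11 check: the tenth digit is a check digit over the first nine."""
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = (11 - total % 11) % 11  # a result of 11 maps to 0
    return check != 10 and check == int(digits[9])


def _redact_nhs(match):
    digits = re.sub(r"\D", "", match.group())
    return "[NHS_NUMBER]" if is_nhs_number(digits) else match.group()


def redact(text):
    """Replace emails, valid NHS numbers, then phone-like digit runs."""
    text = EMAIL.sub("[EMAIL]", text)
    text = NHS.sub(_redact_nhs, text)
    text = PHONE.sub("[PHONE]", text)
    return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;redact("Call +44 7700 900123 or mail jo@example.com, NHS 943 476 5919")&lt;/code&gt; returns &lt;code&gt;Call [PHONE] or mail [EMAIL], NHS [NHS_NUMBER]&lt;/code&gt;. The NHS check runs before the phone pattern so that a valid NHS number gets its own placeholder instead of being swallowed as a phone number.&lt;/p&gt;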
&lt;h3&gt;
  
  
  Agents need native escalation paths
&lt;/h3&gt;

&lt;p&gt;Real users don’t behave like demos.&lt;/p&gt;

&lt;p&gt;Eventually the agent becomes uncertain.&lt;/p&gt;

&lt;p&gt;Most frameworks leave escalation as an application-layer concern.&lt;/p&gt;

&lt;p&gt;Glaivio includes a simple trigger mechanism:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;on_confusion → escalate to human operator&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In practice this connects easily to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp&lt;/li&gt;
&lt;li&gt;Email&lt;/li&gt;
&lt;li&gt;support dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This prevents agents from getting stuck in loops.&lt;/p&gt;
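
&lt;p&gt;One way to picture such a trigger is the sketch below. How Glaivio actually detects confusion is not described here, so the example assumes each agent turn arrives with a boolean &lt;code&gt;agent_was_unsure&lt;/code&gt; flag. &lt;code&gt;EscalationPolicy&lt;/code&gt;, &lt;code&gt;record_turn&lt;/code&gt;, and &lt;code&gt;notify_human&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: class and callback names are assumptions, not Glaivio's API.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class EscalationPolicy:
    """Hands a conversation to a human after N unsure agent replies in a row."""

    notify_human: Callable[[str], None]
    max_unsure_turns: int = 2
    unsure_streak: Dict[str, int] = field(default_factory=dict)

    def record_turn(self, conversation_id, agent_was_unsure):
        """Return True when this turn triggered a handoff."""
        if not agent_was_unsure:
            # A confident reply resets the streak.
            self.unsure_streak[conversation_id] = 0
            return False
        streak = self.unsure_streak.get(conversation_id, 0) + 1
        self.unsure_streak[conversation_id] = streak
        if streak == self.max_unsure_turns:
            # Fires once per streak; the callback could post to WhatsApp,
            # send an email, or open a ticket in a support dashboard.
            self.notify_human(conversation_id)
            return True
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Counting consecutive unsure turns, rather than escalating on the first one, is a design choice in this sketch: it gives the agent one chance to recover while still bounding how long a user can be stuck in a loop.&lt;/p&gt;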

&lt;h3&gt;
  
  
  Self-improving agents without prompt hacking
&lt;/h3&gt;

&lt;p&gt;One recurring issue with deployed agents:&lt;/p&gt;

&lt;p&gt;users correct them constantly.&lt;/p&gt;

&lt;p&gt;Usually this feedback disappears.&lt;/p&gt;

&lt;p&gt;Glaivio experiments with extracting corrections into a persistent corrections file so agents adapt over time without rewriting prompts manually.&lt;/p&gt;

&lt;p&gt;Still early, but promising.&lt;/p&gt;
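
&lt;p&gt;To make the idea concrete, here is a small sketch of the persistence half only: storing corrections and turning them back into prompt text. Extracting a correction from a user message is out of scope. The JSON Lines format and the names &lt;code&gt;record_correction&lt;/code&gt; and &lt;code&gt;corrections_prompt&lt;/code&gt; are assumptions, not Glaivio’s implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: file format and function names are assumptions.
import json
from pathlib import Path


def record_correction(path, wrong, right):
    """Append one user correction to the file as a JSON line."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps({"wrong": wrong, "right": right}) + "\n")


def corrections_prompt(path):
    """Render stored corrections as bullet lines to append to the system prompt."""
    p = Path(path)
    if not p.exists():
        return ""
    lines = p.read_text(encoding="utf-8").splitlines()
    items = [json.loads(line) for line in lines if line]
    return "\n".join(
        f"- Instead of: {c['wrong']} | Do: {c['right']}" for c in items
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An append-only file keeps every correction reviewable by a human, and the base prompt itself never has to be edited by hand.&lt;/p&gt;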

&lt;h3&gt;
  
  
  Memory should behave more like cognition than logs
&lt;/h3&gt;

&lt;p&gt;Instead of loading entire histories into context windows, the framework is evolving toward a two-layer memory model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;conscious memory&lt;/li&gt;
&lt;li&gt;unconscious memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only distilled facts relevant to the current task enter the active reasoning window.&lt;/p&gt;

&lt;p&gt;The rest stays persistent but inactive.&lt;/p&gt;

&lt;p&gt;This keeps token usage predictable while preserving long-term knowledge.&lt;/p&gt;
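
&lt;p&gt;A toy version of that selection step is sketched below. Plain keyword overlap stands in for whatever relevance scoring the framework ends up using, and &lt;code&gt;select_conscious&lt;/code&gt; and its &lt;code&gt;limit&lt;/code&gt; parameter are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical sketch: keyword overlap is a stand-in for real relevance scoring.
def select_conscious(facts, task, limit=3):
    """Pick the stored facts most relevant to the task; the rest stay inactive."""
    task_words = set(task.lower().split())
    scored = [
        (len(task_words.intersection(fact.lower().split())), fact)
        for fact in facts
    ]
    # Drop facts that share no words with the task.
    relevant = [pair for pair in scored if pair[0] != 0]
    # Highest overlap first; the sort is stable, so ties keep their stored order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    # The cap bounds how many facts reach the context window.
    return [fact for _, fact in relevant[:limit]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The fixed &lt;code&gt;limit&lt;/code&gt; is what makes the prompt size predictable: however many facts are stored, only a bounded number are loaded for any one task.&lt;/p&gt;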

&lt;h3&gt;
  
  
  Example: building a WhatsApp receptionist agent
&lt;/h3&gt;

&lt;p&gt;Using this structure I was able to implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;booking&lt;/li&gt;
&lt;li&gt;rescheduling&lt;/li&gt;
&lt;li&gt;calendar updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;into Google Calendar from WhatsApp conversations in roughly 20 lines of code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# skills/check_availability.py
from glaivio import skill

@skill
def check_availability(date: str, time: str) -&amp;gt; str:
    """Check if a time slot is available. Always call before book_appointment.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return "Available"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# skills/book_appointment.py
from glaivio import skill

@skill
def book_appointment(patient_name: str, patient_phone: str, date: str, time: str) -&amp;gt; str:
    """Book an appointment. Only call after check_availability confirms the slot is free.
    patient_phone: use the current user's ID from context.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return f"Booked {patient_name} on {date} at {time}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from dotenv import load_dotenv
load_dotenv()

from glaivio import Agent
from skills.check_availability import check_availability
from skills.book_appointment import book_appointment

agent = Agent(
    instructions="prompts/system.md",
    skills=[check_availability, book_appointment],
    learn_from_feedback=True,
    privacy=True,
)

if __name__ == "__main__":
    agent.run(channel="whatsapp")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal of the framework is to make production agent deployments feel closer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rails apps&lt;/li&gt;
&lt;li&gt;Django apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;instead of experimental scripts.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I’m exploring next
&lt;/h3&gt;

&lt;p&gt;Still working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;privacy middleware&lt;/li&gt;
&lt;li&gt;correction learning pipeline&lt;/li&gt;
&lt;li&gt;tiered memory loading&lt;/li&gt;
&lt;li&gt;human escalation integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Feedback welcome (and stars help)
&lt;/h3&gt;

&lt;p&gt;This framework is still early and evolving around real deployment constraints rather than research demos.&lt;/p&gt;

&lt;p&gt;If you're working on production AI agents, especially anything multi-tenant, privacy-sensitive, or customer-facing, I’d really value your feedback on what breaks first in your setup and what infrastructure you wish existed by default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tavyy/glaivio-ai" rel="noopener noreferrer"&gt;https://github.com/tavyy/glaivio-ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentskills</category>
      <category>devtool</category>
      <category>runnerhchallenge</category>
    </item>
    <item>
      <title>Demand for Ruby/Rails 2022</title>
      <dc:creator>Octavian</dc:creator>
      <pubDate>Mon, 17 Oct 2022 10:21:52 +0000</pubDate>
      <link>https://forem.com/octaviannn/demand-for-rubyrails-2022-4gmb</link>
      <guid>https://forem.com/octaviannn/demand-for-rubyrails-2022-4gmb</guid>
      <description>&lt;p&gt;Hey guys,&lt;/p&gt;

&lt;p&gt;How do you find the demand for Ruby/Rails in 2022? Fewer and fewer projects are available, and although HackerRank, for example, still offers it as a language for coding challenges, a lot of similar sites have dropped it.&lt;/p&gt;

&lt;p&gt;Are you worried? What's your take on it?&lt;/p&gt;

&lt;p&gt;Please mention the country you are from as well, if possible. I'm based in London, UK, and while there is still demand for Ruby services here, I feel like it's fading.&lt;/p&gt;

</description>
      <category>ruby</category>
    </item>
    <item>
      <title>Architecture Thoughts?</title>
      <dc:creator>Octavian</dc:creator>
      <pubDate>Sat, 14 Nov 2020 18:40:08 +0000</pubDate>
      <link>https://forem.com/octaviannn/architecture-thoughts-4bc5</link>
      <guid>https://forem.com/octaviannn/architecture-thoughts-4bc5</guid>
      <description>&lt;p&gt;Hi everyone!&lt;/p&gt;

&lt;p&gt;I would like to get your input on something. My client has a live version of their platform that consists of 3 apps (Frontend, Backend and API), each in its own GitHub repo. There is an urgent requirement to build a V2 of the service that will process completely different data sources.&lt;/p&gt;

&lt;p&gt;Because of the urgency, and to reduce risk, they proposed the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep V1 as is&lt;/li&gt;
&lt;li&gt;Clone V1 into V2, change the data sources, and deploy it on separate AWS instances.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They want to fork the repos to avoid risk, but eventually want to merge V2 back into V1, ending up with just 3 repos as now, with every app able to handle both the old and the new data.&lt;/p&gt;

&lt;p&gt;If we fork, we end up maintaining 6 repos, and I can see it being a huge pain to merge them back together later. Code collaboration would also become more complicated.&lt;/p&gt;

&lt;p&gt;Instead of forking, I am inclined to create a v2 branch in each of the repos.&lt;/p&gt;

&lt;p&gt;The advantages that I see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;easier to maintain&lt;/li&gt;
&lt;li&gt;increased visibility&lt;/li&gt;
&lt;li&gt;easier to merge in the end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disadvantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;probably one big PR in the end on each of the apps&lt;/li&gt;
&lt;li&gt;merging the v2 branch into master has to happen simultaneously on all 3 apps (although feature flags could be used)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: All 3 apps are pretty complicated, with big codebases, and they work tightly together.&lt;/p&gt;

&lt;p&gt;What are your thoughts?&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>rails</category>
      <category>github</category>
    </item>
    <item>
      <title>Quick prototyping Web App / Mobile App</title>
      <dc:creator>Octavian</dc:creator>
      <pubDate>Sat, 04 Apr 2020 12:49:00 +0000</pubDate>
      <link>https://forem.com/octaviannn/quick-prototyping-web-app-mobile-app-5dml</link>
      <guid>https://forem.com/octaviannn/quick-prototyping-web-app-mobile-app-5dml</guid>
      <description>&lt;p&gt;Hi everyone. Which recent technologies do you find easiest for quickly prototyping a modern web app or mobile app, taking the learning curve into account as well?&lt;/p&gt;

</description>
      <category>development</category>
      <category>webapp</category>
      <category>mobileapp</category>
      <category>quickprototyping</category>
    </item>
    <item>
      <title>How do you keep your motivation? </title>
      <dc:creator>Octavian</dc:creator>
      <pubDate>Tue, 06 Aug 2019 15:47:27 +0000</pubDate>
      <link>https://forem.com/octavus88/how-do-you-keep-your-motivation-41gp</link>
      <guid>https://forem.com/octavus88/how-do-you-keep-your-motivation-41gp</guid>
      <description>&lt;p&gt;Hey, everyone, I have a question about a challenge I've been having for a while. How do you find the motivation to learn some new tech when you know you are about to put in hours and there is absolutely no plan to monetize it?&lt;/p&gt;

</description>
      <category>development</category>
      <category>newtech</category>
    </item>
    <item>
      <title>Do you read dev books?</title>
      <dc:creator>Octavian</dc:creator>
      <pubDate>Sun, 12 May 2019 13:41:54 +0000</pubDate>
      <link>https://forem.com/octavus88/do-you-ready-dev-books-34ll</link>
      <guid>https://forem.com/octavus88/do-you-ready-dev-books-34ll</guid>
      <description>&lt;p&gt;Hey, guys, I'm just wondering if you read any books about software architecture, patterns, etc. in your spare time? I find it very hard to focus and find the patience to sit down and just read.&lt;/p&gt;

</description>
      <category>softwaredevelopment</category>
      <category>books</category>
      <category>read</category>
    </item>
  </channel>
</rss>
