Forem: Octavian

Why most AI agent frameworks break in production (and what I’m doing differently)

Octavian — Thu, 26 Mar 2026 07:06:06 +0000

Over the past months I’ve been building a WhatsApp-first AI receptionist that handles booking and rescheduling directly into Google Calendar.

One thing became obvious quickly:

Most agent frameworks work well locally, but start breaking once you try to run them as a multi-tenant production service.

Session state disappears. Memory becomes fragile. PII handling is unclear. Human escalation is missing.

So I started building an opinionated framework called Glaivio, applying a convention-over-configuration approach similar to what Rails did for web apps — but for AI agents.

The goal isn’t flexibility.

The goal is predictable production behavior.

Repo:
https://github.com/tavyy/glaivio-ai

The problem with most agent architectures today

Typical agent demos assume:

stateless execution
single-user workflows
local memory files
no escalation path
no privacy middleware

That works for prototypes.

It doesn’t work when:

multiple customers interact simultaneously
conversations persist across sessions
messages contain personal data
agents must hand off safely to humans

These problems appear immediately when deploying agents inside real businesses.

Design principle: state should be infrastructure, not an afterthought

Most frameworks treat conversation history as optional memory.

Glaivio treats it as required infrastructure.

Instead of:

temporary buffers
flat JSON memory files
or ephemeral runtime context

conversation history is Postgres-backed by default.

That means:

session continuity across restarts
multi-tenant compatibility
auditability
production-ready persistence

Agents should not “forget” conversations because a container restarted.
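
A minimal sketch of what database-backed history can look like, not Glaivio's actual implementation. To stay runnable without a server it uses Python's built-in sqlite3 as a stand-in for Postgres; the same table could be reached through a Postgres driver instead. The `ConversationStore` class and the table and column names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: every message is scoped by tenant and session.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    tenant_id  TEXT NOT NULL,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,
    content    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_messages_session
    ON messages (tenant_id, session_id, id);
"""


class ConversationStore:
    """Durable conversation history keyed by (tenant_id, session_id)."""

    def __init__(self, path: str) -> None:
        # sqlite3 is a stand-in here; the post's default is Postgres.
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def append(self, tenant_id: str, session_id: str, role: str, content: str) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT INTO messages (tenant_id, session_id, role, content) "
                "VALUES (?, ?, ?, ?)",
                (tenant_id, session_id, role, content),
            )

    def history(self, tenant_id: str, session_id: str, limit: int = 50) -> List[Tuple[str, str]]:
        # Take the newest `limit` rows, then return them oldest-first.
        rows = self.conn.execute(
            "SELECT role, content FROM ("
            "  SELECT id, role, content FROM messages"
            "  WHERE tenant_id = ? AND session_id = ?"
            "  ORDER BY id DESC LIMIT ?"
            ") ORDER BY id ASC",
            (tenant_id, session_id, limit),
        ).fetchall()
        return rows
```

Because every query filters on `tenant_id`, two customers who share a session ID never see each other's messages, and pointing `path` at a file on disk rather than `":memory:"` is what lets history survive a restart.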

Privacy middleware should exist before the LLM call

Another issue I kept seeing:

PII goes directly to model providers.

For production systems this becomes a real blocker.

Glaivio includes a middleware layer (work in progress) that automatically redacts:

phone numbers
emails
NHS numbers

before sending payloads to the LLM provider.

The goal is to make privacy a default behavior instead of an integration burden.
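
A rough sketch of a redaction pass that runs before the provider call. This is an illustration under assumptions, not the Glaivio middleware: the function names, placeholder tokens, and regular expressions are all hypothetical. The phone pattern is deliberately narrow (digits separated by spaces only) so that dates and times in booking messages are left alone. Candidate NHS numbers are confirmed with the standard Modulus 11 check digit.

```python
import re

# Hypothetical patterns; real traffic would need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
NHS_RE = re.compile(r"(?<!\d)\d{3}[ -]?\d{3}[ -]?\d{4}(?!\d)")
# 9 to 15 digits, optionally space-separated, optional leading "+".
PHONE_RE = re.compile(r"(?<![\w-])\+?\d(?: ?\d){8,14}(?![\w-])")


def is_valid_nhs_number(candidate: str) -> bool:
    """Validate a 10-digit NHS number using the Modulus 11 check digit."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weights 10 down to 2 are applied to the first nine digits.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == digits[9]


def redact(text: str) -> str:
    """Replace emails, valid NHS numbers, and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # NHS numbers go before phones, because the phone pattern would also match them.
    text = NHS_RE.sub(
        lambda m: "[NHS_NUMBER]" if is_valid_nhs_number(m.group()) else m.group(),
        text,
    )
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Ten-digit strings that fail the NHS check fall through to the phone pattern, so they are still redacted, just under a different placeholder.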

Agents need native escalation paths

Real users don’t behave like demos.

Eventually the agent becomes uncertain.

Most frameworks leave escalation as an application-layer concern.

Glaivio includes a simple trigger mechanism:

on_confusion → escalate to human operator

In practice this connects easily to:

WhatsApp
Email
support dashboards

This prevents agents from getting stuck in loops.
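
One way such a trigger could work, sketched under assumptions: the agent reports a confidence score per turn, and a run of low-confidence turns fires a callback that hands the transcript to a human. `ConfusionTrigger`, `Turn`, and the thresholds are hypothetical names and values, not the Glaivio API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    user_message: str
    reply: str
    confidence: float  # assumed to be reported by the agent, 0.0 to 1.0


@dataclass
class ConfusionTrigger:
    """Tracks one session and escalates after repeated low-confidence turns."""

    on_confusion: Callable[[str, List[Turn]], None]
    min_confidence: float = 0.5
    max_low_confidence_turns: int = 2
    _streak: int = field(default=0, init=False)
    _transcript: List[Turn] = field(default_factory=list, init=False)

    def observe(self, session_id: str, turn: Turn) -> bool:
        """Record a turn; return True if it caused an escalation."""
        self._transcript.append(turn)
        if turn.confidence < self.min_confidence:
            self._streak += 1
        else:
            self._streak = 0  # a confident turn breaks the streak
        if self._streak >= self.max_low_confidence_turns:
            # Hand the human a copy of the transcript so they have context.
            self.on_confusion(session_id, list(self._transcript))
            self._streak = 0
            return True
        return False
```

The callback is where a WhatsApp message, email, or dashboard ticket would be sent. Counting consecutive low-confidence turns rather than reacting to a single one avoids escalating on a one-off misunderstanding.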

Self-improving agents without prompt hacking

One recurring issue with deployed agents:

users correct them constantly.

Usually this feedback disappears.

Glaivio experiments with extracting corrections into a persistent corrections file so agents adapt over time without rewriting prompts manually.

Still early, but promising.
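
To make the idea concrete, here is a naive sketch that assumes corrections can be spotted by how a message starts ("No, ...", "Actually ..."). A real pipeline would likely use the model itself to extract them. The JSON Lines file layout and all function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import List, Optional

# Naive heuristic: the message opens with a correction marker.
CORRECTION_RE = re.compile(
    r"^\s*(?:no|actually|that's wrong|not quite)\b[,.:!\s]*(\w.*)",
    re.IGNORECASE | re.DOTALL,
)


def extract_correction(message: str) -> Optional[str]:
    """Return the corrected fact, or None if the message is not a correction."""
    match = CORRECTION_RE.match(message)
    return match.group(1).strip() if match else None


def record_correction(path: Path, message: str) -> bool:
    """Append a detected correction to the file; return True if one was stored."""
    correction = extract_correction(message)
    if correction is None:
        return False
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"correction": correction}) + "\n")
    return True


def load_corrections(path: Path) -> List[str]:
    """Read all stored corrections, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line)["correction"] for line in fh if line.strip()]


def build_system_prompt(base_prompt: str, corrections: List[str]) -> str:
    """Append stored corrections to the prompt without editing the prompt file."""
    if not corrections:
        return base_prompt
    bullets = "\n".join(f"- {c}" for c in corrections)
    return f"{base_prompt}\n\nCorrections from past conversations:\n{bullets}"
```

The base prompt stays untouched on disk. Corrections accumulate separately and are merged in at request time, so they can be reviewed or pruned independently.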

Memory should behave more like cognition than logs

Instead of loading entire histories into context windows, the framework is evolving toward a two-layer memory model:

conscious memory
unconscious memory

Only distilled facts relevant to the current task enter the active reasoning window.

The rest stays persistent but inactive.

This keeps token usage predictable while preserving long-term knowledge.
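
A small sketch of the two layers, assuming facts have already been distilled into short strings. Relevance here is plain word overlap with the current message, which is a placeholder for whatever retrieval the framework ends up using. `TwoLayerMemory` and its methods are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class TwoLayerMemory:
    max_active_facts: int = 3  # caps what enters the context window
    unconscious: List[str] = field(default_factory=list)  # everything known

    def remember(self, fact: str) -> None:
        """Store a distilled fact in the persistent, inactive layer."""
        if fact not in self.unconscious:
            self.unconscious.append(fact)

    def conscious(self, current_message: str) -> List[str]:
        """Select the few facts relevant to the current message."""
        query = _words(current_message)
        scored = [
            (len(query & _words(fact)), index, fact)
            for index, fact in enumerate(self.unconscious)
        ]
        # Drop facts with no overlap; rank by overlap, then by insertion order.
        relevant = [item for item in scored if item[0] > 0]
        relevant.sort(key=lambda item: (-item[0], item[1]))
        return [fact for _, _, fact in relevant[: self.max_active_facts]]
```

Since `conscious` never returns more than `max_active_facts` items, the memory portion of the prompt has a fixed upper bound however long the customer relationship runs.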

Example: building a WhatsApp receptionist agent

Using this structure I was able to implement:

booking
rescheduling
calendar updates

into Google Calendar from WhatsApp conversations in roughly 20 lines of code.

# skills/check_availability.py
from glaivio import skill

@skill
def check_availability(date: str, time: str) -> str:
    """Check if a time slot is available. Always call before book_appointment.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return "Available"


# skills/book_appointment.py
from glaivio import skill

@skill
def book_appointment(patient_name: str, patient_phone: str, date: str, time: str) -> str:
    """Book an appointment. Only call after check_availability confirms the slot is free.
    patient_phone: use the current user's ID from context.
    date: YYYY-MM-DD, time: HH:MM 24h format."""
    # call your calendar API here
    return f"Booked {patient_name} on {date} at {time}"


from dotenv import load_dotenv
load_dotenv()

from glaivio import Agent
from skills.check_availability import check_availability
from skills.book_appointment import book_appointment

agent = Agent(
    instructions="prompts/system.md",
    skills=[check_availability, book_appointment],
    learn_from_feedback=True,
    privacy=True,
)

if __name__ == "__main__":
    agent.run(channel="whatsapp")

The goal of the framework is to make production agent deployments feel closer to:

Rails apps
Django apps

instead of experimental scripts.

What I’m exploring next

Still working on:

privacy middleware
correction learning pipeline
tiered memory loading
human escalation integrations

Feedback welcome (and stars help)

This framework is still early and evolving around real deployment constraints rather than research demos.

If you're working on production AI agents, especially anything multi-tenant, privacy-sensitive, or customer-facing, I’d really value your feedback on what breaks first in your setup and what infrastructure you wish existed by default.

https://github.com/tavyy/glaivio-ai

Demand for Ruby/Rails 2022

Octavian — Mon, 17 Oct 2022 10:21:52 +0000

Hey guys,

How do you find the demand for Ruby/Rails in 2022? Fewer and fewer projects are available, and although e.g. HackerRank still has it as an available language for coding challenges, a lot of similar sites have dropped it.

Are you worried? What's your take on it?

Please mention the country you are from as well, if possible. I'm based in London, UK, and while there is still demand for Ruby services here, I feel like it's fading.

Architecture Thoughts?

Octavian — Sat, 14 Nov 2020 18:40:08 +0000

Hi everyone!

I would like to get your input on something. My client has a live version of their platform that consists of 3 apps (Frontend, Backend and API), each in a separate GitHub repo. There is an urgent requirement to build a V2 of the service that will process completely different data sources.

Because of urgency and to reduce the risk they proposed the following:

Keep V1 as is
Clone V1 into V2 and change the data sources and deploy it on separate AWS instances.

They want to fork the repos to avoid risk, but eventually want to merge V2 and V1 (so they would end up with just 3 repos, as now, with every app able to accommodate both old and new data).

If we fork these we end up maintaining 6 repos and I can see a huge pain when trying to merge them back together eventually. Plus the code collaboration would become more complicated.

Instead of forking, I am inclined to create a v2 branch on each of the repos.

The advantages that I see:

easier to maintain
increased visibility
easier to merge in the end

Disadvantages:

probably one big PR in the end on each of the apps
merging the v2 branch into master has to happen simultaneously across all 3 apps (although feature flags could be used)

Note: All 3 apps are pretty complicated with big codebases and they work tightly together.

What are your thoughts?

Quick prototyping Web App / Mobile App

Octavian — Sat, 04 Apr 2020 12:49:00 +0000

Hi everyone. I would like to hear your thoughts on the latest technologies that you find make it easy to quickly prototype a modern web app / mobile app (taking the learning curve into account as well).

How do you keep your motivation?

Octavian — Tue, 06 Aug 2019 15:47:27 +0000

Hey, everyone, I have a question about a challenge I've been having for a while. How do you find the motivation to learn some new tech when you know you are about to put in hours and there is absolutely no plan to monetize it?

Do you read dev books?

Octavian — Sun, 12 May 2019 13:41:54 +0000

Hey, guys, I'm just wondering if you read any books about software architecture, patterns, etc. in your spare time? I find it very hard to focus and find the patience to sit down and just read.