<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ayoola Solomon</title>
    <description>The latest articles on Forem by Ayoola Solomon (@ayoolasolomon).</description>
    <link>https://forem.com/ayoolasolomon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F58080%2Fcb137a0e-7d34-4dcd-8b38-2822fb2d4143.jpeg</url>
      <title>Forem: Ayoola Solomon</title>
      <link>https://forem.com/ayoolasolomon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ayoolasolomon"/>
    <language>en</language>
    <item>
      <title>How I Designed a Modular, Event-Driven Architecture for Real-Time Voice AI</title>
      <dc:creator>Ayoola Solomon</dc:creator>
      <pubDate>Wed, 19 Nov 2025 09:49:54 +0000</pubDate>
      <link>https://forem.com/ayoolasolomon/how-i-designed-a-modular-event-driven-architecture-for-real-time-voice-ai-3d6g</link>
      <guid>https://forem.com/ayoolasolomon/how-i-designed-a-modular-event-driven-architecture-for-real-time-voice-ai-3d6g</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most voice AI systems today are built as a fixed chain:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STT → LLM → TTS → Audio Output.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This works for demos, but falls apart the moment you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom business logic&lt;/li&gt;
&lt;li&gt;CRM integrations&lt;/li&gt;
&lt;li&gt;Multi-agent routing&lt;/li&gt;
&lt;li&gt;Knowledge lookups&lt;/li&gt;
&lt;li&gt;Scheduling flows&lt;/li&gt;
&lt;li&gt;Post-call actions&lt;/li&gt;
&lt;li&gt;Pipeline branching&lt;/li&gt;
&lt;li&gt;Swappable providers (Claude vs GPT, Deepgram vs Whisper, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So for &lt;strong&gt;EchoStack&lt;/strong&gt;, I scrapped the idea of a “voice bot pipeline” entirely and built a &lt;strong&gt;voice automation platform&lt;/strong&gt; powered by an &lt;strong&gt;event-driven orchestration layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here’s how the architecture works — and why it has completely changed what’s possible with real-time AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  LiveKit Only Handles Ingress &amp;amp; Egress
&lt;/h2&gt;

&lt;p&gt;Not STT.&lt;br&gt;
Not LLM.&lt;br&gt;
Not TTS.&lt;/p&gt;

&lt;p&gt;Just pure audio transport:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Mic → LiveKit → EchoStack  
EchoStack → LiveKit → User Speaker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside EchoStack, every audio frame becomes an event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;processing.livekit.audio_frame
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the audio layer fully modular and independent of AI logic.&lt;/p&gt;
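
&lt;p&gt;For illustration, a frame event might carry an envelope like this (the field names are hypothetical, not EchoStack’s actual schema):&lt;/p&gt;

```python
# Illustrative event envelope for one audio frame. Field names here are
# hypothetical; the real EchoStack schema may differ.
audio_frame_event = {
    "type": "processing.livekit.audio_frame",
    "session_id": "sess-123",            # which call this frame belongs to
    "seq": 42,                           # ordering for reassembly
    "sample_rate_hz": 16000,             # PCM sample rate
    "payload": b"...raw pcm bytes...",   # opaque to the routing layer
}
```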

&lt;h2&gt;
  
  
  Everything Inside EchoStack Is a Connector
&lt;/h2&gt;

&lt;p&gt;A connector can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepgram STT&lt;/li&gt;
&lt;li&gt;WhisperX&lt;/li&gt;
&lt;li&gt;AssemblyAI&lt;/li&gt;
&lt;li&gt;Claude&lt;/li&gt;
&lt;li&gt;GPT-4o&lt;/li&gt;
&lt;li&gt;Llama 3&lt;/li&gt;
&lt;li&gt;ElevenLabs&lt;/li&gt;
&lt;li&gt;Azure Neural TTS&lt;/li&gt;
&lt;li&gt;HubSpot&lt;/li&gt;
&lt;li&gt;Salesforce&lt;/li&gt;
&lt;li&gt;Zendesk&lt;/li&gt;
&lt;li&gt;Calendly&lt;/li&gt;
&lt;li&gt;A custom HTTP API&lt;/li&gt;
&lt;li&gt;A knowledge search&lt;/li&gt;
&lt;li&gt;A database entry&lt;/li&gt;
&lt;li&gt;Or even another AI agent
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"consumes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"processing.deepgram.text"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"produces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"processing.claude.agent_message"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;EchoStack&lt;/strong&gt; uses this to decide where events flow next.&lt;/p&gt;

&lt;p&gt;In effect, it’s a real-time analogue of Zapier or LangGraph.&lt;/p&gt;
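
&lt;p&gt;As a sketch, consumes-based routing can be a simple fan-out loop. This is a hypothetical illustration (not EchoStack’s actual internals), assuming each handler returns a list of (event_type, payload) pairs it produces:&lt;/p&gt;

```python
# Minimal consumes/produces fan-out. Hypothetical sketch: each connector
# registers the event types it consumes plus a handler; whatever the
# handler produces is published back onto the bus.
class EventBus:
    def __init__(self):
        self.connectors = []

    def register(self, consumes, handler):
        self.connectors.append((consumes, handler))

    def publish(self, event_type, payload):
        # Deliver to every connector that consumes this type, then
        # recursively publish whatever those connectors produce.
        for consumes, handler in self.connectors:
            if event_type in consumes:
                for produced_type, out in handler(payload):
                    self.publish(produced_type, out)
```

&lt;p&gt;Swapping a provider is then just registering a different handler under the same event types.&lt;/p&gt;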

&lt;h2&gt;
  
  
  Pipelines Are Just Manifests
&lt;/h2&gt;

&lt;p&gt;Instead of hardcoded logic, pipelines are defined like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pipeline"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"ingress.livekit.audio_frame → deepgram.stt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"deepgram.stt → claude.agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"claude.agent → elevenlabs.tts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"elevenlabs.tts → egress.livekit.audio_chunk"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No code.&lt;br&gt;
No wiring.&lt;br&gt;
Just declarative routing.&lt;/p&gt;

&lt;p&gt;Want to swap Deepgram for Whisper?&lt;br&gt;
Edit one line.&lt;/p&gt;

&lt;p&gt;Want to add sentiment analysis between STT and LLM?&lt;br&gt;
Add one rule.&lt;/p&gt;

&lt;p&gt;Want multi-agent routing?&lt;br&gt;
Add a router connector.&lt;/p&gt;
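
&lt;p&gt;Compiling such a manifest into a routing table is cheap. A hypothetical sketch, assuming each rule is a "source → target" string like the ones above:&lt;/p&gt;

```python
# Compile "source → target" pipeline rules into a routing table that maps
# each stage to its downstream stages. Hypothetical sketch, not the
# actual EchoStack loader.
def compile_routes(pipeline):
    routes = {}
    for rule in pipeline:
        src, dst = rule.split(" → ")
        routes.setdefault(src, []).append(dst)
    return routes
```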
&lt;h2&gt;
  
  
  Multi-Playbook Orchestration (The Real Game-Changer)
&lt;/h2&gt;

&lt;p&gt;Traditional voice agents can only run one flow.&lt;/p&gt;

&lt;p&gt;EchoStack can run many — and switch between them in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LeadQualifier.json  
MeetingBooker.json  
FAQBot.json  
SupportAgent.json  
CRMLogger.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the user says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I want to book a meeting.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The routing connector switches the playbook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;processing.deepgram.text → intent.router → meeting_booker.playbook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is impossible in a linear voice bot pipeline, but trivial in an event system.&lt;/p&gt;
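
&lt;p&gt;A minimal intent router can be sketched like this (keyword matching stands in for a real classifier, and the playbook names are illustrative):&lt;/p&gt;

```python
# Hypothetical intent router: simple keyword matching stands in for a
# real intent classifier. Returns the playbook to activate.
INTENT_PLAYBOOKS = {
    "book a meeting": "meeting_booker.playbook",
    "talk to support": "support_agent.playbook",
}

def route(transcript, default="faq_bot.playbook"):
    text = transcript.lower()
    for phrase, playbook in INTENT_PLAYBOOKS.items():
        if phrase in text:
            return playbook
    return default
```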

&lt;h2&gt;
  
  
  Real-Time Streaming (STT, LLM, TTS)
&lt;/h2&gt;

&lt;p&gt;Because everything is async events, the system supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Streaming STT transcripts&lt;/li&gt;
&lt;li&gt;Streaming LLM tokens (Claude / GPT-4o)&lt;/li&gt;
&lt;li&gt;Streaming TTS audio chunks&lt;/li&gt;
&lt;li&gt;Barge-in and interruption&lt;/li&gt;
&lt;li&gt;Live agent escalation&lt;/li&gt;
&lt;li&gt;Parallel processing&lt;/li&gt;
&lt;li&gt;Multi-agent collaboration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example LLM output stream event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;processing.claude.agent_message.partial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example TTS stream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;processing.elevenlabs.audio_chunk.stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user hears responses as they are generated — not after the full LLM response.&lt;/p&gt;
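
&lt;p&gt;One way to sketch that handoff: buffer streamed LLM tokens and flush a TTS request per sentence, so audio playback starts while the model is still generating (a hypothetical sketch, assuming token events arrive as strings):&lt;/p&gt;

```python
# Buffer streamed LLM tokens and yield a sentence-sized chunk for TTS as
# soon as one completes. Hypothetical sketch of the streaming handoff.
def chunk_for_tts(token_events):
    buf = ""
    for tok in token_events:
        buf += tok
        if buf.rstrip().endswith((".", "?", "!")):
            yield buf.strip()   # hand a full sentence to the TTS stage
            buf = ""
    if buf.strip():
        yield buf.strip()       # flush any trailing fragment
```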

&lt;h2&gt;
  
  
  Full Pipeline Simulation (No LiveKit Needed)
&lt;/h2&gt;

&lt;p&gt;This is my favorite feature.&lt;/p&gt;

&lt;p&gt;EchoStack can simulate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Audio → STT&lt;/li&gt;
&lt;li&gt;STT → LLM&lt;/li&gt;
&lt;li&gt;LLM → TTS&lt;/li&gt;
&lt;li&gt;TTS → Egress&lt;/li&gt;
&lt;li&gt;All connector interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without touching real providers.&lt;/p&gt;

&lt;p&gt;It uses a mock runtime registry to generate realistic fake outputs.&lt;/p&gt;

&lt;p&gt;This allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual debugging&lt;/li&gt;
&lt;li&gt;Step-by-step replay&lt;/li&gt;
&lt;li&gt;Educational demos&lt;/li&gt;
&lt;li&gt;Test-driven development&lt;/li&gt;
&lt;li&gt;Predictable QA&lt;/li&gt;
&lt;li&gt;“Dry runs” before deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is something even Retell &amp;amp; Vapi don’t have today.&lt;/p&gt;
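
&lt;p&gt;A mock registry can be as simple as a name-to-handler map, so an entire pipeline dry-runs with canned outputs (a hypothetical sketch, not the real registry):&lt;/p&gt;

```python
# Hypothetical mock runtime registry: map connector names to canned
# handlers so a pipeline can be simulated without calling any provider.
MOCK_REGISTRY = {
    "deepgram.stt": lambda audio: "mock transcript",
    "claude.agent": lambda text: "mock reply to: " + text,
    "elevenlabs.tts": lambda text: b"mock-audio-bytes",
}

def simulate(stages, payload):
    trace = []
    for stage in stages:
        payload = MOCK_REGISTRY[stage](payload)
        trace.append((stage, payload))   # record each hop for replay
    return trace
```

&lt;p&gt;The recorded trace is what makes step-by-step replay and predictable QA possible.&lt;/p&gt;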

&lt;h2&gt;
  
  
  And It Scales Like a Distributed System
&lt;/h2&gt;

&lt;p&gt;Because everything is events:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each connector is a worker&lt;/li&gt;
&lt;li&gt;Workers scale horizontally&lt;/li&gt;
&lt;li&gt;Backpressure is manageable&lt;/li&gt;
&lt;li&gt;Failures can be contained&lt;/li&gt;
&lt;li&gt;Retries &amp;amp; fallbacks are simple&lt;/li&gt;
&lt;li&gt;Pipelines can fork or merge&lt;/li&gt;
&lt;li&gt;Multi-agent flows work naturally&lt;/li&gt;
&lt;li&gt;Audioless connectors (CRM, DB, API) blend seamlessly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It behaves like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zapier&lt;/li&gt;
&lt;li&gt;AWS EventBridge&lt;/li&gt;
&lt;li&gt;LangGraph&lt;/li&gt;
&lt;li&gt;Airflow&lt;/li&gt;
&lt;li&gt;N8N&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…but optimized for real-time audio.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Unlocks for Businesses
&lt;/h2&gt;

&lt;p&gt;This is where the architecture stops being “cool tech” and becomes actual value:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lead qualification&lt;/li&gt;
&lt;li&gt;After-hours support&lt;/li&gt;
&lt;li&gt;Customer triage&lt;/li&gt;
&lt;li&gt;Booking assistants&lt;/li&gt;
&lt;li&gt;Helpdesk automation&lt;/li&gt;
&lt;li&gt;Sales follow-ups&lt;/li&gt;
&lt;li&gt;Knowledge Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Order tracking&lt;/li&gt;
&lt;li&gt;Multi-agent escalation&lt;/li&gt;
&lt;li&gt;CRM syncing&lt;/li&gt;
&lt;li&gt;Custom playbooks per industry&lt;/li&gt;
&lt;li&gt;Complex routing between AI tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t just deploy “a bot.”&lt;/p&gt;

&lt;p&gt;You deploy a &lt;strong&gt;network of intelligent voice automations&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Voice AI is moving fast, but most of what exists today is still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rigid&lt;/li&gt;
&lt;li&gt;non-composable&lt;/li&gt;
&lt;li&gt;difficult to integrate&lt;/li&gt;
&lt;li&gt;tied to single vendors&lt;/li&gt;
&lt;li&gt;non-debuggable&lt;/li&gt;
&lt;li&gt;non-portable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By making the entire system event-driven and connector-based, EchoStack becomes:&lt;/p&gt;

&lt;p&gt;A real-time automation platform where voice is the entry point — not the limitation.&lt;/p&gt;

&lt;p&gt;If you’re into real-time systems, LiveKit, STT/LLM/TTS pipelines, or voice automation, I’d love to exchange ideas.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>startup</category>
      <category>automation</category>
    </item>
    <item>
      <title>Inside the Manifest: How We Make Voice-AI Playbooks Deployable</title>
      <dc:creator>Ayoola Solomon</dc:creator>
      <pubDate>Mon, 10 Nov 2025 21:51:59 +0000</pubDate>
      <link>https://forem.com/ayoolasolomon/inside-the-manifest-how-we-make-voice-ai-playbooks-deployable-5431</link>
      <guid>https://forem.com/ayoolasolomon/inside-the-manifest-how-we-make-voice-ai-playbooks-deployable-5431</guid>
      <description>&lt;p&gt;Most businesses don’t want another AI “demo.”&lt;br&gt;
They want &lt;strong&gt;deployable outcomes&lt;/strong&gt; — like qualifying a lead, booking a meeting, or handling an after-hours call automatically.&lt;/p&gt;

&lt;p&gt;At EchoStack, we wanted to make these outcomes &lt;strong&gt;as easy to launch as deploying code&lt;/strong&gt; — and that’s how the &lt;em&gt;manifest&lt;/em&gt; was born.&lt;/p&gt;


&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Voice-AI tools today can hold great conversations, but they often stop there.&lt;br&gt;
To &lt;em&gt;actually&lt;/em&gt; move data into CRMs, book meetings, or send follow-ups, you need engineers stitching APIs together.&lt;/p&gt;

&lt;p&gt;That doesn’t scale for non-technical teams.&lt;/p&gt;


&lt;h3&gt;
  
  
  Our Solution — The Manifest
&lt;/h3&gt;

&lt;p&gt;Each playbook in EchoStack (e.g., &lt;em&gt;Lead Qualifier&lt;/em&gt;, &lt;em&gt;After-Hours Support&lt;/em&gt;) is powered by a &lt;strong&gt;manifest&lt;/strong&gt; — a single file that describes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the playbook does&lt;/li&gt;
&lt;li&gt;which tools it connects to&lt;/li&gt;
&lt;li&gt;and what should happen when key events occur.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s like &lt;strong&gt;Terraform&lt;/strong&gt;, but for voice-driven business outcomes.&lt;/p&gt;


&lt;h3&gt;
  
  
  Example Manifest (Simplified)
&lt;/h3&gt;

&lt;p&gt;Here’s a simplified version of what a playbook looks like under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lead-qualifier"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Voice Lead Qualifier"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qualifies inbound leads via voice and syncs results to CRM."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"connectors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"twilio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hubspot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"calendly"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"events"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"call.started"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"start_conversation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lead.qualified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_contact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"meeting.booked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule_event"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn’t code — it’s a &lt;strong&gt;declarative contract&lt;/strong&gt; between the voice experience and the business stack.&lt;br&gt;
Our platform reads this file, provisions the right connectors, and handles orchestration automatically.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why It Matters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No-code deployment:&lt;/strong&gt; Business teams can launch playbooks instantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version control:&lt;/strong&gt; Every manifest can be tracked, forked, and redeployed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility:&lt;/strong&gt; Developers can author new playbooks using familiar patterns (JSON, events, connectors).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It turns &lt;strong&gt;AI workflows into deployable building blocks&lt;/strong&gt; — reusable, composable, and measurable.&lt;/p&gt;




&lt;h3&gt;
  
  
  What’s Next
&lt;/h3&gt;

&lt;p&gt;We’re expanding the manifest system to support richer event schemas and multi-step orchestration.&lt;br&gt;
Soon, teams will be able to chain multiple playbooks together — stacking outcomes like Lego blocks.&lt;/p&gt;




&lt;h3&gt;
  
  
  Closing Thought
&lt;/h3&gt;

&lt;p&gt;If you’ve ever written Terraform for infrastructure or YAML for CI/CD, imagine doing that —&lt;br&gt;
but for &lt;em&gt;voice automation&lt;/em&gt;.&lt;br&gt;
That’s what EchoStack’s manifests make possible.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Curious to see a manifest in action?&lt;/strong&gt;&lt;br&gt;
We’re opening early access for developers building voice-AI workflows — you can join at &lt;a href="https://getechostack.com" rel="noopener noreferrer"&gt;getechostack.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>startup</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Designing Deployable Voice-AI Playbooks</title>
      <dc:creator>Ayoola Solomon</dc:creator>
      <pubDate>Tue, 28 Oct 2025 08:54:07 +0000</pubDate>
      <link>https://forem.com/ayoolasolomon/designing-deployable-voice-ai-playbooks-p95-300-ms-preflight-bluegreen-55eh</link>
      <guid>https://forem.com/ayoolasolomon/designing-deployable-voice-ai-playbooks-p95-300-ms-preflight-bluegreen-55eh</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is a &lt;strong&gt;design/engineering write-up&lt;/strong&gt; for our EchoStack pivot. We’re packaging Voice-AI &lt;strong&gt;playbooks&lt;/strong&gt; (like &lt;em&gt;After-hours Answering&lt;/em&gt; and &lt;em&gt;Lead Qualifier → Auto-Book&lt;/em&gt;) into &lt;strong&gt;deployable solutions&lt;/strong&gt; with no-code setup, safe rollouts, and KPI tiles.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Status:&lt;/strong&gt; Early Access only — we’re validating integrations and rollout safety with a small group before opening signups.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why playbooks (not tool soup)
&lt;/h2&gt;

&lt;p&gt;Teams don’t buy models; they buy &lt;strong&gt;outcomes&lt;/strong&gt;: fewer missed calls, more booked meetings, lower average handle time (AHT). The hard parts are &lt;strong&gt;barge-in latency&lt;/strong&gt; and &lt;strong&gt;safe deployment&lt;/strong&gt;, not the LLM itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency budget we hold ourselves to (p95 targets)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;ASR partials: &lt;strong&gt;60–90 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;LLM first token: &lt;strong&gt;80–120 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;TTS first audio: &lt;strong&gt;50–80 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Network buffers: &lt;strong&gt;40–60 ms&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ Goal: &lt;strong&gt;&amp;lt; 300 ms p95&lt;/strong&gt; end-to-end (barge-in friendly).&lt;/p&gt;
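
&lt;p&gt;Summing the stage budgets is a useful sanity check: the low ends total 230 ms and the high ends total 350 ms, so the 300 ms goal only holds when most stages land near the lower half of their range. This is plain arithmetic on the targets listed above:&lt;/p&gt;

```python
# Sanity-check the p95 latency budget by summing the low and high ends
# of each stage's target range (numbers taken from the list above).
BUDGET_MS = {
    "asr_partials": (60, 90),
    "llm_first_token": (80, 120),
    "tts_first_audio": (50, 80),
    "network_buffers": (40, 60),
}

low = sum(lo for lo, hi in BUDGET_MS.values())    # best case: 230 ms
high = sum(hi for lo, hi in BUDGET_MS.values())   # worst case: 350 ms
# The 300 ms end-to-end goal sits between the two, so the worst case of
# every stage at once would blow the budget.
```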

&lt;h2&gt;
  
  
  Rollout safety (what we’re building)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;preflight → plan → apply (blue) → smoke test → switch (green) → rollback&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preflight&lt;/strong&gt; checks scopes, latency probes, and config drift.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan&lt;/strong&gt; shows a human-readable diff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apply&lt;/strong&gt; deploys to an inactive slot (blue).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch&lt;/strong&gt; flips traffic; &lt;strong&gt;Rollback&lt;/strong&gt; is one click.&lt;/li&gt;
&lt;/ul&gt;
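
&lt;p&gt;The rollout sequence can be modeled as a tiny state machine (a hypothetical sketch of the steps above, where any failing step triggers rollback):&lt;/p&gt;

```python
# Hypothetical rollout state machine: each step must succeed before the
# next runs; any failure stops the rollout and triggers rollback.
STEPS = ["preflight", "plan", "apply_blue", "smoke_test", "switch_green"]

def rollout(run_step):
    done = []
    for step in STEPS:
        if not run_step(step):
            return done, "rolled_back"   # traffic never switched
        done.append(step)
    return done, "live"
```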

&lt;h2&gt;
  
  
  Integration surface (first adapters)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Telephony: Twilio/Plivo/SIP&lt;/li&gt;
&lt;li&gt;Voice agent: Retell (others later)&lt;/li&gt;
&lt;li&gt;Calendar: Calendly/Google&lt;/li&gt;
&lt;li&gt;CRM: HubSpot/Salesforce (Sheets fallback)&lt;/li&gt;
&lt;li&gt;Helpdesk: Zendesk (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What exists today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Data model + manifests for two playbooks&lt;/li&gt;
&lt;li&gt;No-code configuration flow (internal)&lt;/li&gt;
&lt;li&gt;Preflight → plan → apply skeleton&lt;/li&gt;
&lt;li&gt;KPI tiles (self-serve %, AHT, booked meetings) wired to session events&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What we’re validating next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Region-aware routing under load&lt;/li&gt;
&lt;li&gt;Failure modes during blue/green switches&lt;/li&gt;
&lt;li&gt;Adapter ergonomics (CRM/calendar/telephony edge cases)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Looking for feedback
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Are these &lt;strong&gt;p95 targets&lt;/strong&gt; realistic for your use case?&lt;/li&gt;
&lt;li&gt;What &lt;strong&gt;minimum logs/SLAs&lt;/strong&gt; make you comfortable with rollout?&lt;/li&gt;
&lt;li&gt;Which &lt;strong&gt;adapter combo&lt;/strong&gt; should be prioritized?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;More context:&lt;/strong&gt; &lt;a href="https://getechostack.com/playbooks" rel="noopener noreferrer"&gt;https://getechostack.com/playbooks&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Early Access (no public demo yet):&lt;/strong&gt; &lt;a href="https://getechostack.com/contact?subject=Early%20Access" rel="noopener noreferrer"&gt;https://getechostack.com/contact?subject=Early%20Access&lt;/a&gt;&lt;/p&gt;

</description>
      <category>voiceai</category>
      <category>architecture</category>
      <category>nocode</category>
      <category>saas</category>
    </item>
    <item>
      <title>From building a voice AI widget to mapping the entire Voice AI ecosystem (Introducing echostack)</title>
      <dc:creator>Ayoola Solomon</dc:creator>
      <pubDate>Mon, 13 Oct 2025 19:04:15 +0000</pubDate>
      <link>https://forem.com/ayoolasolomon/from-building-a-voice-ai-widget-to-mapping-the-entire-voice-ai-ecosystem-introducing-echostack-ceo</link>
      <guid>https://forem.com/ayoolasolomon/from-building-a-voice-ai-widget-to-mapping-the-entire-voice-ai-ecosystem-introducing-echostack-ceo</guid>
      <description>&lt;p&gt;Hey everyone,&lt;/p&gt;

&lt;p&gt;I’m Solomon — the creator of &lt;a href="https://getechospace.com/" rel="noopener noreferrer"&gt;GetEchoSpace&lt;/a&gt;, a voice AI widget that lets any website host real-time audio conversations for support, live shopping, or community.&lt;/p&gt;

&lt;p&gt;While building it, I constantly had to combine tools for ASR, text-to-speech, and LLMs — juggling APIs from different vendors and testing pipelines just to get a working flow.&lt;/p&gt;

&lt;p&gt;At some point, it hit me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Everyone building in voice AI is reinventing the same workflows from scratch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;There are incredible voice AI tools out there — from OpenAI’s speech APIs to ElevenLabs, Whisper, Speechmatics, and more.&lt;br&gt;
But there’s no central place to discover, compare, and see how they connect in real-world setups.&lt;/p&gt;

&lt;p&gt;Builders like me spend hours figuring out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which ASR integrates best with Twilio,&lt;/li&gt;
&lt;li&gt;how to pass data between TTS and LLMs,&lt;/li&gt;
&lt;li&gt;and how to deploy these flows in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Enter echostack
&lt;/h2&gt;

&lt;p&gt;So I started building &lt;a href="https://getechostack.com" rel="noopener noreferrer"&gt;echostack&lt;/a&gt; — a public directory of voice AI tools and ready-made “stacks.”&lt;/p&gt;

&lt;p&gt;Think of it as Zapier templates or Stack Overflow for voice AI workflows.&lt;br&gt;
Each stack shows how to combine tools (e.g., Retell + OpenAI + Twilio + GCP ASR) to achieve real outcomes — like multilingual dubbing, customer triage bots, or AI-powered voice assistants.&lt;/p&gt;

&lt;p&gt;The goal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;help developers and AI builders spend less time wiring tools, and more time shipping value.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Tech Behind the MVP
&lt;/h3&gt;

&lt;p&gt;The MVP is built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next.js 15 (App Router)&lt;/li&gt;
&lt;li&gt;TypeScript + Tailwind&lt;/li&gt;
&lt;li&gt;Supabase (for data)&lt;/li&gt;
&lt;li&gt;Zapier &amp;amp; n8n export support planned for v0.2&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What’s Live Now
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figkl65113pmirp4him11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Figkl65113pmirp4him11.png" alt="stack detail page" width="800" height="932"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Featured voice AI tools&lt;/li&gt;
&lt;li&gt;Early “stacks” (like multilingual dubbing or real-time triage bots)&lt;/li&gt;
&lt;li&gt;Newsletter signup for updates as new stacks drop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://getechostack.com" rel="noopener noreferrer"&gt;https://getechostack.com&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  I’d love your feedback
&lt;/h3&gt;

&lt;p&gt;If you’re building with Voice-AI or integrating ASR/TTS/LLM tools, I’d love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What workflows or “stacks” you’d want to see next&lt;/li&gt;
&lt;li&gt;Which tools are must-haves for you&lt;/li&gt;
&lt;li&gt;Whether you prefer no-code or code-level examples&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Expand to more tools and stacks&lt;/li&gt;
&lt;li&gt;Add semantic search and tagging&lt;/li&gt;
&lt;li&gt;Support Zapier/n8n exports&lt;/li&gt;
&lt;li&gt;Launch the curated Voice-AI Stacks Newsletter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that sounds interesting, you can check it out or share feedback directly on &lt;a href="https://getechostack.com" rel="noopener noreferrer"&gt;echostack&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>voiceai</category>
      <category>indiehackers</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
