Forem: VoltAgent

100+ Claude Code Subagent Collection

Necati Özmen — Tue, 05 Aug 2025 12:44:23 +0000

Looking to supercharge your development workflow with specialized AI agents? We've been working on Awesome Claude Code Subagents - a comprehensive collection of production-ready subagents for Claude Code.

Maintaining by VoltAgent Framework community

📦 Get All 110+ Claude Code Subagents.

👉 Explore the full collection on GitHub

Each subagent includes:

✅ Production-ready configurations
✅ MCP Tool integration
✅ Best practices compliance
✅ Regular updates
✅ Community support

🎯 Quick Start

# 1. Access subagent manager in Claude Code
/agents

# 2. Create new agent or use existing ones
# 3. Let Claude automatically delegate tasks

🤝 Join the Community

We're always looking for contributors! Whether you want to:

Submit new subagents
Improve existing ones
Share your use cases
Report issues

Check out our Contributing Guidelines and join our Discord Community.

What are Subagents?

Subagents are specialized AI assistants that enhance Claude Code's capabilities by providing task-specific expertise. Each operates with its own context window and domain-specific intelligence, making them perfect for focused development tasks.

🌟 All Categories with 110+ Specialized Subagents

1. Core Development (9 subagents)

Essential development subagents for everyday coding tasks:

2. Language Specialists (22 subagents)

Language-specific experts with deep framework knowledge:

3. Infrastructure (12 subagents)

DevOps, cloud, and deployment specialists:

4. Quality & Security (12 subagents)

Testing, security, and code quality experts:

5. Data & AI (12 subagents)

Data engineering, ML, and AI specialists:

6. Developer Experience (9 subagents)

Tooling and developer productivity experts:

7. Specialized Domains (10 subagents)

Domain-specific technology experts:

8. Business & Product (10 subagents)

Product management and business analysis:

9. Meta & Orchestration (8 subagents)

Agent coordination and meta-programming:

10. Research & Analysis (6 subagents)

Research, search, and analysis specialists:

🚀 VoltAgent Launch Week #1 — What’s New?

Necati Özmen — Wed, 18 Jun 2025 11:44:08 +0000

Hey everyone! 👋

We just kicked off Launch Week #1 for VoltAgent — our open-source TypeScript framework for building and observing LLM agents.

This is our first-ever launch week, and everything we’re shipping is shaped by community feedback over the past few months. Each day this week, we’re introducing new features designed to help you build, scale, and monitor AI agents more effectively.

💡 What’s VoltAgent?

VoltAgent is a flexible framework for orchestrating AI agents, with built-in support for:

Multi-agent workflows
Tool integrations
Custom routing & hooks

Tracing & debugging via VoltOps

🔎 What’s VoltOps?

VoltOps is our framework-agnostic observability layer for LLM apps. Even if you’re not using VoltAgent, you can still monitor your agents and see detailed traces using VoltOps. It now works with:

Vanilla JS / Python agents
Vercel AI SDK
LangChain, LangGraph (coming soon)

🔥 Launch Week Highlights (So far)

🟢 Day 1 – Framework-Agnostic Observability

VoltAgent Developer Console is now VoltOps — supports observability across any framework.

🟢 Day 2 – Streaming with fullStream

Capture every reasoning step, tool call, and completion signal — not just raw text.

🟢 Day 3 – Vercel AI UI Support

VoltAgent now includes plug-and-play UI components for building agent UIs with Vercel AI SDK.

(More days coming soon… 👀)

⸻

🙌 Join the Community

We’re open source and community-driven. You can:

⭐ Star the repo: https://github.com/VoltAgent/voltagent
💬 Join our Discord: https://s.voltagent.dev/discord

⸻

Thanks to everyone who’s given us feedback so far, it’s been amazing to see what people are building with VoltAgent.

Let’s build better agents, together. 🦾

— VoltAgent Team ⚡

✨ What an LLM Agent Framework Looks Like in 2025

Necati Özmen — Wed, 04 Jun 2025 10:03:08 +0000

"ChatGPT is amazing, but how do I integrate this into my own app?" - How many developers have heard this question...

LLMs changed our lives, no doubt about it. Since ChatGPT came out, everyone sees incredible possibilities. But let me tell you the truth as a developer: Using this power in our own applications is way harder than we thought.

Most of us go through the same cycle. First there's excitement: "I have an amazing AI idea!" Then quick start: We do API integration, simple examples work, everything looks good. But when real users come... that's when everything gets complicated. Code becomes unmanageable, every new feature breaks old code, debugging becomes a nightmare.

Did you go through this cycle? You're not alone.

The Real Problem: From API to Application

When you look at AI development with the traditional approach, it looks like this:

// Manual API call every time
const response = await openai.chat.completions.create({
  messages: [{ role: "user", content: userInput }],
});
// Custom code for every feature...

No problem at first. But then user requests start coming: "Can it use this tool?", "Can it remember past conversations?", "Can it behave differently in different situations?" You write code from scratch for every request. You solve the same problems over and over.

This is where an LLM agent framework comes in right here. They hide complexity behind abstraction layers:

// Define agent once, complexity handled by framework
const agent = new Agent({
  name: "customer-support",
  instructions: "Do customer support",
  tools: [orderTool, refundTool],
  memory: conversationMemory,
});

See the difference? The framework handles those thousands of lines of boilerplate code, error handling, memory management, tool orchestration and gives you a chance to just focus on business logic.

What's Out There?

At this point, developers have three main options.

Those who choose the DIY approach want full control but their lives become hell. They write everything from scratch, solve the same problems over and over. Might be reasonable for companies with big engineering teams but overkill for most projects.

Those who choose no-code/low-code platforms start fast but then hit walls. Visual editors are nice, don't require technical knowledge at first but when you want a custom feature, you get "you can't do that" as an answer. Vendor lock-in risk is also a pain.

LLM agent framework find a place between the two. They give you ready-made building blocks but don't compromise on flexibility. Production-ready, best practices built-in but you can customize however you want.

When deciding which option to go with, think about these: How's the programming language support? Is switching between LLM providers easy? What's the performance and scalability situation? How's the documentation quality? Is there community support? Are error handling, monitoring, security features good?

Tip

Start with a framework if you're building your first AI application. You can always migrate to custom solutions later when you understand your specific needs better.

Voltagent Example

:::note
The following examples show Voltagent's approach, but similar patterns exist in other frameworks like LangChain, AutoGen, and CrewAI. The concepts are transferable.
:::

At this point I want to give a concrete example. While developing Voltagent, we experienced exactly these problems and tried to solve them.

Voltagent's design philosophy is: "Powerful defaults, infinite customization" - meaning provide ready solutions for most use cases, but unlimited flexibility for special needs.

One of our most important decisions was being TypeScript-first. Why? Because type safety really saves lives. In complex agent systems, knowing which function takes what parameters is critical. We also made a modular package system - you only use what you need:

// Only use what you need
import { Agent } from "@voltagent/core";
import { VoiceAgent } from "@voltagent/voice"; // If needed

Provider-agnostic design was also very important. We didn't want vendor lock-in:

// Easy provider switching
const openaiAgent = new Agent({
  llm: new VercelAIProvider(),
  model: openai("gpt-4o"),
});

const anthropicAgent = new Agent({
  llm: new AnthropicProvider(),
  model: anthropic("claude-3-5-sonnet"),
});

From Simple Agents to Complex Systems

Creating an agent in its simplest form is really easy:

const agent = new Agent({
  name: "My Assistant",
  instructions: "Helpful and friendly assistant",
  llm: new VercelAIProvider(),
  model: openai("gpt-4o"),
});

// Usage is also simple
const response = await agent.generateText("Hello!");
console.log(response.text);

But the beautiful thing is, you can do much more complex stuff with the same API. For example structured data generation:

// Define schema for data extraction
const personSchema = z.object({
  name: z.string().describe("Full name"),
  age: z.number(),
  occupation: z.string(),
  skills: z.array(z.string()),
});

// Ask agent for structured data
const result = await agent.generateObject(
  "Create a profile for a software developer named Alex.",
  personSchema
);

console.log(result.object); // Type-safe JSON object

This feature is especially useful for data extraction and API responses. You're not saying "give it in JSON format" and then trying to parse it anymore.

Tool Integration: Real World Connection

We added MCP (Model Context Protocol) support in the tool integration part. This really became a game-changing feature:

// Define local tool
const weatherTool = createTool({
  name: "get_weather",
  description: "Get the current weather for a specific location",
  parameters: z.object({
    location: z.string().describe("City and state"),
  }),
  execute: async ({ location }) => {
    // Real API call would be here
    return { temperature: 72, conditions: "sunny" };
  },
});

// Connect to external MCP server
const mcpTools = await connectMCPServer("stdio://weather-server");

const agent = new Agent({
  name: "Weather Assistant",
  instructions: "Can check weather using available tools",
  llm: new VercelAIProvider(),
  model: openai("gpt-4o"),
  tools: [weatherTool, ...mcpTools], // Combine both
});

The agent decides which tool to use when by itself. You just say "How's the weather in London?", it calls its own tool and brings you the result.

Memory: Context Management

We also carefully designed the memory system. It's critical for agents to remember past conversations:

import { LibSQLStorage } from "@voltagent/core";

const memoryStorage = new LibSQLStorage({
  url: "file:local.db",
});

const agent = new Agent({
  name: "Assistant with Memory",
  instructions: "Remember our conversation history",
  llm: new VercelAIProvider(),
  model: openai("gpt-4o"),
  memory: memoryStorage, // Automatic context management
});

// First conversation
await agent.generateText("My name is John and I love pizza");

// Next conversation - will remember the previous one
await agent.generateText("What's my favorite food?");
// "Based on our previous conversation, you love pizza!"

The framework automatically fetches relevant context and saves new interactions.

Multi-Agent Systems

One of my favorite features is the sub-agent system. You can break complex tasks into small pieces and distribute them to expert agents:

const researchAgent = new Agent({
  name: "Researcher",
  instructions: "Research topics thoroughly using web search",
  tools: [webSearchTool],
});

const writerAgent = new Agent({
  name: "Writer",
  instructions: "Write engaging content based on research",
  tools: [contentGenerator],
});

const coordinator = new Agent({
  name: "Coordinator",
  instructions: "Coordinate research and writing tasks",
  llm: new VercelAIProvider(),
  model: openai("gpt-4o"),
  subAgents: [researchAgent, writerAgent], // Automatic delegate_task tool
});

// Complex workflow in a single call
await coordinator.generateText("Write a blog post about quantum computing");
// Coordinator will give research to researcher, writing to writer

:::important
Memory management and tool integration are the foundation of production-ready agents. Without these, you'll hit scaling issues quickly as your application grows.
:::

Debugging and Monitoring: Hooks System

One of my favorite features is also the visual console for debugging. I saw this approach for the first time in the framework world. But there's also a hooks system at the code level:

const hooks = createHooks({
  onStart: async ({ agent, context }) => {
    const requestId = `req-${Date.now()}`;
    context.userContext.set("requestId", requestId);
    console.log(`[${agent.name}] Started: ${requestId}`);
  },
  onToolStart: async ({ agent, tool, context }) => {
    const reqId = context.userContext.get("requestId");
    console.log(`[${reqId}] Tool starting: ${tool.name}`);
  },
  onToolEnd: async ({ agent, tool, output, context }) => {
    const reqId = context.userContext.get("requestId");
    console.log(`[${reqId}] Tool finished: ${tool.name}`, output);
  },
  onEnd: async ({ agent, output, context }) => {
    const reqId = context.userContext.get("requestId");
    console.log(`[${reqId}] Operation complete`);
  },
});

const agent = new Agent({
  name: "Observable Agent",
  // ... other config
  hooks, // Full traceability
});

This system is very valuable in production. You can trace every tool call, every agent interaction.

Voice Capabilities

Voice integration is also one of the features we added recently. We have both OpenAI and ElevenLabs support:

import { ElevenLabsVoiceProvider } from "@voltagent/voice";

const voiceProvider = new ElevenLabsVoiceProvider({
  apiKey: process.env.ELEVENLABS_API_KEY,
  voice: "Rachel",
});

const agent = new Agent({
  name: "Voice Assistant",
  instructions: "A helpful voice assistant",
  llm: new VercelAIProvider(),
  model: openai("gpt-4o"),
  voice: voiceProvider,
});

// Generate text response
const response = await agent.generateText("Tell me a short story");

// Convert to voice
if (agent.voice && response.text) {
  const audioStream = await agent.voice.speak(response.text);
  // Save audioStream to file or play it
}

Speech-to-text is there too, you can convert audio inputs to text.

VoltOps Platform Experience

npm run dev
# ══════════════════════════════════════════════════
# VOLTAGENT SERVER STARTED SUCCESSFULLY
# ══════════════════════════════════════════════════
# ✓ HTTP Server: http://localhost:3141
# VoltOps Platform: https://console.voltagent.dev
# ══════════════════════════════════════════════════

From the console you can do real-time conversation monitoring, tool execution tracing, memory state inspection, performance metrics, error debugging. Debugging has never been this fun.

The best part is, all these features are composable. You can use whatever combination you want - just memory, just tools, just voice, or all of them together. The framework doesn't force you into anything but everything is ready when you need it.

Real World Examples

Examples from the community are really inspiring. Like an e-commerce customer support bot:

const supportAgent = new Agent({
  name: "support-bot",
  instructions: "E-commerce customer support, can track orders",
  tools: [orderLookupTool, refundProcessTool, humanHandoffTool],
  memory: new ConversationMemory(),
});

This system achieved 35% less human escalation, 60% faster response time, 24/7 availability.

A developer made a repository analysis tool:

const codeAnalyzer = new Agent({
  name: "code-analyzer",
  instructions: "Analyze repository, make suggestions",
  tools: [githubConnector, codeQualityAnalyzer, documentationChecker],
});

Their feedback was: "I made a production-ready tool in 3 days, normally it would take weeks!"

A company also set up a RAG system for their documentation:

const retrieverAgent = new Agent({
  name: "document-finder",
  instructions: "Find relevant documents from vector DB",
  tools: [vectorSearchTool, rankingTool],
});

const responderAgent = new Agent({
  name: "answer-generator",
  instructions: "Create detailed answer using context",
  subAgents: [retrieverAgent],
});

Performance and Cost Reality

warning

LLM costs can escalate quickly in production. A single poorly optimized agent can burn through hundreds of dollars per day. Always implement cost monitoring from day one.

AI services are expensive, let's not forget that. But you can save serious money with the right optimizations. You can filter unnecessary tokens with smart context compression, you don't make API calls again for the same questions with response caching, you combine operations with batch processing.

My favorite feature is intelligent model selection:

const adaptiveAgent = new Agent({
  name: "smart-agent",
  model: adaptiveModel({
    simple: "gpt-4o-mini", // Simple tasks
    complex: "gpt-4o", // Complex reasoning
    coding: "claude-3-5-sonnet", // Code writing
  }),
});

Typical results are around 30-50% token savings.

Scaling challenges exist too of course. Memory management becomes difficult when thousands of agents run simultaneously, you need to be careful not to exceed provider API limits, the system should continue when an agent fails. To solve these, you need systems like connection pooling, circuit breaker pattern, automatic retry, graceful degradation.

Community and Ecosystem

The most valuable asset of frameworks is their community. Open source frameworks have these advantages: community contributions, transparency, customization freedom, no vendor lock-in. Commercial solutions offer professional support, enterprise features, SLA guarantees.

In Voltagent for example, MCP integration came from the community, now it's a core feature. Voice improvements, provider extensions, real-world examples - all community contributions.

Future Trends

Multi-modal agents are coming - text + vision + audio capabilities are combining. There's an autonomous learning trend - agents improving themselves. Agent-to-agent communication will become widespread, we'll see cross-organization agent networks. Edge deployment is also growing - lightweight agents running in browsers. No-code builders are developing for non-technical users.

Practical Tips

When choosing a framework, start small, test with a pilot project. Evaluate the community - how are documentation, support, examples? Think about migration path - how hard will it be if you need to change frameworks?

Build a simple chatbot first, then gradually add memory, tools, and multi-agent features. This approach helps you understand each component before building complex systems.

During development give specific instructions - not "do everything", but clear tasks. Apply single responsibility principle in tool design. Think about your memory strategy - how much context, how long? Don't neglect error handling - graceful failures, user experience is important.

When going to production don't forget monitoring setup - metrics, alerting, debugging. Keep API costs under control with rate limiting. Don't neglect security - input validation, output filtering. Do load testing, performance optimization for scalability.

Conclusion

The AI agent space is evolving rapidly. What you build today should be flexible enough to adapt to new models, capabilities, and paradigms that will emerge in the coming months.

LLM Agent Framework aren't just a technology. They're building the foundation of AI-first software development. By the end of 2025, every software company will have AI agents, LLM agent frameworks will become part of the standard development stack, multi-modal interaction will become normal, cost/performance ratio will improve dramatically.

To get started, research existing frameworks - Voltagent, LangChain, AutoGen and others. Try with a small pilot project. Read documentation, check examples, join communities. Test with real users.

This post is just the beginning. AI agent technology is developing so fast that in 6 months there will be new trends, new frameworks, new possibilities. What matters is being part of this transformation.

What is Vercel AI SDK?

Necati Özmen — Thu, 29 May 2025 09:10:01 +0000

We all want to add those smart, cool features to our apps, but sometimes the tech side war can get a bit much for all of us. That's where tools like Vercel AI SDK come in, and I wanted to share a few notes on how they can simplify things. When I first looked into it, the practical solutions it offered really caught my eye.

A Quick Look at Vercel AI SDK

So, in a nutshell, Vercel AI SDK is a library aimed at making it easier to build AI-powered user interfaces and apps. Its main goal is to make working with Large Language Models (LLMs) and other AI models smoother and more manageable.

Basically, instead of wrestling with complex APIs and endless configs, it offers a more developer-friendly approach.

When to Use What

If you're building simple AI features like a chat interface or text completion, Vercel AI SDK alone might be enough. But for more complex, autonomous agents that need memory and advanced decision-making, consider combining it with VoltAgent as we'll discuss later.

Of course, if your goal is to build more comprehensive, autonomous AI agents that can make their own decisions, then frameworks like VoltAgent are also worth checking out. I'll get to that in a bit.

What's Vercel AI SDK Got to Offer?

So, what makes Vercel AI SDK so interesting for us developers? Let's take a closer look at some of its standout features:

Broad Model Support and Flexibility
One of its biggest pluses, I think, is that it lets you work with popular model providers like OpenAI, Anthropic, Google Gemini, and Hugging Face through a single API. This saves you the hassle of dealing with different SDKs and integrations for each model.

This kind of standardization can be a real time-saver in development. It usually auto-detects API keys like OPENAI_API_KEY or ANTHROPIC_API_KEY that you define in your .env files or system environment variables and sets up the connection. Easy peasy.

:::note API Key Management
Vercel AI SDK will automatically look for environment variables like OPENAI_API_KEY or ANTHROPIC_API_KEY. Make sure these are properly set in your development environment or deployed application.
:::

Streaming and Ease of Use
You know how important streaming responses from AI is for user experience, especially in chat apps. Vercel AI SDK provides tools to make this easier.

It supports streaming not just text, but also structured data like JSON. Plus, if you're working with frameworks like Next.js, the React hooks and helper functions like useChat and useCompletion provided by Vercel AI SDK make building common AI interactions like chat and autocomplete on the UI side pretty straightforward.

Other Key Features:

generateText / streamText: These are the basic functions for text-based interactions and instant responses. Core stuff for Vercel AI SDK.
generateObject / streamObject: Super useful when you need to generate structured data (like JSON). It works integrated with schema definition libraries like Zod, so you can get the model to produce data in a specific structure. This can be a lifesaver, especially for data extraction or scenarios requiring formatted output. Keep in mind, support for these functions might depend on the capabilities of the underlying model.
Function Calling: Compatible models can call predefined external functions or tools, which seriously boosts the agents' capabilities. For example, an agent can fetch data from an external API or perform an action this way.
Multi-modal Support: It also supports models that can process inputs in different formats, not just text, like images. Vercel AI SDK passes these multi-modal message structures to the underlying model if it supports them.
Provider-Specific Options: Sometimes, even when you're using a higher-level tool like VoltAgent, you might want to use a very specific parameter offered by Vercel AI SDK or a specific model provider underneath it (like OpenAI). Vercel AI SDK gives you the flexibility to pass these provider-specific options (under the provider object) directly to the underlying SDK functions during calls. This means more fine-tuning and control for you.

Performance Tip

When using features like streamObject() with large response structures, consider implementing progressive UI rendering to maintain responsiveness, as the validation process might cause slight delays in complex response schemas.

So yeah, the speed, flexibility, and ease of use that Vercel AI SDK offers developers probably explain why it's become so popular.

VoltAgent: For Building More Advanced AI Agents

Now let's talk a bit about VoltAgent. While Vercel AI SDK makes interacting with LLMs easier, VoltAgent is a TypeScript framework designed for creating more complex and autonomous AI agents. With VoltAgent, you can develop agents that can perform specific tasks, make decisions, and interact with various tools.

Core Components of VoltAgent

At the heart of VoltAgent is the Agent class, which defines the agent's behaviors and capabilities. An agent basically consists of these components: instructions (defining the agent's purpose and behavior), an LLM Provider (managing communication with the model), and, of course, the specific model to be used.

There are also some additional features that make VoltAgent particularly powerful:

Tools: Allow agents to interact with the outside world, use APIs, or gather data.
Memory: Stores conversation history or important information to provide more consistent and context-aware interactions.
Sub-Agents: Allows complex tasks to be broken down and delegated to smaller, specialized agents.
Providers: These are the interfaces that define how VoltAgent communicates with different LLM services. And this is where our integration with Vercel AI SDK comes into play.

VoltAgent and Vercel AI SDK Working Together

The integration between VoltAgent and Vercel AI SDK is handled quite elegantly through the @voltagent/vercel-ai Provider. This provider acts as a bridge between VoltAgent and Vercel AI SDK, allowing VoltAgent agents to easily use Vercel AI SDK's core functions like generateText, streamText, and generateObject. If you're curious about the details, you can check out our documentation in the website/docs/providers/vercel-ai.md file.

So How Does This Integration Work in Practice?
When you create an Agent with VoltAgent, you use an instance of VercelAIProvider as the LLM provider and Vercel AI SDK's model definition functions (e.g., openai("gpt-4o") via @ai-sdk/openai) for the model. This way, model selection and management are done according to Vercel AI SDK's standards.

Below is a basic code example from our VoltAgent documentation that shows this integration:

import { Agent } from "@voltagent/core";
import { VercelAIProvider } from "@voltagent/vercel-ai";
// Model definitions come from Vercel AI SDK's respective packages
import { openai } from "@ai-sdk/openai";
// If you want to use a different model, for example Anthropic:
// import { anthropic } from "@ai-sdk/anthropic";

// An example agent using an OpenAI model via Vercel AI SDK
const agent = new Agent({
  name: "Vercel Powered Assistant",
  instructions: "This assistant uses an OpenAI model via Vercel AI SDK.",
  llm: new VercelAIProvider(), // The Vercel AI Provider
  model: openai("gpt-4o"), // OpenAI model defined with Vercel AI SDK
});

// Now you can call methods like generateText, streamText on this 'agent' instance
// using the Vercel AI SDK infrastructure.
// For example:
// async function testAgent() {
//   const response = await agent.generateText("Hello, world!");
//   console.log(response.text);
// }
// testAgent();

Installation Note

Don't forget to install both packages:

npm install @voltagent/core @voltagent/vercel-ai @ai-sdk/openai

And ensure you have the appropriate API keys in your environment.

As you can see in this example, an Agent can be easily configured using VercelAIProvider and Vercel AI SDK's model definition functions (openai, anthropic, etc.). This allows you to combine VoltAgent's agent capabilities with Vercel AI SDK's model variety and ease of use.

What Are the Advantages of This Integration for Us Developers?

Easy access to the wide range of models supported by Vercel AI SDK through VoltAgent.
Leveraging Vercel AI SDK's powerful capabilities for text and structured data generation within VoltAgent.
Easier integration of features like multi-modal support into VoltAgent agents with Vercel AI SDK's backing.
And of course, our documentation at Vercel AI Provider docs serves as a practical example of this integration.

Use Cases

We can think of a few scenarios where this integration can be practically useful:

Example 1: Chat Applications with Streaming Responses
If you're developing a chatbot for customer service or information queries, providing quick and streaming responses to user questions is crucial. By using VoltAgent with VercelAIProvider and leveraging the streamText feature, you can ensure that responses flow to the user instantly.

Example 2: Extracting Structured Data from Text
Let's say you need to extract specific information (like keywords from an article or technical specs from a product description) from long texts into a structured format like JSON. VoltAgent can help you automate such tasks by using Vercel AI SDK's generateObject capability and schema definition tools like Zod.

Common Integration Pitfall

When working with schema validation in generateObject, avoid overly complex nested schemas in your initial implementation. Start with simpler structures and gradually build complexity, as deeply nested objects can sometimes cause validation errors that are difficult to debug.

Also, as we mentioned in our Vercel AI Provider file, it's also possible to pass specific configuration options (provider-specific options) for Vercel AI SDK through VoltAgent if you need to. This gives you flexibility.

A General Assessment

In short, Vercel AI SDK offers a really useful toolkit for modern AI application development. It saves us all time by simplifying interactions with LLMs. VoltAgent, on the other hand, provides a platform to build more complex and autonomous AI agents on top of this solid foundation. The combination of these two tools offers us developers quite a wide range of possibilities for creating various AI solutions.

What's Next? And What's This About AI SDK 5

The Vercel team recently announced AI SDK 5 - a complete redesign of the SDK's protocol and architecture. Based on two years of real-world usage, they've rebuilt the foundation to better support today's more complex LLM capabilities.

What's New in AI SDK 5

AI SDK 5 represents a fundamental redesign based on real-world usage. The original protocol was designed when LLMs mainly generated text or tool calls, but today's models can generate reasoning, sources, images, and much more. The new protocol is designed to support these advanced capabilities and emerging use cases like computer-using agents.

Why the change? Simply put, the LLM landscape has evolved dramatically. Modern models do far more than just text generation - they reason, cite sources, create visuals, and even control computers. The old architecture wasn't designed for these capabilities, so a fresh start was needed.

warning: Migration Considerations

If you're already using Vercel AI SDK v3/v4 and planning to upgrade to v5, be prepared for breaking changes. The protocol has been completely redesigned, so you'll need to update your integration code. Consider creating a migration plan and testing thoroughly before deploying to production.

Building a Data-Aware Chatbot with VoltAgent and Peaka

Necati Özmen — Tue, 27 May 2025 06:44:29 +0000

Introduction

In this article, I'll demonstrate how we can use the Model Context Protocol (MCP) by integrating VoltAgent and Peaka to create an AI agent with data retrieval capabilities.

Refer to example project built in this post.

Wait, What's Peaka?

Right, before I show you the code stuff, let me tell you about Peaka.

Their idea is pretty simple: make it less annoying to work with data. Think of it like a data middleman. You hook up your databases, spreadsheets, whatever, to Peaka. Then you can ask it questions (using fancy SQL code or just regular English), and it pulls the info together from all those places for you.

Usually, connecting different data sources is a real pain and costs a lot. Peaka feels like a simpler option, especially if you're not a huge company or just don't want to mess with complicated data pipelines. They wanna be the easy button for getting data.

And VoltAgent?

It's our toolkit for putting together AI powered applications. We provide the core engine (@voltagent/core) to get you started, and then you can add extra capabilities, like voice interaction (@voltagent/voice) or support for different LLMs (OpenAI, Google, etc.). VoltAgent handles the complex stuff (like history and tool connections) so you can focus on your agent's unique features.

We designed VoltAgent to hit a nice sweet spot. It gives you more helpful structure than trying to build everything from raw AI libraries, but it offers a lot more freedom and customization than the simpler no-code platforms out there.

We also built the VoltAgent Console a web interface that lets you monitor your agents, see exactly how they're working, and chat with them directly. We find it incredibly useful ourselves for debugging and testing!

Making My Agent Talk to Peaka

Okay, so my plan was: build a chatbot with VoltAgent that could answer questions by checking data in Peaka.

To make these two talk, I used something called MCP (Model Context Protocol). It sounds fancy, but it's basically just a standard way for different programs to give each other tasks. If you wanna know more, I wrote about what MCP is over here.

For this project, it lets VoltAgent tell Peaka, "Hey, go run this data query!"

To follow along, you'll want to sign up for a free Peaka account first over at https://www.peaka.com/. For this example, I'm just using the sample data they provide, which you'll have access to once you sign up.

Here's how I did it.

1. Starting a New VoltAgent Project

First up, I needed a blank VoltAgent project. Their setup tool makes this easy:

npm create voltagent-app@latest my-peaka-agent
# Answer the questions it asks
cd my-peaka-agent

That just makes a folder with the basic files I need to get started.

2. Telling VoltAgent About Peaka (The MCP Bit)

This is where the magic happens. I had to edit the main code file (src/index.ts) to tell VoltAgent how to find and talk to the Peaka tool using MCP.

This is the key chunk of code I put in:

// src/index.ts

import { VoltAgent, Agent, MCPConfiguration } from "@voltagent/core";
import { VercelAIProvider } from "@voltagent/vercel-ai"; // Using Vercel's helper stuff for the AI
import { openai } from "@ai-sdk/openai"; // And using OpenAI's model

// 1. Set up the connection to the Peaka MCP tool
const mcp = new MCPConfiguration({
  id: "peaka-mcp", // Just a nickname for this setup
  servers: {
    // Here's the info for the Peaka tool
    peaka: {
      type: "stdio", // Means it runs like a command-line program
      command: "npx", // The command to start it
      // npx is neat, it grabs the latest Peaka MCP tool automatically
      args: ["-y", "@peaka/mcp-server-peaka@latest"],
      // Gotta give it my Peaka API key (stored safely elsewhere!)
      env: { PEAKA_API_KEY: process.env.PEAKA_API_KEY || "" },
    },
  },
});

// 2. Find out what the Peaka tool can actually *do*
// (Need this `async` stuff because it takes a moment to connect)
(async () => {
  // Ask the MCP connection: "What tools does Peaka give us?"
  const tools = await mcp.getTools();

  // 3. Create our actual chatbot agent
  const agent = new Agent({
    name: "Peaka Data Agent",
    instructions: "I can look things up in Peaka's data.",
    llm: new VercelAIProvider(), // Which AI service to use
    model: openai("gpt-4o-mini"), // Which specific AI brain
    tools, // <-- Super important! Give the agent the tools from Peaka!
  });

  // 4. Fire up VoltAgent
  new VoltAgent({
    agents: {
      // Make our agent live
      agent,
    },
  });

  console.log("VoltAgent is running with Peaka powers!");
})();

So, what's happening here?

MCPConfiguration: I'm telling VoltAgent, "There's this Peaka tool you can run. Use npx to find the @peaka/mcp-server-peaka thing, and give it my API key when you run it." The stdio part just means it runs like a regular program on my computer.
mcp.getTools(): This is the clever bit. VoltAgent starts the Peaka tool and then asks it, "What can you do?" Peaka sends back a list of its abilities (like querying data).
new Agent(...): I'm making the chatbot itself. I give it a name, tell it what AI brain to use (gpt-4o-mini), and crucially, pass in those tools I got from Peaka. Now the chatbot knows it has these extra data powers.
new VoltAgent(...): This just starts the main VoltAgent system with my new agent included.

Before running, I needed my API keys. I made a file called .env in the project folder and put them in there:

//".env"

PEAKA_API_KEY=your_secret_peaka_key
# Don't forget your OpenAI key!
OPENAI_API_KEY=your_secret_openai_key

(Use your real keys, obviously! Keep 'em secret!)

3. Running It and Asking Stuff

Okay, code's ready, keys are in place. Time to run it!

npm run dev

My terminal showed VoltAgent starting up, and it also started the Peaka tool automatically in the background. I saw something like this:

══════════════════════════════════════════════════
  VOLTAGENT SERVER STARTED SUCCESSFULLY
══════════════════════════════════════════════════
  ✓ HTTP Server: http://localhost:3141

  Developer Console:    https://console.voltagent.dev
══════════════════════════════════════════════════

Now the fun test:

I popped open the VoltAgent Console in my browser.
Found my agent ("Peaka Data Assistant").
Opened the chat window.
Asked it something that needed data from Peaka, maybe like:

"Hey, what was my Stripe balance yesterday?"

Here's the cool part of what goes on:

The chatbot AI gets my question.
It figures out I need data and sees it has that Peaka tool.
It decides to use the tool.
VoltAgent sends the request over to the Peaka tool (using MCP).
The Peaka tool does its thing, querying my actual Stripe data (or whatever I connected).
Peaka sends the answer back to VoltAgent.
VoltAgent gives the raw answer back to the chatbot AI.
The AI turns that raw data into a normal sentence and shows it to me in the chat.

Conclusion

Getting Peaka hooked up to my VoltAgent bot with MCP wasn't too bad! It's pretty awesome to have a chatbot that can actually use real-time data from different places. I can see this being useful for building smarter internal tools, helpdesk bots that know current info, or anything where the AI needs to know more than just what it was trained on.

Definitely worth playing around with if you're building AI stuff!

What is LLM Orchestration?

Necati Özmen — Mon, 26 May 2025 06:11:33 +0000

Introduction

If you look around, it's pretty much impossible not to have heard something about AI, especially these Large Language Models (LLMs), right? As if you knew these GPTs, Llamas, Claudes, and all that. As if these have already become part of our lives.

It's lovely to ask an LLM one question and get one answer. But how about giving it your entire customer support operation? Or asking it to handle a big research project from beginning to end? This is where a standalone LLM, no matter how smart, falls a little short. It's like having a super-powerful brain but no arms or legs.

Let me make an analogy

A single LLM is like a wonderful solo musician. It can perform wonders. But sometimes you require a symphony-an orchestra where various instruments play in coordination with each other in perfect harmony. That is precisely what LLM Orchestration is!

And right at this critical point, in comes LLM Orchestration. No more just whispering things to an LLM; it is making it talk to a bunch of other tools and data sources and even other LLMs to perform bigger, more complex, and more useful tasks.

In this post, we're going to break down this "LLM Orchestration" thing for you.

What's This LLM Orchestration Thing Everyone's Talking About?

Okay, we're tossing the term "orchestration" around and all that, but what is it, actually? Let me try defining it in the simplest way:

LLM Orchestration is basically the art of intelligently coordinating and managing one or more LLM calls with other third-party tools (whether it's a search engine, a database, or maybe an API you built yourself), data sources, and other software components.

So you hand an LLM and say: "Listen, pal, this is your assignment. But in order to finish off that assignment, you would utilize this tool there, fetch that data from there, then take the result and pass it on to this other LLM that will mold it like so."
It's all about instructional flow management.

Think about it:

An orchestrator is similar to a chef at a restaurant. They have great ingredients; the LLMs are amazing, yet they need to also direct the other tools in the kitchen, knives, ovens – our "tools" – and other cooks, possibly other services or LLMs, to prepare a delicious meal, the successful outcome. No one would work like this, right?

So, what is the key point here?

To break down the big and complicated problems into smaller, bite-sized pieces that LLMs can handle.
To enhance the wonderful language capabilities of LLMs with real world-knowledge and actions. And let's be honest, LLMs don't know everything or can't do everything. yet. -To build even more trustworthy, consistent-and-most importantly "stateful" (the ability to "remember" the situation) AI applications. That is, make systems that do not leave a conversation midway and say, "what are we talking about?" and can remember context. That's probably one of the most important points for me because in my very first experiences playing around with LLMs, that "memorylessness" really drove me nuts!

In short, thanks to orchestration, LLMs no longer remain simple machines that produce theoretical knowledge and become more sophisticated assistants capable of performing practical tasks. Is the picture clearer now?

But Why Bother? Aren't LLMs Good Enough on Their Own?

Now, some of you may ask, "Hey, aren't those LLMs quite already smart enough? Why bother with all these chains, tools and stuff, making things even more complicated?" Indeed, LLMs achieve incredible things on their own. However, real world problems may quite often result in a "devil is in the details" situation.

Some Key Points Where LLMs Alone Can Struggle and Orchestration is We Give It a Call "Must Have":

The Memory Issue and That Forgetfulness! They have a "context window." They can remember only a certain part of a conversation or text in their "mind." If the conversation gets a little too long, and the text to be analyzed is huge, they might forget the things at the very beginning. You know when you're telling your friend something and then five minutes later they're like, "What did you say again?" Sort of like that. I have to admit, I was quite disappointed when first I encountered this. I felt really like speaking with a different person each time..

- **What Orchestration Does:** That is where it comes in and manages the conversation history. If necessary, it summarizes the old data and reminds the LLM, or splits long texts into pieces, gets each piece analyzed separately, and then combines the outcomes together. In short, it expands the LLM's "memory."

Real-World Knowledge and the Up-to-Date Problem: "I Only Know Things Up to September 2021."

Most LLMs are trained up until a specific date. So, you cannot expect it to know about yesterday's headlines, the latest technologies, or your business's most recent product prices. In case you ask it, "What is the weather today?" it would probably say something like, "I do not know beyond my cut-off date."

What Orchestration Does: It connects the LLM to the external world!

This also feeds in the latest and freshest flow of information to the LLM via "tools" like search engines, news APIs, or databases within the company.

It can even further allow it to act by having the LLM act with such tools, such as sending an email or creating a calendar event. They have so cool names for this, like "Retrieval Augmented Generation," which, I think is one of the most revolutionary things.

Complex Tasks and Step-by-Step Thinking Ability LLMs are excellent at text generation, sure. But if you present them with a multi-step, complicated task such as "Make a business plan for me, analyze the risks for this plan, and prepare presentation slides," they can get stuck sometimes. Even if they complete each step flawlessly, they may not be able to link these steps together in a logical flow.

What Orchestration Does : Well, here come the "chains" and the "agents". The huge, hard job will be divided up into smaller, tractable sub-lets.

An LLM does its thing, that output feeds in to be an input for the next thing, maybe another LLM or a tool comes in at that point. That's what that factory assembly line did: each station did their piece, and at the end, this finished product. When first exposed to using agents, I felt like I had literally given the LLM a brain and a bunch of arms and legs!

Consistency and Reliability: "What Will It Say This Time?" LLM responses sometimes tend to be a bit. variable. You may get two entirely different answers if you ask the same question twice, once today and once tomorrow. While this may be a wonderful feature for creative tasks, it becomes quite a pain when you want consistency and accuracy.

What Orchestration Does: It can arrange for mechanisms that verify the outputs ensuring that, for example, the response from the LLM is in the right format or even ask the LLM to repeat the question with a different approach if the answer doesn't do justice. In other words, it tries to reduce those "I wonder" moments.

Cost and Performance: Every Click is Gold!

Operating LLMs, most especially the big and powerful ones, is not cheap, in the first place. Every API call is going to be an arm and a leg. If you are generating dozens of LLM calls just to wastefully do some task, both your bill increases, and your application slows down.

What Orchestration Does: it optimizes the calls. Either probably to some easy tasks it is dealing with a more trivial, rule based rather than querying the LLM. Or maybe it is caching results of oft used things so that it doesn't trouble the LLM again and again. In other words, it considers both your time and pocket.

Important Note:

LLMs are not magic wands. They are great tools, of course, but they are no panacea. The key to their successful use is knowing what they are good at and what they are not, and supplementing the weaknesses with intelligent approaches such as orchestration.

You can sense that orchestration is far from any "add-on." In fact, it's often a requirement to make LLMs actually powerful and useful applications. So, what does this thing called orchestration comprise? What are the building blocks that come together to create magic?

The Basics of LLM Orchestration: Chains, Agents, Memory, and More!

Excellent, we understand why orchestration is such a key thing. So, how does this system function? What are the fundamental building blocks? Now, let's have a closer look at some of the concepts you'll encounter most frequently, and these will be the crux of it all.

Chains: The Dance of LLM Calls

One of the simplest orchestration concepts is called "chains." You string multiple steps together as implied by its name. Those steps could be LLM calls, using a tool, or just about any form of data processing step.

Simple Chains: The most straightforward logic. Ask a question to an LLM, receive its answer, forward that answer to another LLM, receive its answer. and so forth. You could, for instance, first summarize some text and then convert that summary into keywords.
Smarter Chains: Add a bit of logic. For example: If the LLM's answer is 'yes,' do this; if 'no,' do that. Or you take multiple results generated by one LLM and tell another LLM, "Choose the best one among these."
One of my earliest "Aha! All I had just used a chain to get them to classify a user's request with some initial classification and then send it to a specific LLM for that classification. All I had done was get them to share the workload!

Agents & Tools: Letting LLMs Make the Decisions!

This is where things get really interesting. Agents transform LLMs from purely taking commands into creatures that think for themselves, decide how to apply which tool when, and make a plan to reach a goal.

What is an Agent? At the heart of an agent is an LLM. This LLM understands the task, thinks over what tools it has, and goes through a thought process very much like, "To achieve this task, I first need to do this, then I ought to use this tool, and with the output from that, I should do that."
What Can Tools Be? Anything you can imagine!
Web Search: Searching the internet for up-to-date information.
Calculator: For mathematical operations. (Yes, LLMs can sometimes mess up even simple math; a calculator tool is a lifesaver!)

Code Interpreter: See running the code that's generated by the LLM and the result it produces.

Database Querying: Retrieving information from your company's database.
API Calls: Using the API of any external service, such as the weather, maps, and calendar, etc.
Custom Tools: Your own custom-written tools that accomplish some specific task.
ReAct (Reason + Act) Pattern: Agents most often apply this very common thought pattern.

The LLM reasons, like: What should I do? Which tool should I select? Then it acts-that is to say, applies the tool. Then observes-looks at what the tool provides and repeats the cycle as long as needed to be done. When I first saw this, I felt like I was literally reading an AI's "inner voice." Very impressive!

Imagine an Agent

Suppose that you asked an agent, "Find the lowest fare flight ticket from Los Angeles to San Francisco for tomorrow and send it in my mail." Then the agent might think like this:

Thought: "I need to find a flight ticket. For this, I should use the 'flight search' tool."
Action: Utilizes the flight search tool to look for Los Angeles to San Francisco tickets for tomorrow's date.
Observation: Receives the results from the tool (a list of flights and prices).
Thinking: "I should select the cheapest one and send e-mail. I should select by price, select the cheapest, and then send the e-mail."
Action: Selects the cheapest ticket, passing this selection onto the send e-mail function.
And voilà! Ticket information right into your inbox. How cool is that?
Memory: No More "What Were We Talking About?"
As we said, LLMs have a short memory. "Memory" modules are there to solve this problem. They contextualize the LLM by keeping the record of a process or a conversation.

Conversation Memory
The most common type. This saves all the conversation with the user or important parts of it. Now the LLM can reference earlier steps in the conversation to say something like, "Regarding that X topic you brought up earlier."
Entity Memory: Pick up and store important entities mentioned in the conversation (names of people, places, products, etc.) and information related to them.
Knowledge Graph Memory: The LLM can make deeper inferences by storing even more complex relationships in a graph structure.
Long-Term Memory with Vector Databases: A bit more advanced, but really powerful. You can take any document, old conversation or text data of your interest and convert them into numerical forms called "vector embeddings" and store them in special databases such as Pinecone, Chroma, FAISS, etc. When an LLM gives a response to a question, the database offers the most relevant information, which is incorporated into the response from the database.

This is RAG (Retrieval Augmented Generation) itself! It becomes as if your own personal Google.

Data Ingestion & Retrieval (RAG): Feed LLMs Your Own Information

This is actually very close to the Memory topic, especially the vector databases part. RAG is the key to making LLMs talk to your own private data, company documents, content on your website, or any pool of information.

How the Process Works (Simplified):
Data Loading: You are loading your own documents (PDF, TXT, HTML etc) in the system
Chunking: The inputted data is subdivided into smaller bits.
Creation of Embeddings: The text is transformed into a numeric vector, which semantically interprets it with the help of an embedding model, itself an LLM.
Stored in Vector Database: These vectors and the pieces of text itself are stored inside the vector database.
When a User Query Comes In: - The user's question is also passed through the same embedding model, and a query vector is created.

This question vector is compared against the other vectors stored in the database. The words in the text that are semantically closest-that is, most relevant to the question-are located. These relevant pieces, combined with the user's original question, are input into an LLM as a "prompt."

The LLM utilizes the question and this additional information to create far more precise, contextually appropriate responses to your data.

When I first implemented RAG and had an LLM talk to my own notes, I was just blown away. It felt like the LLM became an extension of my brain!

Callbacks & Logging: What's Going On Behind the Scenes?

As the flows of orchestrations become more complex, questions like "What step are we on now?", "What was the LLM response?", "What tool did we use?" matter a great deal. Callbacks and logging enable you to trace this process, debug issues, and examine performance. Frameworks like LangChain have this built in.

Warning!

If you don't log a complex orchestration flow, finding the problem when something goes wrong is like looking for a needle in a haystack. Setting up a good logging strategy from the start is very important. Trust me on this!

A Quick Analogy

Think of an LLM as a super-smart intern. They can write, summarize, and even code a bit. But to tackle a big project, they need a manager (the orchestrator) to break down tasks, provide the right documents (tools/RAG), help them remember past conversations (memory), and ensure their final work is polished and useful (parsing/output formatting).

Dive Deeper: Key Orchestration Concepts in Action

Think of it like this: your LLM is a brilliant chef (GPT-4, Claude, etc.), but it only knows how to cook what's in its recipe book (its training data). If you ask for a dish using ingredients it hasn't seen (recent news, your company's private data), it might struggle or make something up.

RAG is like giving that chef a tablet connected to a massive, constantly updated grocery database and your personal pantry list. Here's a simplified flow:

You Ask a Question: "What were our Q3 sales figures for Product X based on the latest internal report?"
Retrieve Relevant Information: The LLM retrieves relevant information from the database or external sources.
Generate Response: The LLM uses the retrieved information to generate a response.
Output: The response is presented to the user.

What's Out There? Popular LLM Orchestration Framework

When it comes to bringing LLM orchestration to life, developers have several powerful frameworks and tools at their disposal. These tools aim to simplify the complexities of building, managing, and monitoring sophisticated AI applications. Popular open-source frameworks like LangChain and LlamaIndex provide comprehensive components for creating chains, managing agents, and implementing Retrieval Augmented Generation (RAG) pipelines. They are widely adopted and boast extensive community support and a wealth of examples.

Alongside these established players, other specialized tools and frameworks continue to emerge, each offering unique strengths. For developers particularly focused on TypeScript and seeking strong built-in observability from the ground up, VoltAgent presents a compelling option.

VoltAgent: A TypeScript Framework with a Keen Eye on Observability

VoltAgent is an open-source TypeScript framework specifically designed for building and orchestrating AI agents and LLM applications. It provides developers with the tools to create sophisticated workflows where LLMs can interact with various data sources, external APIs, and other services. A key focus for VoltAgent is its observability features. The VoltAgent Console allows developers to visualize the entire execution flow of their agents on an n8n-style canvas. This makes it significantly easier to debug, trace decision-making processes, monitor performance, and understand LLM costs associated with each step in an agent's operation. This visual approach to observability helps demystify the "black box" nature of complex LLM chains and agentic behaviors, making development and maintenance more manageable.

If you're looking for a modern, TypeScript-first approach to LLM orchestration with built-in visual debugging and tracing, VoltAgent is definitely worth exploring. You can find its documentation here and the project on GitHub.

Getting Your Hands Dirty: Tips for Starting with LLM Orchestration

Alright, theory is great, but how do you actually start building with these concepts? It might seem daunting, but here are a few practical tips:

Get a Strong Foundation:

How Do LLMs Work? It's very useful to know, at a basic level, what LLMs are, how they are trained, and what "prompt engineering" means. Explore the question, "How do I get the answer I want from an LLM?"
Python (or JavaScript): Orchestration frameworks that are super popular nowadays, such as LangChain, are usually based on Python or JavaScript. Familiarity with at least one of them will make your life much easier. Python seems to be ahead of other languages in this field regarding community support and library diversity.

Find a Problem to Solve (Or Make One Up!):

It often makes more sense to begin with a practical problem than to get mired in theory. A question might be: "Could I automate this annoying X task with LLMs and orchestration?" or "What if I made my own personal Y assistant?"

My first project was simple: an agent that would fetch new articles on my favorite blogs, summarize them for me, and rank them according to my interests. It wasn't that complicated, but it was enough to send me on a path of discovery of basic concepts!

Select an Orchestration Framework and Start Tinkering:

LangChain, which we mentioned earlier, is overall an excellent starting point for newcomers; their documentation is quite extensive, and they have numerous pre-written examples (cookbooks).
Installation: Go ahead and install the selected framework: in most cases, you begin with something such as: pip install langchain..

Write Your "Hello World": Create a chain that makes a very simple LLM call. Then gradually add a tool to it, get acquainted with a memory module.

Proceed Step by Step, Start Simple:

Don't try to build the most complex agent or the largest RAG system right away. First:
Create a simple chain: Ask the LLM a question, get its answer. - Add a tool: Make the LLM use a calculator or search the web. - Add memory: Make a chatbot that remembers the conversation history.
Try RAG on your own data: Upload a small text file through a tool like LlamaIndex and ask the LLM questions about this file.
Every small success will motivate you to take the next step. I also got confused at first by trying to do everything at once, but then things got easier when I said, "Hold on, let me just get this one step done first."

Review lots of sample code and watch tutorials

The official documentation for tools like LangChain and LlamaIndex is worth its weight in gold. They contain hundreds of sample codes and use cases.
- You can find countless tutorials and guides on those topics on YouTube, Medium, and any number of blogs. Searching for something like "LangChain tutorial for beginners" suffices.
Look at open-source projects on GitHub. How others use these tools is highly instructive.

Be Patient and Have Fun! Building with LLMs and orchestration is an evolving field. There will be trial and error. Embrace the learning process, celebrate small wins, and don't be afraid to experiment. The AI landscape is moving fast, and being hands-on is the best way to keep up.

Wrapping It All Up: The Future is Orchestrated

Yes, dear friends, we have started our journey into this exciting world called LLM orchestration. We found that it is not only a collection of cool technical terms but actually one of the keys to artificial intelligence becoming smarter, more capable, and useful in every area of life.

Now it is your turn. What are you going to do with this knowledge? What problem are you going to sweat and grind to solve? Probably you will begin with a tiny hobby, or maybe you will lay the foundation for the next big startup. Whatever happens, do not stop learning, trying, and, above all, dreaming.

What is LLaMA Factory? LLM Fine-Tuning

Necati Özmen — Wed, 21 May 2025 06:27:51 +0000

Large Language Models (LLMs) are gigantic AI models which generate text and code for a variety of tasks. Although such models are very powerful, however, they sometimes need to be tailored for specific purposes even more. Fine-tuning an LLM will accomplish this, but the process can be tricky without the right tools.

That's where I came across LLaMA-Factory, which made it much simpler for me to personalize the model.

What's the Big Deal with LLaMA-Factory?

Basically, LLaMA-Factory is just this totally awesome open-source thing by some great dev dude hiyouga. It's a one-stop-shop for fiddling with data from over 100 different LLMs and even VLMs (those are the ones which get visual). People love this thing. It doesn't surprise me. It takes some serious headache out of fiddling around.

It's also mostly platform-agnostic, meaning it gets along with models and datasets from the big boys such as Hugging Face and ModelScope.

What's Under the Hood? (Spoiler: A Lot of Great Stuff)

This is not boilerplate code; LLaMA-Factory is chock full of features. It's as though they thought of just about everything.

The Beat of the Beast: Models and Fine-Tuning Ability

A Whole Set of Models Seriously, it's an LLM smorgasbord: LLaMAs (all varieties!), Mistrals, ChatGLM, Qwens, Gemmas, DeepSeeks. and so on. If you've ever heard of it, LLaMA-Factory probably helps fine-tune it. When I needed to try out a newer, more obscure model, this was my source, and voilà! It was available.
Tune It Your Way – So Many Approaches!: It gets really interesting from here.
The Classics: You've got your standard Supervised Fine-Tuning (SFT) – my default, normally. Feel like taking a gamble? You can even attempt (Continuous) Pre-training.
- Fancy Preference Tuning: Familiar with PPO, DPO, KTO, or ORPO? They're high-falutin' techniques for matching models to human preferences or bespoke goals, and LLaMA-Factory makes them accessible. No longer coding it up from scratch – an enormous time saver.
QLoRA and LoRA to the Rescue: And then there are of course, QLoRA and QLoRA (Low-Rank Adaptation). They are life savers for reduced VRAM training. QLoRA, with its various bit quantizations (2, 3, 4, 5, 6, or 8-bit), enables you to train surprisingly large models on hardware that otherwise can't. I've seen fantastic results with 4-bit QLoRA!

Beyond the Basics: Efficiency, Usability, and the Full Toolkit

Lean, Mean, Tuning Machine – Efficiency is the Name of the Game:
- Full Power or Light Touch: 16-bit full-tuning is the choice if you can afford the horse power, or freeze-tuning for the light touch.
Smart Optimizations: It has the latest algorithms and real-world hacks like FlashAttention-2 and Unsloth for speed. For those who are serious about efficient training, there is support for techniques like GaLore (Gradient Low-Rank Projection).
Quantization Galore: Apart from QLoRA, it also supports other quantization techniques like AQLM, AWQ, and GPTQ, all in pursuit of the most compute bang for your buck.
No PhD Required (Mostly!): Sure, LLMs are complex, but LLaMA-Factory tries to simplify it. It has a command-line interface (CLI) that's fairly straightforward, at least with their sample configs. The actual gem for most, however? The LLaMA Board – web UI! You can essentially point-and-click your way through making a fine-tuning task. That's rather cool, huh?

More Than Training – The Whole Shebang

Task Flexibility: You can train for multi-turn dialogue, tool use, image understanding, visual grounding, video classification, audio understanding. it's very varied, ranging from LLMs to VLMs.
Keep an Eye on Things: Experiment tracking is built into packages like LlamaBoard, TensorBoard, Wandb, MLflow, and SwanLab. Seeing those loss curves decrease is so satisfying.
Showtime! (Deployment & Inference): Ready when you are, it offers faster inference modes, including an OpenAI-style API and support for workers like the vLLM worker or SGLang worker. You can also chat up your fine-tuned model in a hurry with llamafactory-cli chat your_model_config.yaml.

Okay, But Why This One? (The Good, and a Note of Realism)

Great observation! Yes, there are alternatives available, but LLaMA-Factory possesses this magic sweet spot:

It simply works: Be you a veteran ML engineer or just LLM-curious, it reduces the curve of learning. I've watched newbies get up and running on it pretty quickly.
Saves Your Sanity (and GPU budget): Efficiency focus is a massive win. Tuning can be computational hell, and anything that keeps it in line is a winner to me.

Pro Tip: Stay on the Cutting Edge!

Hi, check it out - LLaMA-Factory is always completely up-to-date! The coders jump on new models right away, and since it's open-source, individuals continually enhance it. Pretty cool, eh? You're basically getting the latest technology without all the hassle!

Want to Try It Out? Getting Your Hands Dirty with LLaMA-Factory

Alright, convinced enough to give it a shot? Or simply interested in it? Here is an absurdly abbreviated overview of how to begin.

Check Out the Specs & Get the Goods (Installation)

First, check the LLaMA-Factory GitHub for their hardware requirements table (GPU, RAM, etc.), as requirements vary with model size and tuning process.
Next, clone their GitHub repo:

  git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git`

Next, cd LLaMA-Factory
pip install: It's Python, so use a virtual environment (future you will thank you, trust me). Then pip install -e ".[torch,metrics]" is where you start. They also have extras options, e.g., bitsandbytes for QLoRA, or vllm for fast inference.
Alternative - Docker: If Docker is your thing, they've got Dockerfiles! Look at the docker directory in their repo for configurations for CUDA, NPU, and ROCm. This can simplify environment management.
- Their README has all the install options for your target OS and hardware.

A Note on Production Scale
A Grain of Reality: LLaMA-Factory is wonderful for experimenting and tuning, and even comes with deployment APIs, but scaling a model to a super-scalled, high-load production environment with lots of traffic might still require some more, special MLOps tools and some additional, manual tuning on top of what LLaMA-Factory can provide out of the box. It gets you really far, but it is worth noting for high-scale deployments.

Feeding the Beast (Data Preparation)

Your data needs to be LLaMA-Factory-readable format, usually JSON files. You might have customer support dialogs to learn from, or product descriptions to write in some specific witty tone.
One key file to note here is data/dataset_info.json. You'll edit this to tell LLaMA-Factory about your own custom dataset – where it is, what format it's in, etc. It supports local datasets, Hugging Face datasets, and ModelScope Hub content.
Their data/README.md is read-once for this step. It specifies formats and has example datasets to show the structure.

Let's Get Tuning! (Running a Job)

The CLI Way: For users who love the command line, you'll typically run fine-tuning via the llamafactory-cli tool. It might look something like this:

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

That sorcery is in that .yaml file. LLaMA-Factory have plenty of sample YAML config files within their examples directory (e.g., LoRA SFT on Llama 3, or DPO on Mistral). They're great to use as a starting point. You simply copy one, adapt it to your model and data, set your hyperparameters (learning rate, epochs, batch size), and off it goes.

The LLaMA Board (Web UI): If YAML files make your eyes squint, or you prefer getting through a GUI, get the web UI up and running!

llamafactory-cli webui
This puts up a Gradio interface where you select your model, dataset, fine-tuning method, and parameters using dropdowns and input fields. Good for experimenting and learning the options, especially if you're new to this.

Quick Chat After Fine-Tuning: Once tuned, attempt quickly with: llamafactory-cli chat path_to_your_finetuned_model_or_adapter_config.yaml

"Where's the Real Full Manual?" (Documentation is Your Friend)

The real treasure map is the official doc: https://llamafactory.readthedocs.io/en/latest/. Bookmark it. Seriously.
And don't overlook the examples directory in the GitHub repository. It's packed with scripts and configurations. I catch myself going back to them often.
Struggling? GitHub Issues have answers, or you can ask your own question.

Taking It Further (Advanced Bits)

Once you have a model you're happy with, share or use it more widely. LLaMA-Factory helps you with that too:

Exporting Your Model: They provide an export_model.py script (or llamafactory-cli export your_config.yaml command). Convenient to merge LoRA adapters into the base model for an independent fine-tuned model.
Hugging Face Hub Sharing: Once exported, it's simple to share your new model on the Hugging Face Hub. The exported format is largely compatible.

So yea, that's LLaMA-Factory in a nutshell, with a bit more on how you'd actually get up and running with it. If you're interested in getting your toes wet in the waters of LLM fine-tuning and need a tool that's powerful, agile, and won't cost you a kidney to be able to afford compute time (almost!), then you should definitely give it a look.

Dive Deeper

The Source of All Goodness (GitHub): hiyouga/LLaMA-Factory
The Manual (Docs): llamafactory.readthedocs.io
The Brainy Paper (ACL 2024): LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (For when you feel like being extra smart!)

What is Multimodal RAG

Necati Özmen — Mon, 19 May 2025 08:27:35 +0000

There's a term out there in the AI community today that everybody's using: Multimodal RAG.

Let's Start from Scratch: What's This RAG Thing Anyway?

Let's get this "RAG" out of the way before we dive into Multimodal RAG. RAG stands for Retrieval-Augmented Generation. Think of it like this: "Go find some helpful info, mix it with what you know, and then give me a smart answer."

Here's the deal - you know those AI chatbots like ChatGPT that seem super smart?

Well, they've learned tons of stuff, but they have some problems. Sometimes their knowledge is outdated, or when you ask about something really specific, they just shrug and say "Sorry, no idea." That's where RAG comes to the rescue. Before the AI answers you, it acts like a detective.

It searches through fresh databases, company documents, maybe even the internet, to find the most up-to-date and relevant information about what you asked. Then it blends this new info with what it already knows and gives you a much better, more current answer. Pretty cool, right?

Okay, But Is Text-Enough? What's With Old-School RAG?

Traditional RAG systems primarily work with text. The query is text, the information it retrieves is text, and the answer it generates is text. This approach works well for many scenarios, but it has limitations. Our world contains much more than just text. Think about diagrams in instruction manuals, charts in presentations, or important details in medical images.

These visual elements contain valuable information that's difficult to capture in words alone. This is where traditional text-based RAG systems fall short - they simply can't process or understand non-textual information effectively.

And Stage Left Enters: Multimodal RAG

Just at the moment when we can say, "Text is not enough!" Multimodal RAG comes storming in. "Multimodal" means simply "many channels" or "many forms." Thus, this friend here is not all text; it's a RAG system that is aware of and uses images, audio, video, those less-than-famous Excel spreadsheets, graphs, etc. Just like us! When we learn, we read, we scan through images, we watch videos, don't we?

All because of this new-gen RAG, AI is now able to "read" that graph in the PDF you uploaded ahead, "see" that hairline scratch on the product photo, and "hear" that critical emphasis in the record of that meeting. Then it gathers all this data tweezed out from everywhere and gives you an answer so comprehensive, it's like it had a better grasp of the subject than me or you.

Multimodal RAG Explorer

Why Should I Care About Multimodal RAG?

Okay, so you're thinking now, "Sounds cool, but what am I gonna do with it?"

That's all around us is not just bland plain text. Slideshows, articles, tweets, scientific papers. They're a complete mess: there's text, there's pictures somewhere, a video somewhither and there's another graph somewhere.

And that is where the alchemy of Multimodal RAG begins:

Deep Water Swimming: Instead of just reading through the text on the surface, it recognizes objects from pictures, reads graphs for trends, and works out hidden relations in tables. Thus, it does not scratch the surface but goes deep.
Answers Are More on Point: Particularly for questions that require visual or audio information (such as "What is the car model in this photo?"), it can provide on-the-dot answers since it handles that data natively.
Making Complex Stuff Simple: It can take those pages and pages of number-laden tables nobody wants to read, or those immensely complex diagrams, and say, "Buddy, here's the bottom line."
Using with Real-World Data: It handles mixed-format data, just like the real world displays, much more effectively.

Isn't There a Catch? What Are the Hard Places?

Of course, every rose has a thorn, and every tech has a "but" or two. Constructing Multimodal RAG is not quite a walk in the park:

Every Modality is an Individual Things: Text analysis is one thing, interpreting an image or decoding a sound recording something else again. Every form of data has its quirks. Think about it, achieving that "vibe" in a holiday photo is different from being able to pick up a millimeter detail on an architect's plan, isn't it?
Combining Different Worlds: The integration of diverse data modalities presents a significant technical challenge. Effectively synthesizing semantic information from disparate sources—such as correlating graphical elements with their textual explanations—requires sophisticated algorithms that can create coherent, unified representations while preserving the unique contextual value of each modality.

So How Do the Pros Construct These Systems? Basic Strategies

Our engineering wizards have gotten together and come up with a few broad strategies:

"All for One" (Common Embedding Space): All data types get thrown into one shared mathematical space using clever models like CLIP. It's like putting apples and oranges in the same fruit basket to compare them. Makes using existing RAG stuff easier, but you need beefy models that can handle all the details.
"Let Me Speak Your Language" (Translating One Modality to Another): Just turn everything into text first. Like, "This image shows a cat on grass under clear sky." Then feed these descriptions to a regular RAG. Works when text does the job, saves you from building new models. Downside? You lose some of the original image's magic.
"Everyone on Their Own Team, Meet in the Finals" (Separate Stores and Re-ranking): Use different storage boxes for different data types. When asked something, each box grabs its best stuff. Then a smart filter (re-ranker) picks the most relevant bits. Makes specializing easier but adds complexity at the filtering stage.
"The Mixtape" (Hybrid Approaches): Mix and match these approaches for best results. Like making your own custom playlist.

What Do We Need if We're Building a Multimodal RAG?

For a Multimodal RAG system, generally, one must use the following on the pitch:

Multimodal Large Language Models (MLLMs): These are the operation's brainy super-geniuses. Special LLMs that can understand text, images, sound, etc., and generate useful responses from all of them. When you see names like LLaVa, GPT-4V, Qwen-VL, realize that these are what they refer to.
Embedding Models: These are the translators. They convert text or images into a semantic equivalent that can be used by computers (i.e., vectors full of numbers). CLIP and Sentence-BERT are the masters of this trick.
Vector Databases: Special storage that stores these numerical equivalents (vectors) and allows us to query through them at lightning speed. Think Chroma DB, Milvus, FAISS.
Data Parsing/Extraction Tools: You see those PDFs, Word documents, etc.? Those are little programs that pull out the text, images, and tables from them. Unstructured.io, for example, is quite good at it.
The Orchestra Conductor (Orchestration Tools): Tools that orchestrate the workflow and make all these different pieces play together in harmony without stepping on each other's toes. LangChain is a widely used conductor for that.

Let's Get Practical: Step-by-Step, How Does Multimodal RAG Work?

Theory aside, if you ask, "How does this stuff actually work in real life?" it generally goes like this:

The Warm-Up (Data Preprocessing):

Extraction Operation: Texts on one side, images on the other – they're separated from the documents we have (like those notorious PDFs).
Who's Who? (Classification): Images are taken into consideration and a categorization is created, e.g., "Is it a graph, or is it a picture of our friend Necati's holidays?"
Giving Images a Voice (Summarization/Captioning): Short descriptions are generated for images, e.g., "This image has X, doing Y." Especially for graphs, models like DePlot can translate the figures and lines to text.
Everybody Gets an ID (Embedding): Semantic ID cards (embeddings) in machine-readable form are generated for images (or their image descriptions) and text passages.

Storage Strike Time: These created ID cards (vectors) are dumped into a dedicated vector database where they can be quickly found when needed. Every now and then, a reference to the original image file is also stored, labeled "True Copy."
Question In, Brains On! (Retrieval and Generation):
- The user's query is taken in, and an ID card (vector) is created for it too.
- The database is queried and the most appropriate text and/or image IDs are fetched and called in along with their respective owners.
- Those most suitable texts and (if any) images are provided to our super-brain Multimodal LLM (MLLM) as "Here's your material."
- The MLLM uses the question and this extensive content laid out before it to create a rich, satisfying answer for the user. If it is simply inquiring outright about an image (like "How many individuals are in this photo?"), then the MLLM flaunts its visual question answering (VQA) capabilities.

Uses Cases of Multi Modal Rag

Actually, you should be asking, "Where can't we use it?" But for a couple of well-known examples, here they are anyway:

Reports Such as Mixed Nuts: Perfect for breaking down those large reports, financial reports, or market research reports that contain graphs, tables, and lots and lots of text.
Seeing Chatbots: Smart aides that are able to answer questions like, "What is the function of this button on the screen?" or "What type of architecture is this in the picture?" – ones that are able to see the same thing you do.
Dancing with Manuals: You recognize those instruction manuals full of pictures, or user guides for high-tech devices? Guiding users who ask, "Where do I put this screw?" through them immediately.
News from the Sectors: Familiarizing physicians in health care with X-rays, identifying patterns from live stock market charts in finance, providing interactive, multi-channel course materials for students in education. And many more!

What's on the Horizon? Where Is This Headed?

This Multimodal RAG topic is still super fresh. But it's already giving us a glimpse of what we can expect in the future:

Information will be retrieved not just by typing, but by asking, "Hey assistant, who is the person in this picture?" or sending an audio file.
AI output won't be text anymore either. Maybe it'll graph something out for you, or possibly show you what it's talking about in a picture.
We'll see "multimodal agents" – systems that can plan and execute much more advanced tasks in a step-by-step manner, working with various forms of data simultaneously.
Complaints like "This image resolution is too low, I can't make anything out" will recede, as AI becomes better at understanding all sorts of visuals.

LLM Observability: Beginner Guide

Necati Özmen — Fri, 16 May 2025 06:41:09 +0000

Introduction

Alright, so you're playing around with LLMs – maybe building something cool. But here's the thing: getting them to work reliably? That's the tricky part. Sometimes they give you exactly what you want, and other times the output makes no sense at all. When it goes wrong, how do you figure out why?

That's where LLM Observability comes into the play.

Yeah, "Observability." Sounds like another one of those tech terms, doesn't it? Maybe a bit overused. But look, if you want to build AI stuff that actually works, that doesn't break in weird ways, and that you can actually understand, then we gotta pay attention to this.

What's the Big Deal with LLM Observability Anyway?

Think about the software you usually build. When something goes sideways, you've got logs, metrics, traces – a whole set of tools to figure out what broke. You can see what's happening.

They can often feel like a total black box. You feed them a prompt, they do some incredible (and often mysterious) internal processing, and out pops an answer. But what actually happened in between?

LLM observability is all about getting those crucial insights.

It really boils down to figuring out:

Why on earth your LLM said what it said.
How well it's actually performing (or not performing).
Where things might be going wrong.
And yeah, how much all this magic is costing you.

Why You Absolutely Need LLM Observability

Okay, so it helps you see inside the LLM. Sounds good. But what does that really mean for you and your cool new AI project? Let's break it down into why this isn't just a nice-to-have, but a must-have:

No More Hair-Pulling Over Bugs: Your LLM gives a weird answer, "Where did that come from?" Observability tools help you trace the problem, whether it's a bad prompt, an issue with the data it's using, or the model just having an off day.
Build AI People Can Actually Trust: Let's face it, users want AI that feels dependable and makes sense. By keeping a close eye on your LLM's behavior, you can ensure the quality, safety, and fairness of what it puts out. That's how you build trust.
Keep Your Wallet and Your Watch in Check: LLMs can use a lot of resources. Good observability lets you track critical things like token usage (which hits your budget directly) and how fast your model is spitting out answers (latency). Nobody's a fan of a slow or surprisingly pricey app.
Catch Problems Early: Models aren't static; they can "drift" over time. This means their performance can degrade or change in unexpected ways. Solid observability helps you spot these shifts early, so you can tweak, retrain, or adjust before it becomes a major problem.
Happier Users, Happier You: At the end of the day, understanding your LLM better leads to a better product and a smoother ride for your users. And happy users? That's the name of the game.

The Core Pieces: What Should You Be Watching?

Alright, convinced? Ready to get this "observability" thing sorted? The next question is, what exactly should you be keeping an eye on? It can seem like a lot, but here are the main parts:

Prompt & Input Tracking: Know Your Starting Point

- What kind of prompts are people (or your system) actually sending? You'd be surprised!
- Are there patterns? What makes a prompt successful? What makes one fail badly?
- And importantly, is anyone trying to be sneaky and trick your LLM with "prompt injection"? You need to log them to understand them.

Output & Response Monitoring: What's It Actually Saying?

- Definitely log the text the LLM generates.
- But also, how good _is_ it? Think about relevance, if it makes sense (coherence), and super importantly, if it's saying anything problematic – like toxic language or generating incorrect information (often called "hallucinations").

Following the Breadcrumbs (Especially for Agents/Chains): If your LLM isn't working alone – maybe it's part of an agent that uses tools or follows a chain of thought – you'll want to see those intermediate steps.

- Which tools did it decide to use? And why?
- What was its internal "reasoning" process (as much as we can see it)?

Performance Check-Up: The Vital Signs

- **Latency:** How long are users waiting for a response? Too long, and they're gone.
- **Throughput:** How many requests can your setup handle? Planning for scale is key.
- **Token Counts:** How many tokens are being used for inputs and outputs? This one's important for cost!

Counting the Beans: Cost Tracking Seriously, keep an eye on those API bills.

- How much is each request, or each user, or each feature _really_ costing?
- Can you spot any features that are surprisingly expensive?

Listen to Your Users: The Feedback Loop Your users are an incredible source of truth. Make it easy for them to tell you what they think.

- Simple thumbs up/down on responses can be very valuable.
- What are they saying about the AI's helpfulness in general? Are they getting what they need?

Grading the Model: Is It Doing Its Job Well? Beyond individual responses, how well is the model actually doing its job over time?
- Track accuracy scores or other relevant metrics.
- Use specific evaluation datasets and benchmarks to see how it compares.
- And always, always be on the lookout for model "drift."

Phew! That does sound like a lot, doesn't it? But here's the good news: you don't have to do everything at once. Start with what feels most critical for your specific app and build from there.

The Not-So-Easy Parts: Why LLM Observability Can Be Tricky

Now, if getting full observability on LLMs was easy, everyone would have it all figured out already. The truth is, there are some unique problems that make it a bit more challenging than your average software:

"Good" Can Be Super Subjective: What makes a "correct" or "high-quality" answer from an LLM? Sometimes it's obvious, but often it's pretty fuzzy and depends heavily on context. This makes setting up automated quality checks a real challenge.
It's an Ocean of Information: LLMs process large amounts of text – tons and tons of it. Logging, storing, and then actually analyzing all that data can be a big job.
The Need for Speed (Real-Time): If your LLM is interacting with users live, you often need to monitor what's happening in real-time. Catching issues as they happen is crucial.
Still Developing: Honestly, standards for LLM observability are still developing. Different models and platforms do things their own way, so it's not always plug-and-play.
Privacy is Paramount: Prompts and responses can easily contain sensitive, personal information. You've got to be incredibly careful and responsible about how you log, store, and protect this data.

Your Toolkit: How to Actually Do LLM Observability

Alright, enough about the challenges – let's talk solutions! What tools and techniques can you actually use to get a better handle on your LLMs?

Good Ol' Logging, But Smarter: Don't throw out your existing logging practices! Adapt them. Standard logging frameworks are your first line of defense, but make sure you're capturing LLM-specific stuff like prompts, full responses, token counts, and maybe even those intermediate thoughts if you can get them.
Tracing the Journey (Hello, OpenTelemetry!): For anything more complex than a single LLM call (think microservices, or chains of LLM calls), distributed tracing is your best friend. Tools like OpenTelemetry can help you see the entire lifecycle of a request as it bounces between different parts of your system.
Vector Databases to the Rescue: These are becoming super handy. You can store embeddings of your prompts and responses, then search for similar ones. This is great for spotting common issues, finding anomalies, or even powering some clever automated quality checks.
Dashboard Superstars (Prometheus, Grafana, Datadog, etc.): If you're already using platforms like these for your other apps, you can often hook them up to your LLM data too. They're awesome for visualizing metrics, creating dashboards, and setting up alerts when things go weird.
So, How Do You Grade an LLM, Anyway? (Evaluation Techniques):
- Human Review is Often King: Sometimes, you just need a human to look at the output and say, "Yep, that's good," or "Nope, that's way off."
- Model-Based Eval (AI Judging AI): Using another LLM (or a simpler, more focused model) to score the output of your main LLM. It's not perfect, but it can help scale your checks.
- Run it Through Benchmarks: Test your LLM against standard datasets and benchmarks to see how it compares against others or its own past performance.

Case Study: VoltAgent's Visual Approach to Observability

Speaking of specialized platforms, it's insightful to see how observability can be a foundational principle. When we were building VoltAgent, we ran smack into the same "black box" problem many developers face with AI agents.

Our biggest frustration? It was just so darn hard to understand why our agents made certain decisions. What steps did they take? Which tools did they pick, and when? And when an error inevitably popped up, figuring out exactly what went wrong felt like detective work without enough clues. Standard logs helped a bit, but they just weren't cutting it as interactions got more complicated.

We got a lot of inspiration from the visual debugging power of tools like n8n. We thought, "Why can't we have something that clear for AI agents?" So, we decided to build observability right into the core experience. The key differentiator for VoltAgent became our VoltAgent Console. This console isn't just another dashboard; it lets you visualize the entire lifecycle of your agents—we're talking LLM interactions, tool usage, state changes, even their internal reasoning—all laid out on an n8n-style canvas.

With this kind of visual approach, you can suddenly do things like:

Clearly see the step-by-step execution flow your agent actually takes – no more guessing!
Debug errors much, much more easily by pinpointing exactly where things went sideways on the canvas.
Track your agent's performance and, crucially, LLM costs tied to specific steps in the flow.
Easily compare results and execution paths when you're experimenting with different LLMs or tweaking your prompts.

Our whole goal with this visual, canvas-based observability is to make the agent "black box" transparent and understandable. If you're curious to see this approach in action, you can check out the VoltAgent documentation.

Getting Started: Best Practices for Sanity (and Success!)

Ready to actually start implementing? Awesome. Here are a few tips to make your LLM observability journey a bit smoother and save you some difficulties down the road:

Know Your "Why" (Seriously!): Don't just start logging everything because you can. That's a quick way to get overwhelmed. Ask yourself: What questions are you really trying to answer? What problems are you desperately trying to solve? Start with clear goals, and let those guide your efforts.
Start Small, Then Grow Smart: You don't need to implement every single main part of observability on day one. Begin by logging the data points that are most critical for your app. You can always add more layers and sophistication later as you get a better feel for what you need.
Figure Out What's "Normal" (Establish Those Baselines): You can't spot problems if you don't know what "good" or "normal" actually looks like for your specific LLM setup. Track your key metrics over time to get a feel for your baseline performance and cost.
Build It In, Don't Just Add It Awkwardly Later: If you can, try to think about observability from the very start of your project. Weaving it into your development lifecycle early is a lot easier (and usually more effective) than trying to force it in after everything's already built.
Humans + Machines = The Dream Team: Automated monitoring is fantastic, and you should use it. But don't forget the human element. Combine those automated checks with human oversight and evaluation, especially for the really nuanced stuff like output quality and fairness.
Handle Data with Care (Especially the Sensitive Stuff): This one's huge. If you're logging prompts and responses (and you probably should be), make absolutely sure you're anonymizing, redacting, or otherwise protecting any personal or sensitive information. Seriously, don't mess this one up.

Wrapping It Up: Observability is Your LLM's Best Friend (Really!)

So, after all that, is LLM observability just another thing to pile onto your already massive to-do list? Well, yeah, it kind of is. But here's the important point: it's one of those things that can genuinely save you a ton of difficulties, a surprising amount of money, and a whole lot of user frustration down the line.

When you start to truly understand what your LLM is doing and why it's doing it, you shift from being a hopeful operator just crossing your fingers, to a confident architect of intelligent systems. You get the power to build AI applications that are more reliable, more efficient, and, importantly, more trustworthy.

🌟Top 5 AI Agent Frameworks in 2025🌟

Necati Özmen — Thu, 08 May 2025 08:11:04 +0000

AI is evolving fast. AI agents are a key part of this growth. These are AI programs that can reason, plan, and use tools to achieve goals. Building them from zero is hard. Luckily, frameworks and innovative projects are coming out. They handle many complex parts, so developers can focus on the agent's logic.

Many tools are available now. Which ones should you look at? We explored several options. Here are 5 that stand out.

1. LangGraph (Python)

LangGraph is built on the LangChain ecosystem. It focuses on making stateful applications with multiple actors (like agents) using a graph approach. These workflows can manage loops and branching logic.

LangGraph Offers:

Graph-Based Structure: Define agent steps and logic as nodes and edges in a graph.
State Management: It handles keeping track of the agent's state during the workflow.
Supports Complex Logic: You can build workflows with loops, conditional logic, and human intervention points.
Leverages LangChain Ecosystem: You can use all the tools and integrations from LangChain.

⭐ Source: LangGraph

2. VoltAgent (TypeScript)

It's an open-source TypeScript observability-first framework. It's for better developer experience for building AI agents.

Code gave us flexibility. But we lost the clear visual view that tools like n8n had. We tried standard AIOps tools for observability. They helped, but didn't give the same clear, step-by-step view of the agent's execution flow.

VoltAgent Offers:

TypeScript/JavaScript Native: It works directly with Node.js/Web code.
Open-source & Code-First: You get full flexibility and control because you write the code.
Core Building Blocks: It provides essential parts like tools, memory management, and state handling.
LLM Agnostic & Multi-Agent: It works with different large language models (LLMs). You can also coordinate multiple agents.
Visual Debugging Console: This is a key feature. You connect it locally to your running agent. It shows you visually how the agent thinks step-by-step. You can inspect messages and see the execution flow. It brings the clarity of visual tools to your coded agents. Your data stays on your machine.

⭐ Source: VoltAgent

3. CrewAI (Python)

CrewAI is a framework made for orchestrating autonomous AI agents. Its goal is to help them collaborate to achieve goals. It focuses on letting you define roles, tasks, and processes for a "crew" of agents. This makes it easier to build complex workflows where specialization and collaboration are important.

CrewAI Offers:

Role-Based Agent Design: Create agents with specific roles, backgrounds, and capabilities.
Task Management: Assign clear tasks for agents to complete.
Process Orchestration: Set up how agents work together (like doing tasks in order or in parallel).
Tool Integration: Give agents access to external tools easily.

⭐ Source: CrewAI

4. AutoGPT (Python)

AutoGPT is a very influential open-source project. It's not a framework to build other agents in the usual way. It's more like a finished AI agent program that shows what a truly autonomous agent can do. It tries to achieve goals you define. It does this by breaking them down, managing its own prompts, using tools, and learning by trying. Its architecture and code have inspired many other agent projects.

AutoGPT Offers:

Autonomous Goal Achievement: It's designed to operate with minimal human help once given a goal.
Task Decomposition & Management: It tries to break big goals into smaller, manageable steps.
Extensive Tool Use: It can use many tools like Browse the internet or running code.
Inspiration & Learning: It's a concrete example of a complex autonomous agent. You can study its open code.

⭐ Source: AutoGPT

5. AutoGen (Python)

AutoGen is developed by Microsoft. It lets you create multiple agents that can converse with each other. They talk to work together and solve tasks. It focuses on building systems where agents interact through conversational patterns. Humans can often be included in the loop.

AutoGen Offers:

Conversational Agents: It's made for systems where agents collaborate via chat.
Flexible Conversation Patterns: Supports different ways agents talk and work together.
Code Execution & Tool Use: Agents can run code and use tools within the conversation.

⭐ Source: AutoGen

Choosing the Right Tool

The best choice for you depends on your project needs. It also depends on your preferred programming language (TypeScript/JS or Python) and the complexity of your agents.

Trying them is the best way. Read their docs. Build a simple agent with a couple of them. See which one fits your workflow and project goals best.

Are there other tools you think should be on this list? Share your thoughts and experiences in the comments.

Implementing your first MCP: A Google Drive Chatbot 🤖

Necati Özmen — Mon, 05 May 2025 06:52:39 +0000

Introduction

We're excited to share something cool we've put together: a chatbot that can actually search through your Google Drive files.

To build this we'll use VoltAgent, along with some neat tools like Composio and the Model Context Protocol (MCP). If those names sound a bit technical, no worries, we'll explain everything as we go.

What's the Goal Here?

Imagine asking a chatbot, Find my presentation about Q3 results, and bam, it digs through your Google Drive and gives you the link.

That's exactly the kind of thing we wanted to enable an AI agent that can securely connect to your personal tools, like Google Drive in this case.

The Tools We Need

To make this chatbot work, we needed a few essential pieces:

1. VoltAgent: Our Agent Framework

So, first things first: VoltAgent. This is our baby, an open-source TypeScript framework we built specifically to make creating AI agents less painful. We've all been there, trying to cobble together different AI models, memory systems, and tool connections. It gets messy fast!

VoltAgent provides a structured way to do this. We designed it to be modular, so you can easily swap AI models (like GPT-4), manage how the agent remembers things, and crucially for this example hook up external tools and data sources, like Google Drive.

Our goal was to help developers (including ourselves!) build sophisticated agents faster while keeping the codebase clean and maintainable.

2. Composio: The Secure Bridge for Tools

Now, how does our VoltAgent-powered agent actually talk to Google Drive? We needed a secure and straightforward way to handle that connection. Wrestling with Google's APIs and authentication flows directly can be time-consuming.

This is where we decided to use Composio. They offer ready-made, secure connections to a ton of applications (hundreds, actually!), including the Google Drive integration we needed.

Instead of us building and maintaining the whole OAuth dance and API logic, Composio provides a secure "bridge" or tool that our agent can use. And the way it provides this tool is through a standard called MCP.

3. MCP (Model Context Protocol): A Common Language for Agents and Tools

Think of MCP as a universal translator. It's a standardized way for AI agents (like those built with VoltAgent) and external tools (like the Google Drive connection from Composio) to communicate securely and effectively.

We made sure VoltAgent understands MCP precisely because it simplifies integrations so much. Composio's tool speaks MCP, VoltAgent speaks MCP, and boom they can talk to each other without us needing to write a bunch of custom glue code.

It's like having standard USB ports for AI tools. Composio conveniently hosts these MCP connection points (called "servers" or "endpoints"), so we didn't even need to worry about running or deploying that part ourselves.

### The TL;DR:

VoltAgent: Our framework for building the agent's logic.
Composio: Provides the pre-built, secure Google Drive tool.
MCP: The standard protocol that lets VoltAgent use Composio's tool easily. ###

Let's Get it Running!

The best part? We've packaged everything into a ready-to-use template, no need to start from scratch. Since we'll be sharing many VoltAgent tutorials, in this post we'll focus on an example MCP integration rather than walking through a full VoltAgent setup.

Step 1: Grab the Example Code

Pop open your terminal and run this:

npm create voltagent-app@latest -- --example with-google-drive-mcp

This command uses our create-voltagent-app tool to fetch the complete project code.

Step 2: Jump Into the Project Directory

cd with-google-drive-mcp

Step 3: Install Dependencies

Our example has two main parts: the server (where the VoltAgent logic runs) and the client (the simple chat interface you'll see in your browser). We need to install the necessary Node.js packages for both.

# Install server dependencies
cd server
npm install

# Go back and install client dependencies
cd ../client
npm install

# Go back to the project root
cd ..

Step 4: Configure Your API Keys

The agent needs credentials to talk to OpenAI (for the AI model) and Composio (for the Google Drive tool).

Navigate into the server directory (cd server if you're in the root).
Create a new file named .env (just .env, no name before the dot).
Paste the following into your new server/.env file:

  # Get your Composio API key from https://app.composio.dev/developers
  COMPOSIO_API_KEY="your_composio_api_key"

  # Get your Google Drive Integration ID from Composio (https://app.composio.dev/app/googledrive)
  # You might need to add the Google Drive app in Composio first.
  GOOGLE_INTEGRATION_ID="your_google_integration_id"

  # Get your OpenAI API key from https://platform.openai.com/api-keys
  OPENAI_API_KEY="your_openai_api_key"

Now, replace the placeholder values:
- Log into Composio. Find your API Key under Developer settings and paste it in.
- In Composio, go to the Apps section, find (or add) Google Drive, and copy its Integration ID. Paste that in.
- Head over to OpenAI to get your API key and paste it in.
- Save the .env file. Remember to keep these keys private! Don't commit them to Git.

Step 5: Start the App

You'll need two separate terminal windows for this.

Terminal 1: Start the Server

  # Make sure you're in the project root `with-google-drive-mcp` first
  cd server
  npm run dev

You should see some output indicating the server is running, usually on http://localhost:3000.

Terminal 2: Start the Client

  # Make sure you're in the project root `with-google-drive-mcp` first
  cd client
  npm run dev

This command usually opens the chat interface automatically in your default web browser (likely at http://localhost:5173).

Step 6: Connect Google Drive & Start Chatting!

The first time you load the web interface, Composio will likely guide you through a quick process to authorize access to your Google Account (securely, using OAuth). Once you've done that, you're ready! You can start asking the chatbot to find files in your Google Drive.

Seeing What's Going On: Observability

When we build agents, figuring out why they did something (or why they failed) is super important. That's observability. We knew this would be critical, so we built observability features right into VoltAgent.

You can easily connect your VoltAgent applications to the VoltAgent Console. It gives you detailed logs and traces, showing exactly what steps the agent took, which tools it called (like the Google Drive search), the data flowing in and out, and any errors that occurred. It makes debugging and just understanding the agent's behavior so much easier.

Conclusion

And that's pretty much it! By combining our VoltAgent framework with Composio's handy MCP-based tool integration, we were able to quickly spin up a useful chatbot that talks securely to Google Drive. We think this example really shows the power of using standardized protocols like MCP and frameworks like VoltAgent to accelerate agent development.

The create-voltagent-app command makes trying out examples like this a breeze. We encourage you to grab the code, play around with it, check out the VoltAgent documentation, and see what kinds of cool agents you can build. Let us know what you create!

Building a RAG Chatbot with VoltAgent

Necati Özmen — Fri, 25 Apr 2025 10:50:31 +0000

Introduction

Chatbots have become incredibly useful, haven't they? From customer support to personal assistants, they handle conversations pretty well. But sometimes, standard chatbots hit a wall – their knowledge is limited to what they were trained on.

What if you want your chatbot to answer questions based on specific documents, recent data, or a private knowledge base?

That's where Retrieval-Augmented Generation (RAG) comes in.

Steps we'll cover:

What RAG (Retrieval-Augmented Generation) is and why it's useful.
How VoltAgent's Retriever system facilitates RAG.
Setting up a VoltAgent project.
Implementing a custom BaseRetriever with a simple knowledge base.
Creating a VoltAgent Agent that uses the retriever directly.
Running and testing the RAG chatbot using the VoltAgent Console.

What is RAG, and Why Use It?

At its core, RAG is a technique that helps Large Language Models (LLMs) like the ones powering chatbots become smarter by giving them access to external information before they generate a response.

Think of it like this:

Retrieval: When you ask a RAG-enabled chatbot a question, it first retrieves relevant snippets of information from a predefined data source (like documents, databases, or websites).
Augmentation: This retrieved information (the "context") is then added to your original question.
Generation: Finally, the LLM receives the combined prompt (your question + the retrieved context) and generates an answer that's grounded in that specific information.

The result? Chatbots that can provide more accurate, up-to-date, and contextually relevant answers, going beyond their built-in knowledge. I find this incredibly powerful for building bots that need to know about specific product documentation, internal company policies, or recent news.

Introducing VoltAgent

Before we dive into building, let me briefly mention VoltAgent. It's a TypeScript framework I've been working with that makes building sophisticated AI agents (like our RAG chatbot) much simpler. It handles a lot of the boilerplate, letting me focus on the core logic of my agents, including how they retrieve and use information.

VoltAgent's Retriever System

VoltAgent provides a streamlined way to implement RAG through its Retriever system. The key component is the BaseRetriever abstract class (you can find it in @voltagent/core).

To add RAG capabilities to your agent, you essentially need to:

Create a Custom Retriever: Extend BaseRetriever and implement the retrieve method. This method contains your logic for fetching relevant data from your chosen source based on the user's input.
Connect it to an Agent: VoltAgent offers two main ways to do this, as detailed in the Retriever documentation:
- Direct Attachment (agent.retriever): The retriever runs automatically before every LLM call for that agent. Simple, ensures context is always fetched.
- As a Tool (agent.tools): The LLM decides when to call the retriever tool based on the conversation. More efficient and flexible, especially if retrieval isn't always needed.

For this tutorial, we'll use the direct attachment method for simplicity. Our agent will always try to fetch context from its small knowledge base before answering.

Let's Build a Simple RAG Chatbot

Okay, theory's great, but let's get hands-on. I'll show you how I built a basic RAG chatbot using VoltAgent that answers questions based on a small, hardcoded set of facts about VoltAgent itself.

Setting Up the Project

The easiest way to start a new VoltAgent project is using the create-voltagent-app CLI tool. For this example, let's name our project with-rag-chatbot. Open your terminal and run:

npm create voltagent-app@latest with-rag-chatbot

This command will guide you through the setup process. (For more details on using the CLI or setting up manually, check the Quick Start guide).

After the setup, navigate into your new project directory:

cd with-rag-chatbot

The CLI sets up a standard project structure for you:

with-rag-chatbot/
├── src/
│   └── index.ts     # Our main agent logic will go here
├── package.json     # Project dependencies
├── tsconfig.json    # TypeScript config
├── .gitignore       # Git ignore rules
└── .env             # API keys (important!) - created automatically or you add it

The generated package.json will be similar to these (though versions might differ slightly):

package.json (Example)

// ... (scripts, name: "with-rag-chatbot", etc.)
  "dependencies": {
    "@ai-sdk/openai": "...", // Or your chosen LLM SDK
    "@voltagent/core": "...",
    "@voltagent/vercel-ai": "...", // Or your chosen provider
    "zod": "..."
  },
// ... (devDependencies)

Now, let's focus on the code inside src/index.ts.

Implementing the Retriever and Agent

This is where the magic happens. In src/index.ts, I defined a simple retriever and an agent that uses it.

//src/index.ts

import { VoltAgent, Agent, BaseRetriever, type BaseMessage } from "@voltagent/core";
import { VercelAIProvider } from "@voltagent/vercel-ai";
import { openai } from "@ai-sdk/openai";

// --- Simple Knowledge Base Retriever ---

class KnowledgeBaseRetriever extends BaseRetriever {
  // Our tiny "knowledge base"
  private documents = [
    {
      id: "doc1",
      content: "What is VoltAgent? VoltAgent is a TypeScript framework for building AI agents.",
    },
    {
      id: "doc2",
      content:
        "What features does VoltAgent support? VoltAgent supports tools, memory, sub-agents, and retrievers for RAG.",
    },
    { id: "doc3", content: "What is RAG? RAG stands for Retrieval-Augmented Generation." },
    {
      id: "doc4",
      content:
        "How can I test my agent? You can test VoltAgent agents using the VoltAgent Console.",
    },
  ];

  // Reverting to simple retrieve logic
  async retrieve(input: string | BaseMessage[]): Promise<string> {
    const query = typeof input === "string" ? input : (input[input.length - 1].content as string);
    const queryLower = query.toLowerCase();
    console.log(`[KnowledgeBaseRetriever] Searching for context related to: "${query}"`);

    // Simple includes check
    const relevantDocs = this.documents.filter((doc) =>
      doc.content.toLowerCase().includes(queryLower)
    );

    if (relevantDocs.length > 0) {
      const contextString = relevantDocs.map((doc) => `- ${doc.content}`).join("\n");
      console.log(`[KnowledgeBaseRetriever] Found context:\n${contextString}`);
      return `Relevant Information Found:\n${contextString}`;
    }

    console.log("[KnowledgeBaseRetriever] No relevant context found.");
    return "No relevant information found in the knowledge base.";
  }
}

// --- Agent Definition ---

// Instantiate the retriever
const knowledgeRetriever = new KnowledgeBaseRetriever();

// Define the agent that uses the retriever directly
const ragAgent = new Agent({
  name: "RAG Chatbot",
  description: "A chatbot that answers questions based on its internal knowledge base.",
  llm: new VercelAIProvider(), // Using Vercel AI SDK Provider
  model: openai("gpt-4o-mini"), // Using OpenAI model via Vercel
  // Attach the retriever directly
  retriever: knowledgeRetriever,
});

// --- VoltAgent Initialization ---

new VoltAgent({
  agents: {
    // Make the agent available under the key 'ragAgent'
    ragAgent,
  },
});

Code Breakdown:

KnowledgeBaseRetriever: Extends BaseRetriever. It holds a small array of documents. The retrieve method performs a simple case-insensitive search. If it finds matches, it formats them into a string prefixed with "Relevant Information Found:"; otherwise, it returns a "not found" message.
ragAgent: An Agent instance.
- We give it a name and description.
- We configure the llm provider and model.
- Crucially, we set retriever: knowledgeRetriever. This tells VoltAgent to automatically run our retriever before calling the LLM.
- The systemPrompt is important here. It explicitly tells the LLM to base its answers only on the "Relevant Information Found" (which our retriever provides) and what to do if no information is found. This helps prevent the LLM from falling back on its general knowledge.
new VoltAgent(...): Initializes the VoltAgent server and registers our ragAgent.

Running the Agent

Before running, you need an API key for your chosen LLM provider (like OpenAI).

Create .env file: In the root of your with-rag-chatbot project folder, create a file named .env.
Add API Key: Add your key like this:
bash title=".env" OPENAI_API_KEY=your_openai_api_key_here
(Replace your_openai_api_key_here with your actual key).
Install Dependencies: Open your terminal inside the with-rag-chatbot directory and run:

   npm install

Start the Agent: Run the development server:

npm run dev

You should see the VoltAgent server startup message, including a link to the Developer Console:

══════════════════════════════════════════════════
  VOLTAGENT SERVER STARTED SUCCESSFULLY
══════════════════════════════════════════════════
  ✓ HTTP Server: http://localhost:3141

  Developer Console:    https://console.voltagent.dev
══════════════════════════════════════════════════

Testing in the Console

Now for the fun part!

Open Console: Go to https://console.voltagent.dev in your browser.
Find Agent: You should see your "RAG Chatbot" listed. Click on it.
Chat: Click the chat icon (usually bottom-right) to open the chat window.
Ask Questions: Try asking questions related to the documents in our retriever:
- What is VoltAgent? (Should use doc1)
- What features does VoltAgent support? (Should use doc2)
- How can I test my agent? (Should use doc4)
- What is the capital of France? (Should state it can't answer based on its knowledge base, because of our system prompt and lack of relevant documents).

Observe the responses. They should be directly based on the content from the documents array we provided! You can also check your terminal where you ran npm run dev - you'll see the logs from the KnowledgeBaseRetriever showing what context (if any) was found for each query.

Conclusion

As you can see, implementing a basic RAG system with VoltAgent is quite straightforward. By creating a custom BaseRetriever and attaching it to an Agent, I could quickly build a chatbot grounded in specific external knowledge.

This simple example uses hardcoded data, but you could easily adapt the KnowledgeBaseRetriever to fetch data from a real database, API, or vector store for much more powerful applications. RAG opens up a lot of possibilities for creating smarter, more knowledgeable AI agents, and I think VoltAgent makes it significantly easier to get started.