Agents that ship: Breakdown of the 3-part architecture that survived real-world chaos

Abhi — Mon, 24 Nov 2025 11:28:58 +0000

1. The Big Picture: From Passive AI to Autonomous Agents

Historical Context

Traditional AI was passive — it responded to prompts, answered questions, or translated text.
The new wave is about autonomous, goal-oriented AI agents — systems that plan, act, and solve complex problems over multiple steps without constant supervision.

The Core Idea

Agents don’t just talk — they act.
They execute actions in the real (or digital) world to achieve defined goals.

(Agentic AI problem-solving process from Whitepaper - Introduction to Agents and Agent architectures)

2. The Agent Anatomy: Three Core Parts

The whitepaper breaks down an agent into three key components:

The Model (Brain) – The reasoning and decision-making core.
The Tools (Hands) – The interfaces to act on the world.
The Orchestration Layer (Conductor) – The system that coordinates everything.

A. The Model – “The Brain”

The LLM (large language model) serves as the reasoning engine.
Its main function: managing the context window — constantly deciding what’s important right now from:
- The mission goal
- Memory
- Tool outputs

It determines what matters for the next reasoning step.

B. The Tools – “The Hands”

Tools are how agents interact with the outside world — APIs, functions, databases, vector stores, etc.
Examples:
- Look up customer data
- Check inventory
- Query a vector database

The model decides which tool to use, while the orchestration layer executes the call and feeds results back into the model.
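
This division of labor can be sketched roughly as follows. It is a minimal illustration, not the whitepaper's implementation: the `TOOLS` registry, the two tool functions, their canned data, and the shape of the model's decision (a dict with `tool` and `args`) are all assumptions made for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical tools with canned data; a real agent would call APIs or databases.
def look_up_customer(customer_id: str) -> Dict[str, Any]:
    return {"id": customer_id, "name": "Ada", "tier": "gold"}

def check_inventory(sku: str) -> Dict[str, Any]:
    return {"sku": sku, "in_stock": 12}

# Registry owned by the orchestration layer: tool name -> callable.
TOOLS: Dict[str, Callable[..., Any]] = {
    "look_up_customer": look_up_customer,
    "check_inventory": check_inventory,
}

def execute_tool_call(decision: Dict[str, Any]) -> Dict[str, Any]:
    """Run the tool the model picked and wrap the result as an observation."""
    name = decision["tool"]
    if name not in TOOLS:
        # Hand the error back as an observation so the model can recover.
        return {"tool": name, "error": f"unknown tool: {name}"}
    result = TOOLS[name](**decision.get("args", {}))
    return {"tool": name, "result": result}

# The model's choice is hard-coded here; in practice it comes from the LLM.
observation = execute_tool_call({"tool": "check_inventory", "args": {"sku": "A-100"}})
print(observation)
```

The model never runs code itself in this sketch. It only names a tool and its arguments, and the orchestration layer decides whether and how to execute them.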

C. The Orchestration Layer – “The Conductor”

Governs the entire reasoning loop:
- Planning
- Memory/state management
- Reasoning strategy (e.g., Chain-of-Thought, ReAct)

The ReAct Loop

Think: Based on the goal, decide next step.
Act: Use a tool.
Observe: Take in the result.
Think again: Iterate.

This think–act–observe loop is what transforms an LLM into a true agent capable of executing complex, adaptive workflows.

3. Example: The Agentic Loop in Action

Scenario: Organizing a Team’s Travel

Mission: “Organize my team’s travel.”
Scan the Scene: Identify tools — calendar, booking APIs, etc.
Plan: “First, get the team roster.”
Act: Call the getTeamRoster() tool.
Observe & Iterate:

Receive team list → update context.
Next step: check availability, then book travel.

This cycle continues until the mission is completed.
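
The think-act-observe cycle for this scenario can be sketched as below. Everything here is an assumption for illustration: `get_team_roster` stands in for the `getTeamRoster()` tool, the other two tools and their canned data are hypothetical, and `think` is a hand-written stub where a real agent would call the model.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical tools with canned data standing in for calendar and booking APIs.
def get_team_roster() -> List[str]:
    return ["Asha", "Ben"]

def check_availability(people: List[str]) -> Dict[str, str]:
    return {person: "2025-12-01" for person in people}

def book_travel(dates: Dict[str, str]) -> str:
    return f"booked {len(dates)} trips"

TOOLS: Dict[str, Callable[..., Any]] = {
    "get_team_roster": get_team_roster,
    "check_availability": check_availability,
    "book_travel": book_travel,
}

def think(context: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Stub for the model: pick the next step from what has been observed so far."""
    done = {step["tool"]: step["result"] for step in context}
    if "get_team_roster" not in done:
        return {"tool": "get_team_roster", "args": {}}
    if "check_availability" not in done:
        return {"tool": "check_availability", "args": {"people": done["get_team_roster"]}}
    if "book_travel" not in done:
        return {"tool": "book_travel", "args": {"dates": done["check_availability"]}}
    return None  # Mission complete.

def run_agent(max_steps: int = 10) -> List[Dict[str, Any]]:
    context: List[Dict[str, Any]] = []
    for _ in range(max_steps):  # Step cap guards against endless loops.
        decision = think(context)                                      # Think
        if decision is None:
            break
        result = TOOLS[decision["tool"]](**decision["args"])           # Act
        context.append({"tool": decision["tool"], "result": result})   # Observe
    return context

trace = run_agent()
for step in trace:
    print(step["tool"], "->", step["result"])
```

Each observation is appended to the context, so the next call to `think` sees everything that has happened so far. The `max_steps` cap is a design choice of this sketch, added so a confused planner cannot loop forever.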

4. Levels of Agent Capability (Taxonomy)

(Agent Taxonomy Levels from Whitepaper - Introduction to Agents and Agent architectures)

Designing an agent requires defining its capability level:

Level 0: Basic LLM

Just the model.
No tools or external access.
Can explain concepts but cannot access real-time data.

Level 1: Connected Problem Solver

Model + Tools.
Gains real-world awareness.
Example: Looks up current sports scores via a search API.

Level 2: Strategic Problem Solver

Handles multi-step tasks using context engineering.
Example: “Find a coffee shop halfway between two addresses.”
- Uses a map tool → gets midpoint coordinates.
- Then queries coffee shops near that point with ratings >4.0.
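
A rough sketch of that two-step chain follows, where the output of the first tool becomes the input of the second. The coordinates, shop data, and function names are invented for the example, and averaging latitude and longitude is only a reasonable approximation for nearby points.

```python
from typing import Any, Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

def midpoint(a: Coord, b: Coord) -> Coord:
    """Average the coordinates; an approximation that suits nearby points."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Canned data standing in for a places API.
SHOPS: List[Dict[str, Any]] = [
    {"name": "Bean There", "location": (37.78, -122.42), "rating": 4.5},
    {"name": "Drip Drop", "location": (37.78, -122.41), "rating": 3.9},
    {"name": "Far Roast", "location": (38.50, -121.00), "rating": 4.8},
]

def search_coffee_shops(center: Coord, max_offset_deg: float, min_rating: float) -> List[str]:
    """Return shops inside a small lat/lon box around center, rated above min_rating."""
    hits = []
    for shop in SHOPS:
        lat, lon = shop["location"]
        near = abs(lat - center[0]) <= max_offset_deg and abs(lon - center[1]) <= max_offset_deg
        if near and shop["rating"] > min_rating:
            hits.append(shop["name"])
    return hits

# Step 1: the map tool yields a midpoint. Step 2: that midpoint feeds the search.
center = midpoint((37.77, -122.43), (37.79, -122.41))
print(search_coffee_shops(center, max_offset_deg=0.02, min_rating=4.0))
```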

Level 3: Collaborative Multi-Agent System

A team of agents working together.
Example:
- Project Manager Agent → delegates to
- Market Research Agent
- Data Analysis Agent
Enables goal delegation and independent sub-agent reasoning.

Level 4: Self-Evolving System

Agents that identify and fill their own capability gaps.
Example:
- Realizes it needs social media sentiment analysis.
- Creates a new agent to perform that task.
- Configures access and integrates it automatically.

5. Building Reliable Production-Grade Agents

Model Selection

Don’t just chase benchmarks.
Choose models that are:
- Strong in reasoning.
- Reliable with tool usage.
Use Model Routing:
- Heavy reasoning → Gemini 1.5 Pro.
- Simple tasks → Gemini 1.5 Flash.
- Balances cost and performance.
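
One way to picture routing is a small function that chooses a model per request. The keyword list and word-count threshold below are made-up heuristics for illustration, and the model names are plain labels taken from the list above rather than API identifiers. A production router might use a classifier or a cheap model to make this decision.

```python
HEAVY_MODEL = "Gemini 1.5 Pro"
LIGHT_MODEL = "Gemini 1.5 Flash"

# Hypothetical signals that a request needs multi-step reasoning.
REASONING_HINTS = ("plan", "analyze", "compare", "debug", "why")

def route_model(task: str, max_simple_words: int = 30) -> str:
    """Send long or reasoning-heavy requests to the larger model, the rest to the smaller one."""
    words = task.lower().split()
    needs_reasoning = any(hint in words for hint in REASONING_HINTS)
    if needs_reasoning or len(words) > max_simple_words:
        return HEAVY_MODEL
    return LIGHT_MODEL

print(route_model("Plan a three-city trip and compare fares"))
print(route_model("Translate hello to French"))
```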

Tool Design

Two main categories:

Retrieval Tools (RAG, Vector DBs) – Ground the agent in factual data.
Action Tools (APIs, Scripts) – Allow real-world execution.

Function Calling

Tools must have clear specifications (e.g., OpenAPI format).
The model must know:
- What the tool does.
- What parameters it requires.
- What output to expect.

This ensures the loop stays stable and accurate.
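
As a hedged illustration, a tool declaration and a check on the model's proposed arguments might look like this. The spec is a simplified dict loosely following the JSON Schema style that OpenAPI uses, not a complete OpenAPI document, and `check_inventory`, its parameters, and `validate_call` are hypothetical.

```python
from typing import Any, Dict, List

# Simplified declaration: what the tool does, what it takes, what it returns.
CHECK_INVENTORY_SPEC: Dict[str, Any] = {
    "name": "check_inventory",
    "description": "Return the number of units in stock for a product SKU.",
    "parameters": {
        "type": "object",
        "properties": {
            "sku": {"type": "string", "description": "Product SKU, e.g. A-100"},
            "warehouse": {"type": "string", "description": "Optional warehouse code"},
        },
        "required": ["sku"],
    },
    "returns": {"type": "object", "description": "{'sku': str, 'in_stock': int}"},
}

PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_call(spec: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the proposed arguments; empty means valid."""
    params = spec["parameters"]
    errors = []
    for name in params["required"]:
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    for name, value in args.items():
        prop = params["properties"].get(name)
        if prop is None:
            errors.append(f"unexpected parameter: {name}")
        elif not isinstance(value, PYTHON_TYPES[prop["type"]]):
            errors.append(f"{name} should be {prop['type']}")
    return errors

print(validate_call(CHECK_INVENTORY_SPEC, {"sku": "A-100"}))
print(validate_call(CHECK_INVENTORY_SPEC, {"sku": 5, "color": "red"}))
```

Validating before execution lets the orchestration layer return a precise error to the model instead of letting a malformed call reach the tool.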

Memory Management

Short-Term Memory: Current context and reasoning trace for the task.
Long-Term Memory: Persistent storage — preferences, user history, learned data. Often implemented via vector databases as RAG tools.
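
The two kinds of memory can be sketched with a small class. This is an assumption-laden toy: `AgentMemory` is a hypothetical name, the short-term buffer is a fixed-size window, and word overlap stands in for the vector similarity a real vector database would compute.

```python
from collections import deque
from typing import Deque, List

class AgentMemory:
    def __init__(self, short_term_size: int = 3) -> None:
        # Short-term: a bounded window of recent steps for the current task.
        self.short_term: Deque[str] = deque(maxlen=short_term_size)
        # Long-term: facts that persist across tasks.
        self.long_term: List[str] = []

    def observe(self, event: str) -> None:
        self.short_term.append(event)

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, top_k: int = 1) -> List[str]:
        """Rank stored facts by words shared with the query (stand-in for vector search)."""
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(f.lower().split())), f) for f in self.long_term]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:top_k]]

memory = AgentMemory()
memory.remember("user prefers aisle seats")
memory.remember("user is vegetarian")
for step in ["goal: book flight", "called search_flights", "got 3 results", "picked flight 2"]:
    memory.observe(step)

print(list(memory.short_term))
print(memory.recall("which seats does the user like"))
```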

6. Testing and Debugging (AgentOps)

Evaluation

Traditional exact-match testing breaks down because agent outputs vary from run to run.
Use AI-as-a-judge:
- Another model grades outputs against a rubric.
- Checks factual grounding and adherence to constraints.
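
A minimal sketch of the judging pattern is shown below. The rubric wording, the prompt layout, and the JSON reply format are assumptions, and `stub_judge` returns a fixed verdict where a real setup would call a second model.

```python
import json
from typing import Callable, Dict, List

RUBRIC: Dict[str, str] = {
    "grounded": "Every factual claim is supported by the provided sources.",
    "follows_constraints": "The answer respects every constraint stated in the task.",
}

def build_judge_prompt(task: str, answer: str, sources: List[str]) -> str:
    criteria = "\n".join(f"- {name}: {text}" for name, text in RUBRIC.items())
    return (
        "Grade the answer against each criterion. "
        "Reply with a JSON object mapping criterion name to true or false.\n"
        f"Criteria:\n{criteria}\n"
        f"Task: {task}\nSources: {sources}\nAnswer: {answer}"
    )

def grade(task: str, answer: str, sources: List[str],
          judge: Callable[[str], str]) -> Dict[str, bool]:
    verdict = json.loads(judge(build_judge_prompt(task, answer, sources)))
    # Treat any criterion the judge omitted as a failure.
    return {name: bool(verdict.get(name, False)) for name in RUBRIC}

def stub_judge(prompt: str) -> str:
    """Stand-in for the judging model; returns a fixed verdict for the demo."""
    return json.dumps({"grounded": True, "follows_constraints": False})

scores = grade(
    task="Summarize stock for A-100 in one sentence.",
    answer="A-100 has 12 units. That is 12 units in stock.",
    sources=["A-100: 12 units in stock"],
    judge=stub_judge,
)
print(scores, "PASS" if all(scores.values()) else "FAIL")
```

Passing the judge in as a callable keeps the grading logic testable without a live model.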

Observability

OpenTelemetry Traces track every step:
- Prompts, reasoning, tools used, parameters, outputs.
Acts as a flight recorder for debugging.
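
To make the flight-recorder idea concrete, here is a standard-library-only sketch. It is not the OpenTelemetry API: `span` and `TRACE` are hypothetical stand-ins that record the same kind of information (step name, attributes, status, duration) a real tracing SDK would export.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

TRACE: List[Dict[str, Any]] = []  # In-memory log; a real system would export spans.

@contextmanager
def span(name: str, **attributes: Any) -> Iterator[Dict[str, Any]]:
    """Record one step of the agent loop, including failures and timing."""
    record: Dict[str, Any] = {"name": name, "attributes": dict(attributes), "status": "ok"}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["status"] = f"error: {exc}"
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        TRACE.append(record)

with span("model.think", prompt="Check stock for A-100"):
    decision = {"tool": "check_inventory", "args": {"sku": "A-100"}}

with span("tool.call", tool=decision["tool"], args=decision["args"]) as record:
    record["attributes"]["output"] = {"in_stock": 12}

for entry in TRACE:
    print(entry["name"], entry["status"], entry["attributes"])
```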

User Feedback

Every failure → a new test case.
Builds a “golden dataset” that prevents recurring issues.
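
The failure-to-test-case habit could be sketched like this. The case format, the substring check, and the stub agent are simplifying assumptions; in practice the check might be a rubric graded by a judge model.

```python
from typing import Callable, Dict, List

GOLDEN_CASES: List[Dict[str, str]] = []

def record_failure(user_input: str, expected_phrase: str) -> None:
    """Turn a reported failure into a permanent regression case."""
    GOLDEN_CASES.append({"input": user_input, "expected_phrase": expected_phrase})

def run_regression(agent: Callable[[str], str]) -> List[str]:
    """Return the inputs whose output no longer contains the expected phrase."""
    failed = []
    for case in GOLDEN_CASES:
        output = agent(case["input"])
        if case["expected_phrase"].lower() not in output.lower():
            failed.append(case["input"])
    return failed

# Hypothetical report: the agent ignored the 30-day return window.
record_failure("Can I return shoes after 45 days?", "30 days")

def stub_agent(question: str) -> str:
    """Stand-in for the real agent under test."""
    return "Returns are accepted within 30 days of purchase."

print(run_regression(stub_agent))
```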

7. Security and Governance

The Trust Trade-Off

More capabilities = more risk.
Requires Defense-in-Depth:
- Hard-coded guardrails (policy engines).
- AI-based guard models to detect risky behavior pre-execution.
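
A layered check of this kind might be sketched as follows. The blocked tool, the refund limit, and the keyword match are invented for illustration, and `guard_model_check` is a stub where a real system would call a classifier model before execution.

```python
from typing import Any, Dict, Optional, Tuple

BLOCKED_TOOLS = {"delete_records"}  # Hypothetical hard-coded policy.
MAX_REFUND = 100.0

def policy_check(action: Dict[str, Any]) -> Optional[str]:
    """Layer 1, deterministic rules: return a reason to block, or None to allow."""
    if action["tool"] in BLOCKED_TOOLS:
        return f"tool {action['tool']} is blocked by policy"
    if action["tool"] == "issue_refund" and action["args"].get("amount", 0) > MAX_REFUND:
        return "refund exceeds limit"
    return None

def guard_model_check(action: Dict[str, Any]) -> Optional[str]:
    """Layer 2, stub for an AI-based guard model; a keyword match stands in for it."""
    if "ignore previous instructions" in str(action["args"]).lower():
        return "possible prompt injection"
    return None

def authorize(action: Dict[str, Any]) -> Tuple[bool, str]:
    """Run every layer before execution; the first objection blocks the action."""
    for check in (policy_check, guard_model_check):
        reason = check(action)
        if reason is not None:
            return False, reason
    return True, "allowed"

print(authorize({"tool": "issue_refund", "args": {"amount": 250}}))
print(authorize({"tool": "check_inventory", "args": {"sku": "A-100"}}))
```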

Agent Identity

Each agent needs a secure digital identity (e.g., the SPIFFE standard).
Enables least-privilege access control — limit what each agent can do.

Agent Governance

Prevent agent sprawl with a central control plane:
- Routes all traffic (user ↔ agent, agent ↔ tool).
- Enforces policies and authentication.
- Monitors logs and performance metrics.
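
As a toy illustration of a control plane that also enforces least privilege, consider the sketch below. `ControlPlane`, the agent IDs, and the tool names are hypothetical, and identity is reduced to a plain string where a real deployment would verify a cryptographic credential.

```python
from typing import Any, Callable, Dict, List, Set

class ControlPlane:
    def __init__(self) -> None:
        self.permissions: Dict[str, Set[str]] = {}      # agent id -> allowed tools
        self.tools: Dict[str, Callable[..., Any]] = {}
        self.audit_log: List[Dict[str, Any]] = []

    def register_agent(self, agent_id: str, allowed_tools: Set[str]) -> None:
        self.permissions[agent_id] = set(allowed_tools)

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call(self, agent_id: str, tool: str, **args: Any) -> Any:
        """Route every agent-to-tool call through one checkpoint."""
        allowed = tool in self.permissions.get(agent_id, set())
        self.audit_log.append({"agent": agent_id, "tool": tool, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self.tools[tool](**args)

plane = ControlPlane()
plane.register_tool("check_inventory", lambda sku: {"sku": sku, "in_stock": 12})
plane.register_tool("issue_refund", lambda amount: f"refunded {amount}")
plane.register_agent("support-agent", {"check_inventory"})

print(plane.call("support-agent", "check_inventory", sku="A-100"))
try:
    plane.call("support-agent", "issue_refund", amount=50)
except PermissionError as err:
    print("denied:", err)
print(plane.audit_log)
```

Denied calls are logged before the error is raised, so the audit trail shows attempts as well as successes.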

8. Continuous Learning and Adaptation

Agents evolve through:

Runtime logs and traces
User feedback
Policy and data updates

Simulation Environments (“Agent Gym”)

Safe sandbox for testing complex multi-agent behaviors.
Enables experimentation with synthetic data before deployment.

9. Real-World Examples

Google Co-Scientist

A Level 3–4 system for scientific research.
Acts as a virtual collaborator:
- Formulates hypotheses.
- Designs experiments.
- Analyzes data.
Uses multiple agents under a supervisor agent.

(The AI co-scientist design system from Whitepaper - Introduction to Agents and Agent architectures)

AlphaEvolve

A Level 4 AI system focused on algorithm discovery.
Generates, tests, and evolves algorithms autonomously.
Has achieved improvements in:
- Data center efficiency.
- Matrix multiplication algorithms.
Humans guide the process by defining evaluation metrics.

(AlphaEvolve design system)

10. The Takeaway: Becoming an AI Architect

Building successful agents isn’t about having the smartest model — it’s about engineering rigor.

The Core Components

Model → Reasoning
Tools → Action
Orchestration → Management

What Matters Most

Architecture
Governance
Security
Testing
Observability

Your role as a developer is evolving:

From coder to architect — designing intelligent, autonomous systems that act as collaborative partners, not just tools.

Next, I will share how to create an AI agent from scratch.
Until then 👋

[Reference: Whitepaper: Introduction to Agents. Authors: Alan Blount, Antonio Gulli, Shubham Saboo, Michael Zimmermann, and Vladimir Vuskovic]

An open letter to Microsoft

Abhi — Fri, 20 Jan 2023 03:07:52 +0000

Dear Microsoft,

I am writing to express my deep concern about the decision to put accessible features, such as live captions, behind a paywall. This decision has a significant impact on individuals who are hard of hearing or deaf, as it makes it difficult for them to access important information and communicate effectively.

Access to live captions is a basic need for many people with hearing impairments, and it is not something that should be restricted by cost. It is essential for them to be able to understand and participate in conversations and meetings, both in personal and professional settings. Without access to live captions, these individuals may miss important information and be excluded from full participation in their communities.

In addition to being essential for communication, live captions are also a valuable tool for education and personal development. Many deaf and hard of hearing individuals struggle to access educational and professional opportunities due to barriers such as a lack of captioning. By making live captions available to all users, Microsoft can help to level the playing field and provide more opportunities for these individuals to reach their full potential.

Furthermore, accessibility features such as live captions benefit not only people with hearing impairments but also non-native speakers, people in noisy environments, and people with cognitive or learning disabilities. They matter to everyone, not just to one group.

I urge Microsoft to consider the needs of individuals with hearing impairments and to make live captions available to all users, regardless of their subscription status. This would make a significant difference in the lives of many people and would demonstrate the company's commitment to accessibility and inclusivity.

I would like to thank you for your time and consideration, and I look forward to hearing about any steps that Microsoft is taking to address this issue.

Sincerely,
Abhishek Gupta
