Agent Skeleton Framework: Building Domain-Specific AI Agents Through Configuration, Not Code

Arvindkumar Akula — Sun, 23 Nov 2025 18:01:06 +0000


I built a framework that lets you create specialized AI agents (such as a Compliance Reviewer or a Travel Planner) by writing a YAML file instead of code. The same core powers every agent, and I built all of it with Kiro IDE.

The Problem: Building AI Agents is Too Code-Heavy

Want to build a compliance reviewer AI? Write hundreds of lines of code.

Need a travel planning assistant? Write hundreds more lines.

Want to add a customer support bot? You guessed it - more code, more complexity, more maintenance.

What if you could create a new AI agent just by writing a configuration file?

That's exactly what I set out to build during the Kiroween Hackathon 2025.

Introducing Agent Skeleton Framework

Agent Skeleton is a configuration-driven framework for building domain-specific AI agents. The same core framework powers completely different specialized agents:

🔍 Compliance Reviewer Agent

Reviews documents for regulatory compliance
Identifies policy violations with severity ratings
Provides specific regulation citations (FLSA, GDPR, OSHA)
Generates structured compliance reports

✈️ Travel Planner Agent

Creates personalized travel itineraries
Provides cost estimates and budget breakdowns
Suggests activities based on preferences
Offers weather forecasts and local insights

🎯 The Magic: Just YAML

Creating a new agent requires only a YAML file:

domain:
  name: "customer-support"
  description: "AI assistant for customer support"

personality:
  tone: "friendly"
  style: "helpful"

tools:
  allowed:
    - "ticket_search"
    - "knowledge_base_search"
    - "order_lookup"

constraints:
  - "Only respond to customer support questions"
  - "Always be empathetic and patient"
  - "Provide step-by-step solutions"

That's it! When the tools it lists are already registered, you need no Python and no TypeScript, just configuration.
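Here is a minimal sketch of how a core might validate a config like the customer-support YAML above. It assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `DomainConfig` and `load_domain_config` are hypothetical names for illustration, not the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class DomainConfig:
    """Typed view of one domain file (hypothetical structure)."""
    name: str
    description: str
    tone: str
    style: str
    allowed_tools: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)


def load_domain_config(raw: Dict[str, Any]) -> DomainConfig:
    """Build a DomainConfig from an already-parsed YAML mapping."""
    # Fail early with a clear message if a required section is missing.
    for section in ("domain", "personality", "tools", "constraints"):
        if section not in raw:
            raise ValueError(f"missing required section: {section!r}")
    return DomainConfig(
        name=raw["domain"]["name"],
        description=raw["domain"]["description"],
        tone=raw["personality"]["tone"],
        style=raw["personality"]["style"],
        allowed_tools=list(raw["tools"].get("allowed", [])),
        constraints=list(raw["constraints"]),
    )


# The customer-support config from above, as the dict a YAML parser would return.
raw_config = {
    "domain": {
        "name": "customer-support",
        "description": "AI assistant for customer support",
    },
    "personality": {"tone": "friendly", "style": "helpful"},
    "tools": {"allowed": ["ticket_search", "knowledge_base_search", "order_lookup"]},
    "constraints": [
        "Only respond to customer support questions",
        "Always be empathetic and patient",
        "Provide step-by-step solutions",
    ],
}

config = load_domain_config(raw_config)
```

Validating up front means a typo in a section name surfaces when the agent loads, not halfway through a conversation.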

How It Works: The Architecture

1. Configuration-Driven Core

The framework has a single core that adapts based on YAML configuration:

┌─────────────────────────────────────┐
│         Agent Core                  │
│  • Loads domain config              │
│  • Filters tools by domain          │
│  • Applies personality              │
│  • Enforces constraints             │
└─────────────────────────────────────┘
              ↓
┌─────────────────────────────────────┐
│      Domain Configuration           │
│  • compliance-reviewer.yaml         │
│  • travel-planner.yaml              │
│  • customer-support.yaml            │
└─────────────────────────────────────┘

2. MCP (Model Context Protocol) for Pluggable Tools

Tools are organized into swappable toolsets:

Compliance Toolset:

document_parser - Parse and chunk documents
policy_search - Search policy database
regulation_lookup - Find regulations

Travel Toolset:

destination_search - Find destinations
weather_lookup - Get weather info
price_estimator - Estimate costs
currency_converter - Convert currencies

The beauty: tools are filtered by domain. The travel agent literally cannot access compliance tools, because the restriction is enforced in code rather than only in the prompt.
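One way to get that kind of code-level filtering is to hand each agent a restricted view of the registry. The sketch below is an assumption about how this could work, not the framework's real classes: `DomainToolView`, `ToolNotAllowedError`, and the stub tools are all made up for illustration.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, List


class ToolNotAllowedError(PermissionError):
    """Raised when an agent calls a tool outside its domain's allow-list."""


class DomainToolView:
    """The only handle an agent gets; tools outside `allowed` are unreachable."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], allowed: FrozenSet[str]) -> None:
        self._tools = tools
        self._allowed = allowed

    def names(self) -> List[str]:
        # Only tools that are both allowed and actually registered.
        return sorted(name for name in self._allowed if name in self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._allowed:
            raise ToolNotAllowedError(f"tool {name!r} is not allowed in this domain")
        if name not in self._tools:
            raise KeyError(f"tool {name!r} is allowed but not registered")
        return self._tools[name](**kwargs)


class ToolRegistry:
    """Holds every tool; agents never touch it directly."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._tools[name] = func

    def for_domain(self, allowed: Iterable[str]) -> DomainToolView:
        return DomainToolView(self._tools, frozenset(allowed))


# Stub tools standing in for real implementations.
registry = ToolRegistry()
registry.register("weather_lookup", lambda city: f"stub forecast for {city}")
registry.register("regulation_lookup", lambda topic: f"stub regulation for {topic}")

# The travel agent's view is built from the `tools.allowed` list in its YAML.
travel_tools = registry.for_domain(["destination_search", "weather_lookup"])
forecast = travel_tools.call("weather_lookup", city="Paris")
```

Calling `travel_tools.call("regulation_lookup", topic="overtime")` raises `ToolNotAllowedError` no matter what the model was prompted to do.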

3. Guardrails That Actually Work

One of the biggest challenges was preventing agents from answering out-of-scope questions.

The Problem:

User: "Plan a trip to Paris"
Compliance Agent: "Sure! Here's a 3-day itinerary..." ❌

The Solution:
Place scope restrictions at the TOP of configuration:

constraints:
  - "SCOPE RESTRICTION: Only respond to compliance questions"
  - "If asked about travel, respond: 'I am a Compliance Reviewer. 
     For travel planning, please use the Travel Planner agent.'"

Result:

User: "Plan a trip to Paris"
Compliance Agent: "I am a Compliance Reviewer agent specialized 
in regulatory compliance. For travel planning, please use the 
Travel Planner agent." ✅

LLMs tend to weight early content more heavily, and this simple positioning change eliminated 90% of out-of-scope responses.
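The positioning idea can also be applied when the prompt is assembled, so scope restrictions land first even if someone lists them later in the YAML. This is a sketch under my own assumptions: the `SCOPE RESTRICTION:` prefix comes from the config above, while `order_constraints` and `build_system_prompt` are hypothetical helpers.

```python
from typing import List

SCOPE_PREFIX = "SCOPE RESTRICTION:"


def order_constraints(constraints: List[str]) -> List[str]:
    """Move scope restrictions to the front, keeping the original order otherwise."""
    scope = [c for c in constraints if c.startswith(SCOPE_PREFIX)]
    rest = [c for c in constraints if not c.startswith(SCOPE_PREFIX)]
    return scope + rest


def build_system_prompt(description: str, constraints: List[str]) -> str:
    """Put constraints before the role description so they appear early."""
    lines = ["Constraints (follow these before anything else):"]
    lines += [f"- {c}" for c in order_constraints(constraints)]
    lines += ["", f"Role: {description}"]
    return "\n".join(lines)


prompt = build_system_prompt(
    "AI assistant for compliance review",
    [
        "Cite the regulation for every finding",
        "SCOPE RESTRICTION: Only respond to compliance questions",
    ],
)
```

Here the scope restriction ends up on the second line of the prompt even though it was listed last.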

Built Entirely with Kiro IDE

This project showcases 5 major Kiro features working together:

1. 📋 Spec-Driven Development

I followed Kiro's complete spec workflow:

Requirements Phase:

Created requirements.md with EARS (Easy Approach to Requirements Syntax) patterns
Defined 12 major requirements with user stories and acceptance criteria
Established clear system boundaries

Design Phase:

Wrote comprehensive design.md with architecture decisions
Defined component interfaces and data models
Specified MCP tool architecture

Implementation Phase:

Generated tasks.md with 18 step-by-step tasks
Executed tasks incrementally with Kiro's assistance
Validated each step before proceeding

Impact: Caught architectural issues early, provided clear success criteria, enabled seamless resumption of work.

2. 🎯 Steering Documents

Created steering docs to guide behavior:

Base Steering (base_agent_behavior.md):

## Core Principles
- Always respect domain constraints
- Never fabricate information
- Acknowledge uncertainty when appropriate

## Response Format
- Provide structured, clear responses
- Use appropriate formatting
- Include reasoning for decisions

Domain-Specific Steering:

compliance_specific.md - Formal tone, citation-focused
travel_specific.md - Friendly tone, creative suggestions

Impact: Consistent behavior without hardcoding personality.

3. ⚡ Vibe Coding

Used Kiro's AI assistance for rapid development:

Most Impressive Generation:
Asked Kiro to create the complete MCP toolset architecture. In one session, it generated:

Base MCPToolset class
Tool registry with validation
Two full domain toolsets (compliance and travel)
Proper error handling and JSON schema validation
400+ lines of production-ready code

Impact: Built complete framework with API, CLI, and Web UI in days, not weeks.

4. 🔌 MCP Integration

Implemented Model Context Protocol for pluggable tools:

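from __future__ import annotations  # lets the snippet load before Tool/ToolRegistry are imported

from typing import List

# Tool and ToolRegistry are the framework's own classes (import path not shown here).

class MCPToolset:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.tools: List[Tool] = []

    def register_tools(self, registry: ToolRegistry):
        # Register every tool in this toolset with the shared registry.
        for tool in self.tools:
            registry.register_tool(tool)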

Benefits:

Add new tools in 15 minutes
No core code changes needed
Domain-based tool permissions
Easy testing with mocks
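To make the "easy testing with mocks" point concrete, here is a self-contained sketch. The `Tool` and `ToolRegistry` classes below are simplified stand-ins I wrote for this example (the real ones are not shown in this post), and `fake_policy_search` is a mock, not a real tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    """Simplified stand-in for the framework's tool type."""
    name: str
    description: str
    handler: Callable[..., Any]


class ToolRegistry:
    """Simplified stand-in: stores tools by name and rejects duplicates."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register_tool(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> Tool:
        return self._tools[name]


class MCPToolset:
    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description
        self.tools: List[Tool] = []

    def register_tools(self, registry: ToolRegistry) -> None:
        for tool in self.tools:
            registry.register_tool(tool)


# A mock handler that records its calls instead of hitting a real policy database.
calls: List[str] = []


def fake_policy_search(query: str) -> Dict[str, Any]:
    calls.append(query)
    return {"policies": [{"id": "POL-001"}]}


toolset = MCPToolset("compliance", "Compliance tools")
toolset.tools.append(Tool("policy_search", "Search policy database", fake_policy_search))

registry = ToolRegistry()
toolset.register_tools(registry)
result = registry.get("policy_search").handler(query="overtime")
```

Because the core only sees the registry, swapping a real handler for a mock needs no changes outside the toolset.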

5. 🔗 Agent Hooks

Implemented event-driven callbacks:

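import logging

logger = logging.getLogger(__name__)

# `hook` and `metrics` come from the surrounding project (imports not shown here).

@hook("after_tool_call")
def log_tool_execution(context):
    # Log which tool ran and how long it took.
    logger.info(f"Tool: {context['tool_name']}, "
                f"Duration: {context['execution_time_ms']}ms")

@hook("after_response")
def track_metrics(context):
    # Record the response latency.
    metrics.record(context['execution_time_ms'])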

Impact: Extensible architecture with clean separation of concerns.
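For readers wondering what sits behind a decorator like `@hook`, here is a minimal sketch of one possible implementation. The `fire` function and the module-level `_hooks` table are my assumptions for illustration; the framework's real hook system may differ.

```python
import logging
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

logger = logging.getLogger(__name__)

HookFn = Callable[[Dict[str, Any]], None]

# Maps an event name to the callbacks registered for it.
_hooks: DefaultDict[str, List[HookFn]] = defaultdict(list)


def hook(event: str) -> Callable[[HookFn], HookFn]:
    """Decorator that registers a function as a callback for `event`."""
    def decorator(func: HookFn) -> HookFn:
        _hooks[event].append(func)
        return func
    return decorator


def fire(event: str, context: Dict[str, Any]) -> None:
    """Run every callback for `event`; a failing hook must not break the agent."""
    for func in _hooks.get(event, []):
        try:
            func(context)
        except Exception:
            logger.exception("hook %s failed for event %s", func.__name__, event)


durations: List[int] = []


@hook("after_tool_call")
def record_duration(context: Dict[str, Any]) -> None:
    durations.append(context["execution_time_ms"])


# The core would call this right after a tool finishes.
fire("after_tool_call", {"tool_name": "policy_search", "execution_time_ms": 42})
```

Catching exceptions inside `fire` keeps a broken logging or metrics hook from taking down the request it was observing.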

Real-World Demo: Compliance Review

Let me show you the framework in action:

User Query:

"Review this employee handbook for overtime policy compliance"

What Happens Behind the Scenes:

Planning Phase:

   Step 1: Parse document using document_parser
   Step 2: Search for overtime policies using policy_search
   Step 3: Lookup FLSA regulations using regulation_lookup
   Step 4: Compare requirements and generate report

Execution Phase:

- Document parsed into chunks
- Relevant policies found (Employee Overtime Policy - POL-001)
- FLSA Section 7 regulation retrieved
- Comparison performed

Response:

   COMPLIANCE REVIEW REPORT

   Executive Summary:
   The employee handbook's overtime policy is COMPLIANT with 
   federal requirements under FLSA Section 7.

   Findings:
   ✅ Policy correctly states 1.5x rate for hours over 40/week
   ✅ Proper reference to FLSA requirements
   ✅ Clear definition of non-exempt employees

   Regulation References:
   - FLSA-Section-7: Fair Labor Standards Act - Overtime

   Severity: LOW (No violations found)

Total Time: 3-5 seconds
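The tool-calling part of that flow (steps 1 to 3 of the plan) can be sketched as a simple loop. Everything here is illustrative: `PlanStep` and `execute_plan` are hypothetical names, and the three tools are stubs that return canned values rather than real parsing or lookups. Step 4, the comparison and report, is the LLM's job and is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class PlanStep:
    tool_name: str
    arguments: Dict[str, Any]


def execute_plan(
    steps: List[PlanStep], tools: Dict[str, Callable[..., Any]]
) -> List[Dict[str, Any]]:
    """Run each step in order and collect the outputs for the final synthesis."""
    results = []
    for index, step in enumerate(steps, start=1):
        handler = tools[step.tool_name]  # KeyError here means the plan used an unknown tool
        output = handler(**step.arguments)
        results.append({"step": index, "tool": step.tool_name, "output": output})
    return results


# Stub tools with canned outputs, for illustration only.
stub_tools: Dict[str, Callable[..., Any]] = {
    "document_parser": lambda text: text.split(". "),
    "policy_search": lambda query: [{"id": "POL-001", "title": "Employee Overtime Policy"}],
    "regulation_lookup": lambda regulation: {"id": "FLSA-Section-7"},
}

plan = [
    PlanStep("document_parser", {"text": "Overtime is paid at 1.5x. Applies over 40 hours"}),
    PlanStep("policy_search", {"query": "overtime"}),
    PlanStep("regulation_lookup", {"regulation": "FLSA"}),
]

results = execute_plan(plan, stub_tools)
```

The collected `results` list is what a synthesis step would turn into the report shown above.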

The Technical Stack

Backend:

Python 3.10+ with FastAPI
Pydantic for data validation
OpenAI/Anthropic LLM support
In-memory and persistent storage

Frontend:

Next.js 14 with TypeScript
TailwindCSS for styling
Real-time step visualization
Domain switching with state isolation

Core Framework:

Planner: Goal decomposition and execution
Memory: Short-term and long-term strategies
Evaluation: Response validation and revision
Steering: Behavior guidance system
Hooks: Event-driven callbacks
Tools: MCP-based registry

Challenges and Solutions

Challenge 1: Tool Response Formatting

Problem: Tool results were showing as raw JSON instead of human-readable text.

Example:

{'policies': [{'id': 'POL-001', 'title': 'Employee Overtime Policy', 
'content': '...', 'relevance_score': 0.25}]} ❌

Solution:
Modified the response synthesis logic to always format tool results through the LLM, even for single-step plans.

Result:

Based on your query about employee compliance, here are the 
top requirements:

**Employee Overtime Policy (POL-001)**
All non-exempt employees must be paid overtime at 1.5x their 
regular rate for hours worked over 40 in a workweek. ✅
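A sketch of that fix, with the caveat that the names are mine: `build_synthesis_prompt` and `synthesize` are hypothetical, and `llm` is any callable that takes a prompt and returns text (the example uses a fake one so it runs without an API key).

```python
import json
from typing import Any, Callable, Dict, List


def build_synthesis_prompt(goal: str, tool_results: List[Dict[str, Any]]) -> str:
    """Wrap raw tool output in instructions that ask for readable prose."""
    payload = json.dumps(tool_results, indent=2, ensure_ascii=False)
    return (
        "Answer the user's request using only the tool results below.\n"
        "Write plain, human-readable text; do not echo raw JSON.\n\n"
        f"User request: {goal}\n\n"
        f"Tool results:\n{payload}"
    )


def synthesize(
    goal: str, tool_results: List[Dict[str, Any]], llm: Callable[[str], str]
) -> str:
    # No shortcut for single-step plans: every result goes through the LLM.
    return llm(build_synthesis_prompt(goal, tool_results))


# A fake LLM that records the prompt it received and returns a fixed string.
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "stub answer"


answer = synthesize(
    "What are the overtime requirements?",
    [{"tool": "policy_search", "output": {"policies": [{"id": "POL-001"}]}}],
    fake_llm,
)
```

The design choice is the missing `if len(results) == 1` branch: one code path for every plan size, so single-step plans can no longer leak raw dicts to the user.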

Challenge 2: Domain Switching UX

Problem: When switching between agents, chat history persisted, causing confusion.

Solution:

Clear chat history on domain switch
Clear execution steps and state
Show system message indicating fresh start
Separate memory contexts per domain

Result: Clean separation between domains with clear visual feedback.
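The same four rules can be sketched on the state-holding side. This is a Python illustration of the logic only (the actual UI is TypeScript), and `SessionState` is a hypothetical class, not code from the repository.

```python
from collections import defaultdict
from typing import DefaultDict, List


class SessionState:
    """Chat state that resets on a domain switch; memory is kept per domain."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.chat_history: List[str] = []
        self.execution_steps: List[str] = []
        self._memory: DefaultDict[str, List[str]] = defaultdict(list)

    def remember(self, item: str) -> None:
        # Memory is written under the active domain only.
        self._memory[self.domain].append(item)

    def memory(self) -> List[str]:
        return list(self._memory[self.domain])

    def switch_domain(self, new_domain: str) -> None:
        if new_domain == self.domain:
            return
        self.domain = new_domain
        self.execution_steps.clear()
        # Replace the history with a single system message marking the fresh start.
        self.chat_history = [f"[system] Switched to {new_domain}. Starting a fresh conversation."]


session = SessionState("compliance-reviewer")
session.chat_history.append("user: Review this handbook")
session.remember("handbook reviewed")
session.switch_domain("travel-planner")
```

After the switch, the travel planner sees an empty memory and a one-line history, while the compliance memory is still there if the user switches back.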

Challenge 3: Performance

Initial: 10-15 seconds per query
Optimized: 3-5 seconds per query

Optimizations:

Switched to GPT-3.5-turbo (3x faster than GPT-4)
Disabled evaluation layer (saves 2-3s)
Reduced memory context (5 items vs 10-15)
Minimal verbosity in responses
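The memory cap is the easiest of these to show in code. A bounded `collections.deque` drops the oldest item automatically once the limit is reached. `ShortTermMemory` is a hypothetical name; only the limit of 5 comes from the list above.

```python
from collections import deque
from typing import Deque, List


class ShortTermMemory:
    """Keeps only the most recent `max_items` entries for the prompt context."""

    def __init__(self, max_items: int = 5) -> None:
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, item: str) -> None:
        # When the deque is full, appending silently evicts the oldest entry.
        self._items.append(item)

    def context(self) -> List[str]:
        return list(self._items)


memory = ShortTermMemory(max_items=5)
for turn in range(1, 9):
    memory.add(f"turn {turn}")

recent = memory.context()  # turns 4 through 8; turns 1 to 3 were evicted
```

Fewer context items means a shorter prompt on every call, which is where the latency saving comes from.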

Adding a New Agent: Customer Support Bot

Want to see how easy it is? Here's how to add a customer support bot:

Step 1: Create YAML (5 minutes)

# domains/customer-support.yaml
domain:
  name: "customer-support"
  description: "AI assistant for customer support"

personality:
  tone: "friendly"
  style: "helpful"
  verbosity: "balanced"
  characteristics:
    - "Empathetic and patient"
    - "Solution-oriented"

tools:
  allowed:
    - "ticket_search"
    - "knowledge_base_search"
    - "order_lookup"

constraints:
  - "SCOPE RESTRICTION: Only respond to customer support questions"
  - "Always acknowledge customer frustration"
  - "Provide step-by-step solutions"

Step 2: Create Toolset (15 minutes)

# tools/customer_support_toolset.py
class CustomerSupportToolset(MCPToolset):
    def __init__(self):
        super().__init__("customer-support", "Customer support tools")
        self.tools = [
            self._create_ticket_search(),
            self._create_kb_search(),
            self._create_order_lookup()
        ]
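The `_create_*` helpers are not shown in this post. As a rough idea of what one could look like, here is a hypothetical, standalone version of a ticket search factory. The `Tool` fields, the JSON-schema-style `parameters`, and the in-memory ticket list are all assumptions for illustration; a real tool would query a ticketing system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    """Simplified stand-in for the framework's tool type."""
    name: str
    description: str
    parameters: Dict[str, Any]
    handler: Callable[..., Any]


# Made-up sample data so the example runs on its own.
_TICKETS: List[Dict[str, str]] = [
    {"id": "T-1", "subject": "Refund not received"},
    {"id": "T-2", "subject": "Cannot reset password"},
]


def create_ticket_search() -> Tool:
    def search(query: str) -> List[Dict[str, str]]:
        # Case-insensitive substring match on the ticket subject.
        needle = query.lower()
        return [t for t in _TICKETS if needle in t["subject"].lower()]

    return Tool(
        name="ticket_search",
        description="Search support tickets by subject",
        parameters={
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        handler=search,
    )


ticket_search = create_ticket_search()
matches = ticket_search.handler("refund")
```

The tool's `name` has to match the entry under `tools.allowed` in the YAML, otherwise the domain filter will never expose it.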

Step 3: Register (2 minutes)

# api/main.py
customer_support_tools = CustomerSupportToolset()
customer_support_tools.register_tools(tool_registry)

Step 4: Update UI (5 minutes)

// ui/web/components/DomainSelector.tsx
const domains = [
  { id: 'compliance-reviewer', name: 'Compliance Reviewer' },
  { id: 'travel-planner', name: 'Travel Planner' },
  { id: 'customer-support', name: 'Customer Support' }, // NEW
];

Total Time: ~30 minutes

Compare that to building a customer support bot from scratch (weeks of work)!

Key Learnings

1. Spec-Driven Development Works

Before: Jump into coding, refactor constantly, miss requirements.

With Kiro's Specs:

Requirements phase catches issues early
Design phase prevents architectural mistakes
Task phase provides clear roadmap
Implementation is smooth and predictable

Lesson: Upfront planning saves time overall.

2. Steering Documents Are Powerful

Discovery: Placing scope restrictions at the TOP of steering docs is critical.

Why it matters:

LLMs pay more attention to early content
Guardrails need maximum visibility
Keeps out-of-scope answers to a minimum

Lesson: Document structure affects AI behavior significantly.

3. Configuration > Code for Flexibility

Insight: Configuration-driven architecture enables rapid iteration.

Benefits:

Changed agent personality in seconds
Added new tools without core changes
Adjusted guardrails without redeployment
Tested different configurations easily

Lesson: Separate configuration from logic for maximum flexibility.

4. MCP Scales

Without MCP:

Adding 10 tools = modifying core 10 times
Risk of breaking existing functionality
2-3 hours per tool

With MCP:

Adding 10 tools = 10 independent toolsets
Zero risk to existing tools
15 minutes per toolset

Lesson: Abstraction layers are worth the initial investment.

What's Next

Short-Term (Next Month)

More Domain Examples
- Customer Support Agent
- Code Review Agent
- Data Analysis Agent
- Content Writer Agent

Visual Configuration Builder
- Drag-and-drop tool selection
- Visual personality customization
- Real-time validation
- Export to YAML

Enhanced Memory
- Vector search for semantic retrieval
- Cross-session persistence
- Memory summarization

Medium-Term (Next Quarter)

Multi-Agent Collaboration
- Agent-to-agent communication
- Task delegation between agents
- Collaborative problem solving

Example:

   User: "Review this contract and plan a business trip"
   → Compliance Agent reviews contract
   → Travel Agent plans trip based on contract dates
   → Combined response delivered

Streaming Responses
- Stream LLM responses as they generate
- Show tool execution progress
- Reduce perceived latency

Tool Marketplace
- Share custom toolsets
- Rate and review tools
- One-click installation

Long-Term (Next Year)

Enterprise Features
- Role-based access control
- Audit logging
- Multi-tenancy support
- SSO integration

Agent Analytics
- Usage metrics per agent
- Performance dashboards
- Cost tracking
- A/B testing framework

Try It Yourself

The framework is open source and ready to use:

GitHub: https://github.com/ArvindAkula/agent-skeleton

Quick Start:

# Clone the repository
git clone <repository-url>
cd agent-skeleton

# Install dependencies
pip install -e .

# Set up environment
cp .env.example .env
# Add your OpenAI/Anthropic API key

# Start the API
uvicorn api.main:app --reload

# Or use the CLI
python agent_cli.py --domain travel --goal "Plan a weekend getaway"

Documentation:

Complete setup guide in README.md
Architecture details in ARCHITECTURE.md
Kiro usage documented in KIRO_USAGE.md
7 PlantUML diagrams showing system flows

Conclusion

Building AI agents doesn't have to be code-heavy. With the right architecture and tools, you can create specialized agents through configuration alone.

Key Takeaways:

Configuration-driven architecture enables rapid iteration
MCP provides clean, scalable tool integration
Guardrails require intentional design (position matters!)
Kiro's specs, steering docs, vibe coding, MCP support, and hooks reinforce one another
Spec-driven development catches issues early and maintains quality

The Agent Skeleton Framework demonstrates that with proper abstraction and configuration, you can build production-ready AI agents in days, not weeks.

What would you build with this framework? Drop a comment below! 👇

Resources

GitHub Repository: https://github.com/ArvindAkula/agent-skeleton
Kiro IDE: https://kiro.ai
Kiroween Hackathon: https://kiroween.devpost.com/
Architecture Diagrams: https://github.com/ArvindAkula/agent-skeleton/blob/main/docs/workflows/diagrams/Complete_System_Flow-Agent_Skeleton___Complete_System_Data_Flow.png

Built with ❤️ using Kiro IDE for Kiroween Hackathon 2025 🎃

How spec-driven dev helped me ship “AI Math Tutor” from living-room idea to production

Arvindkumar Akula — Sun, 14 Sep 2025 07:48:44 +0000

Why I built this

I built AI Math Tutor after mentoring my high-schooler and seeing how often students get stuck between steps. I combined a Python math engine (symbolic steps + visuals), a Go API gateway (auth, WebSockets), and a React frontend into a production-ready platform with observability baked in. The spec-driven flow took me from requirements → design → implementation without thrash.

Highlights

• Step-by-step solutions + AI explanations
• Realtime collaboration; JWT + RBAC
• Docker/K8s, PostgreSQL, Redis
• Roadmap: handwriting/voice input; dashboards; spaced repetition

What I learned

Specs reduce rework; separating I/O (Go) from math (Python) keeps performance predictable; pedagogy is a feature.

What AI Math Tutor does

• Step-by-step solutions (algebra, calculus, linear algebra, stats, AI/ML math)
• Interactive quizzes with targeted hints and feedback
• Visual intuition (2D/3D plots, vector fields, function graphs)
• Personalized learning paths with progress tracking
• Realtime collaboration (coach ↔ learner)
• Planned: voice/handwriting input (OCR/STT)

Architecture at a glance

I use a microservices approach:
• Python Math Engine (FastAPI) for symbolic computation, AI explainers, and visuals
• Go API Gateway (Gin) for high-throughput APIs, JWT auth, WebSockets, rate-limiting
• React Frontend for a clean, responsive UI
• PostgreSQL + Redis for durable data + fast sessions/caching
• Docker/K8s for local dev and production deployment

Spec-driven from day one

I ran a requirements → design → tasks flow and built in observability and security early: health checks, metrics, error tracking, JWT + RBAC, and session management. That discipline helped move from concept → stable production system without big-bang rewrites.

Challenges

• Grounding LLM explanations in exact symbolic steps
• Realtime stability at scale with WebSockets + sessions
• Consistent math rendering (LaTeX + plots) across devices
• Security (roles, tokens) while keeping UX friction-free
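On the first challenge, one way to ground explanations is to compute the steps deterministically and hand them to the LLM to explain, so the model never invents the algebra. The sketch below shows the idea for a single linear equation using only the standard library. It is an illustration under that narrow assumption, not the project's symbolic engine, and `solve_linear` is a hypothetical name.

```python
from fractions import Fraction
from typing import List, Tuple


def solve_linear(a: int, b: int, c: int) -> Tuple[Fraction, List[str]]:
    """Solve a*x + b = c exactly and record each algebraic step as text."""
    if a == 0:
        raise ValueError("a must be non-zero for a unique solution")
    steps = [f"{a}x + {b} = {c}"]
    rhs = c - b
    steps.append(f"{a}x = {rhs}   (subtract {b} from both sides)")
    x = Fraction(rhs, a)  # exact rational arithmetic, no floating-point rounding
    steps.append(f"x = {x}   (divide both sides by {a})")
    return x, steps


solution, steps = solve_linear(3, 4, 19)

# The verified steps become the only material the LLM is asked to explain.
explain_prompt = "Explain each step to a student without changing the math:\n" + "\n".join(steps)
```

For `3x + 4 = 19` the recorded steps end at `x = 5`, and the prompt contains exactly those lines.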

Results & what I’m proud of

• End-to-end auth + RBAC; secure session flows
• Health/observability wired throughout
• Realtime collaboration that behaves under load
• A symbolic engine paired with AI explanations for clarity

What’s next

• Mobile + handwriting/voice input
• Teacher/parent dashboards and mastery insights
• Spaced repetition for long-term retention
• Deeper AI/ML math tracks (optimization, spectral methods, backprop labs)
• Privacy by design (data minimization, audit trails)
• Open-source modules for community reuse

Try it / Contribute

Repo: https://github.com/ArvindAkula/ai_math_tutor

Forem: Arvindkumar Akula