Forem: CloudX

Stop getting generic output from Copilot. Teach it your patterns.

Daniel Diaz — Thu, 23 Apr 2026 13:11:44 +0000

The Problem

You use Copilot. You ask it to build something, and it does sort of. It follows your prompt, generates working code, and you ship it.

Then you do it again the next day. And the day after.

A month later, your codebase has class names in PascalCase next to camelCase functions, three different error handling styles, two ways to structure the same kind of module, and hooks that work differently from each other for no clear reason.

All of it generated by Copilot. All of it reviewed and accepted by you.

The problem isn't that Copilot generates bad code. It generates generic code. It doesn't know your stack, your team's decisions, or the patterns you settled on six months ago. It works from its training data. And your codebase slowly starts to look like it was written by the internet.

What are Agent Skills?

Agent Skills are Markdown files that tell Copilot what your conventions are, for a specific domain, task, or workflow. Persistent context the agent reads before generating anything in that area.

They live in your repo at:

.github/
└── skills/
    └── your-skill-name/
        └── SKILL.md

That's the entire setup. No config files, no registration, no CLI step. The file being there is enough for VS Code to discover it.

When you mention the skill in a chat prompt like "Use the component-structure skill to create a new button", Copilot reads that SKILL.md first, then generates code that follows your conventions.

This is what separates skills from regular prompts: they're reusable, versioned, and shared through the repo. Your conventions stop living in someone's head or a doc nobody reads; they live next to the code they govern.

With a well-written description, Copilot can automatically load the relevant skill based on your request without explicit mention. Explicitly referencing skills is more common with smaller models that need extra guidance to identify the right context.

VS Code added native skill support and the /create-skill command in version 1.110. In version 1.113, a dedicated Chat Customizations editor was added, providing a centralized UI to manage skills, instructions, and agents from a single place.

A minimal SKILL.md

Here's the minimum a skill needs to actually work:

.github/skills/component-structure/SKILL.md

Frontmatter:

---
name: component-structure
description: Guidelines for creating React components following team conventions: named exports, props interface, no inline JSX logic.
---

Body:

# Component Structure Skill

## When to use
Creating new React components or reviewing component patterns.

## Conventions
- One component per file
- Props interface defined above the component
- Named exports only (no default exports)
- No inline logic in JSX, extract to handlers or custom hooks
- Error boundaries at page level, not inside components

## Never do this
- Default exports
- Logic directly in JSX
- Props without an explicit interface

Pattern example:

interface ButtonProps {
  label: string;
  onClick: () => void;
  disabled?: boolean;
}

export function Button({ label, onClick, disabled = false }: ButtonProps) {
  const handleClick = () => {
    if (!disabled) onClick();
  };

  return (
    <button onClick={handleClick} disabled={disabled}>
      {label}
    </button>
  );
}

The name field must match the directory name exactly. The description is what Copilot reads to decide whether to load the skill, so be specific. Full frontmatter reference: SKILL.md file format.

Four sections. One pattern. A few explicit don'ts.

When Copilot reads this before generating a component, it follows the same structure every time not because it guessed right, but because you told it.

The skill doesn't need to be exhaustive. It needs to be specific. A skill that covers one thing well beats one that tries to cover everything and ends up covering nothing.

Creating a skill from a conversation

One of the more useful additions in VS Code 1.110 is /create-skill. After debugging a problem across several chat turns and landing on the right approach, you type:

/create-skill

The agent extracts the pattern from your conversation and scaffolds a SKILL.md for you. You review, adjust, commit.

This is how skills actually get created in practice. You don't design a skill from scratch. You solve a problem, recognize you'll solve it the same way every time, and capture that. /create-skill removes the friction of doing the capture manually.

The Ecosystem

Skills are one piece of a larger customization system. Here's a quick map:

Instructions (copilot-instructions.md / .github/instructions/): always-on context loaded in every session. Good for project-wide rules.
Skills ← you're here: domain-specific, activated on demand when mentioned in chat.
Hooks: scripts that run at specific points in the agent lifecycle: before a response, after a file write, on session start.
MCP Servers: external tools that give the agent capabilities your codebase doesn't have natively, like database access, browser automation, or external APIs.
Plugins: installable bundles that package all of the above together.

Each of these has its own post coming. Skills are the easiest starting point entirely local to your repo, zero infrastructure, and the effect on the agent is immediate.

What I Learned

The model doesn't know your stack. You have to teach it.

Copilot is trained on public code. Your team's specific decisions aren't in that data. Skills are the mechanism for closing that gap.

One good skill beats ten prompts.

A well-written skill invoked consistently produces better results than re-explaining your conventions in each prompt. The repeatability is the point.

The skill itself is documentation.

Writing a SKILL.md forces you to articulate things that previously existed informally. Once it's committed, the team has a shared reference, not just for Copilot.

The real value shows up at review time.

When everyone uses the same skill, code review stops being about style. You argue about logic instead. That's the conversation worth having.

This is the first post in a series on VS Code agent customizations: Skills, MCP Servers, CLI, Hooks, and Plugins.

Official docs: Agent Skills VS Code

What Backend Engineers Get Wrong About AI Integration

Juan M. Altamirano — Mon, 13 Apr 2026 17:26:14 +0000

So you've been asked to "add AI" to your project. Maybe it's a chatbot, maybe it's a smart search, maybe it's just your PM watching too many YouTube videos about agents. Either way, here you are.

And look, as backend engineers, we're actually well-positioned for this. We know how to design APIs, handle async flows, think about failure modes. The problem is that LLMs don't behave like anything we've integrated before, and our instincts sometimes work against us.

Here, I will try to cover eight common mistakes that I've seen (and made).

1. Treating the LLM like a deterministic function

This is the big one.

We're used to thinking in terms of: same input → same output. Call a database, get a row. Call an API, get a JSON response. Write a unit test, pin the behavior forever.

LLMs don't work like that. Under the hood, they're predicting the next most probable token based on everything that came before, which means there's inherent randomness added into every response. Same prompt, different temperature, different day, and you might get a slightly different answer. Or a wildly different one.

Think of it less like calling a function and more like asking a very smart contractor to do some work for you. They'll usually get it right, but you need to review the output, define what "right" looks like, and have a plan for when they hand you something unexpected.

The practical consequence: don't build flows that assume the model always returns what you expect. Validate the output. Have fallbacks. Don't let a malformed response crash your service.

2. Not handling retries and timeouts

If you call an external REST API and it times out, you probably have retry logic. Maybe exponential backoff. Maybe a circuit breaker.

For some reason, when we start calling LLM APIs, all of that discipline disappears.

LLM calls are slow. We're talking in the order of seconds depending on the model and the response length. And they fail. Rate limits, provider outages, network issues, it all happens.

Set a timeout. Retry with backoff. If you're building something user-facing, stream the response so it doesn't feel like the app is frozen. Treat the LLM provider like what it is: an external dependency that can and will go down.

3. Ignoring token costs in loops

Token costs have a way of sneaking up on you.

A common pattern that seems harmless: you have a list of items, you loop over them, you call the LLM for each one. 50 items? 50 calls. Each one dragging along the same massive system prompt like dead weight. The model isn't doing 50x the thinking. You're just paying 50x for the setup.

It's like printing the entire company handbook before every meeting just to discuss one item.

Some things to think about:

Can you batch items into a single prompt? Sometimes yes, sometimes the quality drops.
Is the system prompt really 2000 tokens? Can it be 400?
Are you sending context the model doesn't need for that specific call?
Could you cache responses for identical or near-identical inputs?

Token costs are easy to ignore in development (small dataset, free tier) and painful to discover in production.

4. Breaking prompt caching without realizing it

This one is especially relevant if you're building agents.

Most providers cache prompts automatically and give you significant discounts when a request matches a previously seen prefix. In an agent loop where you're continuously resending the system prompt, full conversation history and tool calls, that cache is what keeps your bill reasonable.

The catch: think of it like a prefix match. The moment something differs from the previous call, the cache stops there and you pay full price from that point on.

A classic example is injecting the current timestamp into your system prompt. Seems harmless, but since it changes on every call, nothing after it ever gets cached, and you're paying full price every time.

Treat everything before the dynamic part of your prompt as sacred. Same order, same content, same whitespace. Push dynamic context as far down the message history as possible, after the static parts that can be cached.

This leads to decisions that look counterintuitive: sometimes you deliberately send more tokens just to keep the cacheable prefix intact. A 2000-token cached call is cheaper than a 1500-token cache miss. Once you internalize that, it changes how you structure your payloads entirely.

5. Treating prompt engineering as someone else's problem

"The AI team handles the prompts."

If you're building the integration, you need to understand how prompts work. Not at a research level, but enough to know why your endpoint is returning garbage.

The prompt is part of your application logic. It deserves the same attention as your SQL queries or your request validation. A bad prompt will produce bad output no matter how clean your surrounding code is.

At minimum, know the difference between system and user messages, understand what temperature does to your outputs, and learn when to use structured outputs, instead of trying to parse free-form text. Which brings me to...

6. Trusting free-form text when you don’t have to

I've seen code like this in the wild:

response = llm.complete("Analyze this and return: risk level, explanation, recommendation")
lines = response.split("\n")
risk = lines[0].split(":")[1].strip()

Please don't do this.

Modern LLM APIs support structured outputs — you define a schema, the model returns valid JSON that matches it. Every time. No parsing, no IndexError because the model decided to add an extra line.

In Python, pair it with Pydantic. In Typescript, use Zod for validation and parsing. In Go, define a struct and unmarshal directly. Your future self will thank you.

class AnalysisResult(BaseModel):
    risk_level: Literal["HIGH", "MEDIUM", "LOW", "NONE"]
    explanation: str
    recommendation: str

result = llm.complete_structured(prompt, schema=AnalysisResult)
# result.risk_level is always there. Always a string. Always one of the four values.

7. Putting everything in the context window and calling it RAG

RAG (Retrieval Augmented Generation) is genuinely useful. The idea is simple: instead of stuffing all your knowledge into the prompt, you retrieve only the relevant bits and include those.

But a lot of implementations I've seen are just... dumping documents into the prompt and calling it RAG.

Real RAG has a retrieval step. You embed the user's query, find the most semantically similar chunks from your knowledge base, and only send those. If you're sending 50 pages of documentation on every call, you're not doing RAG, you're doing expensive copy-paste.

The retrieval quality matters as much as the generation. Bad retrieval means irrelevant context, and irrelevant context means bad answers (no matter how good your model is).

8. Not thinking about what happens when the model is wrong

LLMs hallucinate. They confidently state things that are false. This isn't a bug that will get patched, it's just how these models work.

So the question isn't "what if the model is wrong?"... it's "what happens in my system when the model is wrong?"

If you're using an LLM to triage support tickets, a wrong answer means someone spends extra time investigating. Annoying, not catastrophic. But if you're using it to generate financial reports or medical summaries, wrong answers have a very different impact.

Design your system with the assumption that the model will sometimes be wrong. Add human review steps where it matters. Log the inputs and outputs so you can audit. Don't pipe raw LLM output into anything irreversible.

Wrapping up

None of this means LLMs aren't useful, they absolutely are. But they're a new kind of dependency with failure modes we're not used to. The good news is that most of the solid engineering practices we already have (validation, retries, observability, defensive design) apply here too. We just need to remember to actually use them.

One last thing: this space moves fast. What's a workaround today might be a built-in feature tomorrow. If you're already the kind of engineer who keeps up with tech, just make sure AI topics are in the mix. It's worth it.

If you've run into other footguns that aren't on this list, drop them in the comments. I'm curious what patterns other backend engineers have hit.

From 'The Bench' to 'Ready to Ship': How AI Redefined My Learning Curve

Joaquin Islas — Tue, 07 Apr 2026 19:46:52 +0000

The Hook: The Bench Moment

We’ve all been there: you’re on the bench, and the pressure is mounting. Two potential assignments land on your desk, but there’s a catch—they are built on tech stacks you’ve only touched on the surface. The traditional anxiety sets in: How fast can I get up to speed without slowing the team down? In the past, that moment felt like staring at a mountain you had to climb alone. But recently, I realized the climb has changed.

The Old Way vs. The New Way

Before AI, the process was linear and often isolating. You’d spend days digging through dense documentation and watching generic tutorials. Now, the dynamic is flipped. It’s no longer about consuming static content; it’s about on-demand, interactive tutoring.

Feature	Traditional Training	AI-Powered Learning	Impact
Learning Curve	Weeks of passive courses.	Just-in-Time learning.	40% reduction in non-productive time.
Problem Solving	Waiting for a mentor.	Instant 24/7 tutoring.	Saves hours of Senior developers' time.
Contextualization	Generic examples.	Adapts to company standards.	Lower initial error rate.
Feedback Quality	Errors found in PR reviews.	Real-time logic suggestions.	Less rework and debt.

Real-World Case: Bridging Java and Go

To illustrate this, I recently had to pivot from my Java background into a Go project. Instead of starting from zero, I used AI as a "paradigm translator."

💡 The Strategy: I asked the AI: "I'm used to Java's Spring Boot; how do I achieve this same pattern idiomatically in Go?"

Here is a concrete example of how I mapped a common Java pattern (Service/Repository) to idiomatic Go in minutes, thanks to the AI's guidance:

// What I knew: Java Implementation (Spring-like)
public class UserService {
    private final UserRepository repo;

    // Standard constructor injection
    public UserService(UserRepository repo) {
        this.repo = repo;
    }

    public User getUser(String id) {
        // Linear stream-like flow
        return repo.findById(id).orElseThrow(() -> new UserNotFoundException(id));
    }
}

// What I learned (Idiomatic Go): Guided by AI
package service

import "errors"

// AI suggested defining an interface here for the dependency, 
// emphasizing Go's implicit interface satisfaction.
type UserRepository interface {
    FindByID(id string) (*User, error)
}

type UserService struct {
    repo UserRepository // AI explained composition over inheritance
}

// AI helped me write the constructor function
func NewUserService(r UserRepository) *UserService {
    return &UserService{repo: r}
}

func (s *UserService) GetUser(id string) (*User, error) {
    user, err := s.repo.FindByID(id)
    if err != nil {
        return nil, err // AI emphasized Go's explicit error handling pattern
    }
    if user == nil {
        return nil, errors.New("user not found")
    }
    return user, nil
}

✅ The Result: I quickly understood the shift from Object-Oriented inheritance to Go's composition and interfaces. This didn't just teach me syntax; it gave me the confidence to show up as a contributor, not a beginner, from week one.

The Metrics: Why This Matters for the Company

The shift isn't just a feeling; the data supports it:

🚀 Productivity Boost: According to GitHub Research, developers using AI complete tasks up to 55% faster.
⚖️ The Leveling Effect: A study by MIT and Stanford suggests that AI helps developers close the skills gap 43% faster, acting as a "great leveler" for the whole team.
📈 Continuous Learning: As highlighted by the Harvard Business Review, AI-powered training can reduce time-to-productivity by 40% compared to traditional, passive methods.

⚠ The Limits: Being Honest

However, let’s be clear: AI is not a silver bullet. It can teach you syntax and explain complex concepts, but it cannot replace human judgment. It doesn't know the specific business context or the long-term architectural risks of our project. AI provides the tools, but you provide the engineering intuition. Experience still matters; AI just lets you acquire it faster.

🛠️ A Practical Framework

If you're looking to replicate this, here is the routine I used:

🌉 Bridge the Gap: Always start by asking the AI to compare the new technology to a stack you already master.
🔀 The "English + AI" Combo: Use AI to simulate technical interviews or draft documentation in English while you learn the tech—it's a double win for professional growth.
💪 Contribution as Practice: Don't just watch videos. Take a small internal spike or task and use your AI assistant to guide your implementation.

🎯 The Closing Thought

In 2026, "being prepared" no longer means knowing everything by heart. It means being agile enough to adapt. AI has changed the definition of a Senior Engineer: it’s less about having all the answers and more about knowing how to leverage tools to bridge the gap between "I don't know this" and "I’m ready to ship."

References & Sources

EdTech Global (2025): AI in Corporate Education: Efficiency and ROI Statistics.
GitHub / Microsoft (2023): The impact of AI on developer productivity.
MIT & Stanford (NBER, 2024): Generative AI at Work and the Leveling Effect.
Harvard Business Review (2024): How Generative AI Is Changing the Way We Learn.

Stop Wasting Time on CVEs That Don't Affect You

Juan M. Altamirano — Mon, 16 Mar 2026 18:47:53 +0000

The Problem

Aren't you tired of pushing new code and then a few days later receiving an alert from Github's Dependabot? Well, I am.
The most annoying part is looking for the CVE, reviewing your code and then detecting that you aren't using the affected part.
Rinse and repeat for every single alert.

The solution?

That's why I built dep_shield — a CLI that I can plug into my common workflow (lint -> dep_shield -> tests -> sonar) and get a straight answer: "this CVE affects you" or "relax, you're fine."

How dep_shield Works

The flow is straightforward:

Parse dependencies — Read requirements.txt or pyproject.toml, extract packages and versions
Check for CVEs — Query the OSV database for known vulnerabilities
Find usage in code — Scan your Python files to see where you import vulnerable packages
AI-powered analysis — Send the CVE description + your import context to an LLM and ask: "Does this actually affect me?"

The Interesting Parts

Parsing Dependencies (Both Formats)

The tool supports requirements.txt and pyproject.toml — it figures out which one you're using.
Why both? Well, requirements.txt is still widely used, but uv is taking over fast.

Querying OSV (the free vulnerability database)

OSV over NVD or Snyk? No API key, no rate limits, no pricing tiers. NVD is slow and wants you to parse CPE identifiers. Snyk needs auth and has usage limits.
OSV just works — POST a package name, get vulnerabilities back. For a CLI tool meant to run locally and fast, that's exactly what I needed.

OSV_API_URL = "https://api.osv.dev/v1/query"

def query_vulnerabilities(package_name: str, version: str | None) -> list[Vulnerability]:
    payload = {
        "package": {
            "name": package_name,
            "ecosystem": "PyPI"
        }
    }
    if version:
        payload["version"] = version

    response = httpx.post(OSV_API_URL, json=payload, timeout=10.0)
    return parse_vulnerabilities(response.json())

{
    "vulns": [
        {
            "id": "GHSA-cpwx-vrp4-4pq7",
            "summary": "Jinja2 vulnerable to sandbox breakout through attr filter selecting format method",
            "details": "An oversight in how the Jinja sandboxed environment interacts with the `|attr` filter allows an attacker that controls the content of a template to execute arbitrary Python code.\n\nTo exploit the vulnerability, an attacker needs to control the content of a template. Whether that is the case depends on the type of application using Jinja. This vulnerability impacts users of applications which execute untrusted templates.\n\nJinja's sandbox does catch calls to `str.format` and ensures they don't escape the sandbox. However, it's possible to use the `|attr` filter to get a reference to a string's plain format method, bypassing the sandbox. After the fix, the `|attr` filter no longer bypasses the environment's attribute lookup.",
            "aliases": ["CVE-2025-27516"],
            "modified": "2026-02-04T04:14:58.595738Z",
            "published": "2025-03-05T20:40:14Z",
            "affected": [
                {
                    "package": { "name": "jinja2", "ecosystem": "PyPI", "purl": "pkg:pypi/jinja2" },
                    "ranges": [{ "type": "ECOSYSTEM", "events": [{ "introduced": "0" }, { "fixed": "3.1.6"}] }]
                }
            ]
        }
    ]
}

The full response has more fields (references, versions, etc.), but these are the relevant ones.

Finding Where You Actually Use the Package

It's easy to forget where you actually use a dependency — especially in large projects with dozens of packages. And not all usage is equal: importing requests in your core API is very different from importing it in a one-off migration script.

def scan_file_for_package(file_path: Path, package_name: str) -> list[CodeUsage]:
    pattern_import = rf"^import\s+{package_name}(\s|,|$|\.)"
    pattern_from = rf"^from\s+{package_name}(\s|\.)"
    result = []

    with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
        for line_num, line in enumerate(file, 1):
            line = line.strip()
            if line.startswith("#"):
                continue

            if re.match(pattern_import, line):
                import_type = "import"
            elif re.match(pattern_from, line):
                import_type = "from"
            else:
                continue

            result.append(CodeUsage(
                file_path=str(file_path),
                line_number=line_num,
                line_content=line,
                import_type=import_type
            ))
    return result

The AI Part: Asking A Model If It Matters

Now, here's where it gets interesting. I send the LLM:

The CVE description
The import statements from your code

And ask: "Given how this code uses the package, does this vulnerability apply?"

Important: Only import lines are sent to the LLM — not your actual business logic. The model sees from requests import Session, not the body of your functions. Your code stays local.

_SYSTEM_PROMPT = """You are a security analyst reviewing Python dependency vulnerabilities.
You will be given a CVE description and import-level code evidence.
Your job is to determine if the vulnerability realistically affects this codebase.

Risk level definitions:
- HIGH: The imports directly expose the vulnerable code path
- MEDIUM: Package is imported but no clear evidence the vulnerable feature is used
- LOW: Package only imported in tests or dev tooling
- NONE: CVE conditions are not plausible in this codebase"""

I use Pydantic to force the model into a consistent format, no free-form text that I'd have to parse.
It either gives me a valid ImpactAnalysis or it fails. No "well, it depends..." answers.

class ImpactAnalysis(BaseModel):
    risk_level: Literal["HIGH", "MEDIUM", "LOW", "NONE"]
    explanation: str
    recommendation: str

Making It Smarter Over Time (RAG)

Here's the thing — the LLM doesn't know your codebase. It analyzes each CVE in isolation. But what if it could remember past analyses?

That's where ChromaDB comes in. Every time the tool analyzes a vulnerability, it stores the result. Next time a similar CVE shows up (same package, similar vulnerability type), it retrieves past analyses as context.

def store_analysis(vulnerability_id: str, analysis: ImpactAnalysis, code_context: str):
    collection.add(
        documents=[f"{vulnerability_id}: {analysis.explanation}"],
        metadatas=[{"risk": analysis.risk_level, "package": package_name}],
        ids=[vulnerability_id]
    )

def get_similar_analyses(vulnerability_description: str, limit: int = 3):
    results = collection.query(
        query_texts=[vulnerability_description],
        n_results=limit
    )
    return results

The result? The tool gets better the more you use it. If you analyzed a requests CVE last month and a new one appears, the model sees how you handled it before.

What the Output Looks Like

What I Learned

Honestly? It reinforced an ancient rule: context is everything
A CVE that sounds critical in the abstract becomes irrelevant when you realize you only import that package in a test helper. OSV turned out to be the perfect data source — free, fast, no API keys, solid Python coverage. And while the LLM does a surprisingly good job at triage, I still treat its output as a suggestion, not a verdict. The RAG layer (ChromaDB) was an afterthought, but it's become one of the most useful parts: the tool genuinely gets better as it sees more of your codebase.

What's Next

I have a few ideas in mind:

More languages — JavaScript and Go are probably the next ones
CI/CD integration — run it in your pipeline, make it fail the build if there is any 'HIGH' impact
Offline mode — run it against a local LLM

Try It Yourself

Give it a shot and let me know what breaks. Seriously — I'd appreciate any feedback.
dep_shield

Who tests the tests?

Lucas Gabriel Sánchez — Fri, 20 Feb 2026 17:05:58 +0000

This post is based on a Gophercon talk by Daniela Petruzalek: Who tests the tests?

A little bit of history

In the beginning, we checked our code manually, running the application and trying different inputs: we called that manual testing. Then we discovered that we could write code to test application code to check if it's correct: we called that automatic testing (unit, integration, functional, etc.)

Now we are in the AI era where we write fewer tests and even less code, we need a way to swiftly check that the tests are correct and that they are testing the cases we expect in our application.

How can we know if code written by AI is correct?

You can use automatic tests in the same way we used to check code written by humans.

One option would be to let the AI write the application code and you write the tests, you can even use TDD where you, the human, write the tests and let the AI write the implementation after.

Another option is to let the AI write the application code and the tests, but then how can you be sure that those tests are testing the things you want or need? Reading and understanding the tests would be the best, but can we do that automatically?

Enter: mutation tests

Mutation testing is a way to check that the tests you have are testing the code the way you want by making small changes to the application code and checking that the tests fail as expected.

It's based on a concept known as a mutant: a version of your application with a small change.

How does it work?

The mutation testing cycle is this:

Create a mutant: change the application code by applying just one change
Run your test suite
Check how many mutants were killed

After running your tests, the ones that failed are said to have killed the mutant, those tests are testing something related to the code you changed and are, from the perspective of that change, good tests.

After many changes, if a test never failed it means that test didn't kill any mutant and is a weak test. You should probably delete that test or write a better one.

If a mutant is never killed, then that code is not being tested (no coverage) or is being tested poorly, you have an opportunity to write a test to check that piece of code if needed.

Do I have to make these changes manually?

You can do it manually, but there are some tools to aid you:

C#, TypeScript and Scala: Stryker
Go: go-gremlins
Java: pitest
Python: mutatest
Rust: mutants.rs

These tools provide a way to make those changes automatically and some of them run the tests for you.

Why do we need mutation testing?

Let's see a very simple example in Python:

def divide(a, b):
    if b == 0:
        raise ValueError("can't divide by 0")
    return a/b

A simple test suite we can have is:

import unittest
from divide import divide

class TestDivide(unittest.TestCase):
    def test_divide_error(self):
        with self.assertRaises(ValueError):
            divide(1, 0)

    def test_divide_success(self):
        self.assertEqual(1, divide(1, 1))

if __name__ == '__main__':
    unittest.main()

Run the tests and everything is fine:

$ python tests.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Here we are testing both flows, when an error is raised and when we complete a successful operation, coverage is at 100% but those tests are not great, and here's why: let's change the implementation of divide from a/b to a*b:

def divide(a, b):
    if b == 0:
        raise ValueError("can't divide by 0")
    return a*b

Running the tests should fail, right? They don't, because 1*1 is still 1:

$ python tests.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

This is a problematic test:

def test_divide_success(self):
    self.assertEqual(1, divide(1, 1))

Even when you have 100% test coverage, that doesn't mean you have 100% test case coverage; some cases may be missing or not being tested correctly.

What we did here was a mutation: from a/b to a*b, that mutation gives us information about our tests that we didn't have before.

How to read the results of mutation testing?

When you run your test suite against mutated code, each test can do one of two things: kill the mutant (the test fails, so it detected the change) or survive (the test still passes, so it missed the change). After many mutation, you can summarize how often each test killed mutants, for example:

After 200 Mutations:
TestA: 140 Kills, 60 Survived
TestB: 200 Kills
TestC: 30 Kills, 170 Survived

How to interpret this:

TestA: is good test because on most mutations it was killed.
TestB: is your strongest test, it detects every mutation you ran.
TestC: is the weak one, it usually doesn't detect mutations.

When not to use mutation testing?

In some projects or languages, running mutation tests can take hours. For each mutant, the tool has to compile (if needed) and run the full test suite. If your test suite takes 10 minutes, each mutation will take at least that long, so keep this in mind.

Use this approach on small projects or ones where the "edit, compile, test" cycle is small.

Closing

So who tests the tests? In practice, mutation testing does that: by checking that your tests react when the code is deliberately broken.

Bicycles Are All Your AI Agents Need

Federico Pascarella — Thu, 13 Nov 2025 12:47:16 +0000

From Condors to Code

Somewhere between a condor and a keyboard lies human genius. Steve Jobs once told a story about how humans are terrible movers compared to animals. The condor beats us easily in the race of energy efficiency, but put a person on a bicycle and they fly.
The bicycle, Jobs said, is "a tool that amplifies our efficiency." Computers, he added, are bicycles for the mind.
That thought never left me. And now, with AI agents evolving super-fast, I can't help seeing the same pattern repeat.
My humble view is that tools are still the key. Only this time, the cyclists are our AI agents with its brain (the LLM), and the bicycles are the functions we build for them.
With the right tools, an agent moves with purpose. With clumsy tools, it stalls.

The Engineering of Great Tools

Great agents sit on top of small, sharp Python functions. They are plain, predictable, and fast.

1. Single Responsibility

Specialize each function. Do one job well, then compose.

# Bad: Swiss-army function
def create_user(name, email, send_welcome=True, log=True):
    user = db.save(name, email)
    if send_welcome:
        send_email(email, "Welcome!")
    if log:
        logger.info(f"User {name} created")
    return user

# Good: Focused, composable tools
def save_user(name: str, email: str) -> dict:
    return db.save(name, email)

def send_welcome_email(email: str) -> bool:
    return send_email(email, "Welcome!")

2. Clear interfaces

Name things so intent is obvious. Keep arguments explicit. Return data instead of printing.

# Bad: Vague names and side effects
def discCalc(p, x, t=None):
    result = p - (p * x / 100)
    print(f"Discount applied: ${result}")

# Good: Straight names and returns
def calculate_discount(price: float, percentage: float) -> float:
    return price - (price * percentage / 100)

3. Structured outputs

Agents prefer structure. Return dicts or JSON, not prose.

# Bad: Unstructured string
def get_weather(city):
    temp = fetch_temperature(city)
    return f"It's about {temp} degrees in {city}, partly cloudy"

# Good: MCP tool with schema
from pydantic import BaseModel, Field
from mcp.server import Server
from mcp.types import Tool

server = Server("weather")

class WeatherData(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Temperature in Celsius")
    condition: str = Field(description="Weather condition")
    humidity: int = Field(description="Humidity percentage")

@server.call_tool()
async def get_weather(city: str) -> WeatherData:
    temp = await fetch_temperature(city)
    condition = await fetch_condition(city)
    return WeatherData(city=city, temperature=temp, condition=condition, humidity=65)

4. Efficiency

Use built-ins, cache where it helps, and profile before optimizing.

# Bad: Manual loops
def filter_active_users(users):
    result = []
    for user in users:
        if user.get("active"):
            result.append(user)
    return result

# Good: Built-ins plus caching
from functools import lru_cache
from typing import Tuple, List

@lru_cache(maxsize=128)
def filter_active_users(users_tuple: Tuple[dict, ...]) -> List[dict]:
    return [u for u in users_tuple if u.get("active")]

5. Robustness

Validate inputs and fail loudly with helpful errors.

# Bad: No validation
def read_file(path):
    with open(path) as f:
        return f.read()

# Good: Validation and clear errors
def read_file(path: str) -> str:
    if not isinstance(path, str) or not path:
        raise ValueError("Path must be a non-empty string")
    try:
        with open(path, "r") as f:
            return f.read()
    except FileNotFoundError:
        raise FileNotFoundError(f"File not found: {path}")

6. The micro-tooling mindset

Break big jobs into small tools you can test and swap. MCP benefits from chains of simple, named steps.

# Bad: Monolith
def process_user_data(user_id):
    user = db.fetch(user_id)
    validated = validate(user)
    enriched = api.enrich(validated)
    return transform(enriched)

# Good: Composable steps
def fetch_user(user_id: str) -> dict:
    return db.fetch(user_id)

def validate_user(user: dict) -> dict:
    return validate(user)

def enrich_user(user: dict) -> dict:
    return api.enrich(user)

7. Trade-offs

Hundreds of tiny tools can create orchestration overhead. Clear names, steady input and output shapes, and basic docs keep things manageable.

Show, Don't Tell: Two Decision Flows

A concrete example makes the difference clear. Here is the same task, done with weak tools and with strong tools.

Task: Extract newly signed customers from a CSV in cloud storage, enrich each with firmographic data, and email an account summary.

Agent with poorly designed tools

Calls a generic process_file() that auto-detects type and tries to parse everything.
Uses one do everything enrich_user() that accepts many flags, then times out on third party rate limits.
Prints logs to stdout, returns a mixed string summary, and the agent fails to decide what to send.

Decision flow with weak tools

Input: blob path
Branch: auto-detect format, guess schema
Loop: enrich with side effects
Output: unstructured string
Failure mode: retries loop, hallucinates missing fields, no clear errors

Agent with well designed tools

load_csv(path, schema) returns a typed dataframe.
batch_enrich(users, provider, rate_limit) yields structured rows with retry metadata.
render_account_summary(users) returns JSON for send_email(to, subject, body_html).

Decision flow with strong tools

Input: explicit path and schema
Transform: strict parser
Enrich: idempotent, rate limited, returns status per row
Render: deterministic template
Output: email send result with IDs

Result: same goal, three clean steps, easy to test and to explain.

Conclusion

I believe that innovation often hides in simplicity. Building efficient AI agents isn't about giving them infinite intelligence; it's about giving them great tools. Write them clean, focused, and well-documented; think in micro-tooling: small parts, big impact.
So, next time you're debugging that stubborn Python function, just remember: you're not fixing a bug. You're tuning a bicycle for the mind of an AI.

WhatsApp + MCP: automatic audio transcription

German Burgardt — Mon, 29 Sep 2025 19:39:53 +0000

Introduction

MCP (Model Context Protocol) can look complicated until you ship something real with it. Let's use it on something practical: expose your WhatsApp voice notes with your own MCP server and turn them into transcripts.

What is MCP?

MCP is a connection standard that connects AI agents with external systems.

It has a server and a client, and they have two different ways to talk to each other:

stdio (stdin/stdout): the standard Unix mechanism for a process to receive or send data to the environment or another process.
Server-Sent Events (SSE): an HTTP mechanism where the server keeps the connection open and streams events to the client (one-way).

Quick comparison of stdio and SSE transports in MCP.

MCP architecture

Host: Claude Desktop / Cursor / any AI agent. It coordinates the LLM, spins up MCP clients, and shows results.
MCP Client: an implementation embedded in the host that connects to your server. It speaks the protocol, opens/manages the connection, and sends/receives requests.
MCP Server: your program that exposes tools. It runs actions and returns data/events to the client.

An MCP server can expose different capabilities, but in this project we stick to tools (actions like transcribing audio). MCP also supports resources or prompts; we skip them here to keep the flow simple.

Diagram of the Host → MCP Client → MCP Server flow.

Building the WhatsApp MCP

WhatsApp Desktop on macOS stores everything locally: an SQLite database with chats and folders containing the media files.

Our MCP server will:

Read the WhatsApp database
Find audio files per contact
Transcribe them with Whisper
Send the text back to the Client (Cursor in this case)

The working code lives in the repository: mcp-whatsapp-whisper. Let's walk through the key pieces.

The STDIN/STDOUT connection

import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const transport = new StdioServerTransport();
await this.server.connect(transport);

With that the server listens to every client request on STDIN and replies through STDOUT.

We pick stdio because this MCP server runs locally. It's the simplest and most stable transport on desktop/CLI: no open ports, no HTTP dependency, avoids CORS/firewalls, and hosts (Claude Desktop/Cursor) support it natively. SSE makes sense when the server lives remotely behind HTTP.

Exposing capabilities

this.server = new Server(
  {
    name: "whatsapp-audio-mcp",
    version: "1.0.0",
  },
  {
    capabilities: {
      tools: {}, // We will expose actions
    },
  }
);

Designing the tools

The server lives on three tools each with a specific role:

getRecentAudio(contactName, count?): pulls the latest audio paths for a contact.
searchAudios(query, date?): narrows the list by name or date when the history is large. We get filtering without touching SQLite directly.
transcribeAudio(audioPath): turns a path into text with Whisper. It finishes the loop by delivering the result we care about.

The goal was a minimal set: find, refine, transcribe. Each tool lines up with one of those stages.

{
  name: 'transcribeAudio',
  description: 'Transcribe an audio file using OpenAI Whisper (SDK)',
  inputSchema: {
    type: 'object',
    properties: {
      audioPath: {
        type: 'string',
        description: 'Path to the audio file',
      },
    },
    required: ['audioPath'],
  },
}

The schema follows JSON Schema. With it, Cursor knows which parameters to send.

Accessing WhatsApp

WhatsApp Desktop keeps everything under predictable paths:

this.dbPath = path.join(
  homeDir,
  "Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite"
);
this.mediaPath = path.join(
  homeDir,
  "Library/Group Containers/group.net.whatsapp.WhatsApp.shared/Message/Media"
);

The database is SQLite:

const query = `
  SELECT DISTINCT 
    ZCONTACTJID as jid,
    ZPARTNERNAME as name,
    ZLASTMESSAGEDATE as lastMessageDate
  FROM ZWACHATSESSION
  WHERE ZPARTNERNAME IS NOT NULL
  AND ZCONTACTJID NOT LIKE '%@g.us'  -- Exclude groups
`;

Audio files are organized per contact. We scan recursively:

const audioExtensions = [".opus", ".m4a", ".mp3", ".aac", ".wav"];

async function scanDirectory(dir: string): Promise<void> {
  const entries = await fs.readdir(dir, { withFileTypes: true });

  for (const entry of entries) {
    if (audioExtensions.some((ext) => entry.name.endsWith(ext))) {
      // Found an audio file
      audioFiles.push({
        path: fullPath,
        filename: entry.name,
        modifiedDate: stats.mtime.toISOString(),
      });
    }
  }
}

The transcription: FFmpeg + Whisper

WhatsApp ships audio in Opus, but OpenAI Whisper prefers MP3. We use FFmpeg:

const ffmpeg = spawn("ffmpeg", [
  "-i",
  inputPath, // WhatsApp Opus audio
  "-acodec",
  "mp3",
  "-b:a",
  "128k",
  outputPath, // Temporary MP3
]);

Then we transcribe with OpenAI Whisper (SDK):

import OpenAI from "openai";
import fs from "node:fs";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream(outputPath), // Temporary MP3
  model: "whisper-1",
});

const transcriptionText = transcription.text;

Configuring Cursor (the client)

In the Cursor config (~/.cursor/mcp.json) we add:

{
  "mcpServers": {
    "whatsapp": {
      "command": "node",
      "args": ["/path/to/mcp-whatsapp-whisper/dist/server.js"],
      "env": {
        "OPENAI_API_KEY": "YOUR_OPENAI_KEY"
      }
    }
  }
}

Cursor can now invoke our server whenever it needs to.

MCP in action

The user asks Cursor:

"Send me the transcript of Elian's last audio."

Cursor automatically:

Calls getRecentAudio(contactName: "elian")
Receives the audio file path
Calls transcribeAudio(audioPath: "/path/to/audio.opus")
Receives the transcription
Summarizes or shows the full text

The transcription flows through the OpenAI API; the temporary MP3 is sent to get the text back. Cursor orchestrates; your server prepares the file and makes the call.

Cursor showing the transcription returned by the WhatsApp MCP server.

Limitations: macOS only

This server is macOS only. The WhatsApp paths are specific to Mac.

It depends on:

WhatsApp Desktop installed
FFmpeg (brew install ffmpeg)
OpenAI SDK (npm i openai) with OPENAI_API_KEY configured
Internet connection

We also skip Prompts and Resource Templates.

Security depends on the host. Cursor can ask for approval before it runs tools.

Keep it running with PM2

Build the project once (npm run build) and keep the server alive with pm2 start ecosystem.config.cjs. The provided config watches the compiled dist/server.js and restarts it if it crashes.

Conclusion

Your AI agent can now reach your data, use your tools, work in your context.

The WhatsApp server is just one idea. Once you realize any program that speaks STDIN/STDOUT can be an MCP server, the possibilities get wild.

Next time you think "I wish Cursor could access...", remember: it probably can. You just need to build the bridge.

How AI Reflects Your Thinking

German Burgardt — Tue, 26 Aug 2025 13:36:38 +0000

When we code using AI we ask ourselves: "what's the best prompt?" or "what magic prompt should I use?".

We'd be better off asking: "what kind of interaction is this?". Trying to understand the nature of the interaction between us and the model.

Maybe the problem isn't the technology, but us.

An Analogy

Imagine you hire a remote programmer. Brilliant, but with some quirks:

Never worked on your project before (0 context)
Extremely literal. If you don't explicitly tell them, they never assume anything.
Doesn't infer context
Completely loses their memory every day, returning to their initial state

How would you communicate with them?

You'd probably:

Explain all the necessary context, very detailed
Be very specific with requirements
Not assume they'll "figure out" anything. You explain everything
Expect some iterations before the final result
Maybe save context files to resend them every day

That's the best way to interact with an AI model.

AI As a Mirror

The model isn't just a task executor. It's also a mirror of your clarity when communicating a problem.

If you give it vague instructions, you get vague results simply because it faithfully reflects how vague your thinking was.

Most of the time when the model "doesn't understand" the problem isn't the model. It's that we ourselves weren't clear about what we wanted.

Clarity As a Skill

The real skill isn't "writing good prompts". It's thinking clearly about problems and communicating that clarity. This is a fundamental skill for any programmer.

Example

What we usually do:

Optimize this function

Why it fails: Optimize in what sense? Speed? Memory? Readability? There's no success criteria.

What we should do:

The processOrders() function in orders.js takes 5 seconds with 1000 orders.
I need it to take less than 1 second.
Orders come from the database already sorted by date.
You can assume there are no duplicate orders.
Logs: <<detailed logs>>

This is much clearer and less abstract. It describes:

The problem (5 seconds is too much)
The measurable goal (less than 1 second)
Constraints (already sorted)
Assumptions (no duplicates)

Breaking Down Problems

One of the skills that improves working with AI is breaking problems down into smaller pieces. AI won't save you the work of thinking. The clarification process itself is valuable work in programming.

Instead of:

Implement a complete authentication system

You learn to think:

Step 1: Define the User model with minimum required fields: <fields>
Step 2: Create the registration endpoint with basic validation (validation type, etc)
[etc...]

The Limitations

AI can only handle 3-4 files well at a time. It's a limitation but with its bright side:

It forces you to keep responsibilities separated and create clear interfaces. You need to avoid coupling and think in small modules.

It incentivizes you to follow good architecture practices.

The Importance of Context

AI needs all the context possible, don't skimp.

CONTEXT: Users report the checkout page hangs
SYMPTOM: The "Pay" button stays in "Processing..." state indefinitely
FILE: checkout.js, handlePayment() function
SUSPICION: Probably missing a catch to handle API errors
TASK: Add robust error handling and visual feedback to the user

The Value of Programming with AI

Programming with AI trains you in thinking clearly and communicating precisely. It forces you to break problems into manageable pieces and be explicit with your requirements while constantly verifying results.

These seem like fundamental skills for any dev regardless of language.

Final Reflection

AI doesn't save you from thinking, or at least you shouldn't use it that way. It's the opposite, every prompt you write is an opportunity to clarify your understanding. Every response you receive is feedback on your clarity. Every iteration is a chance to improve.

Next time you use AI and don't get the expected result, before blaming the model, ask yourself:

Did I really have clarity on what I wanted?
Did I break down the problem into manageable parts?

These models are honest, literal collaborators. They give you exactly what you ask for, but they demand clarity. Learning to be clear is learning to think well. AI used properly makes you a better programmer.

Automate Any Repetitive Task with MCP

German Burgardt — Mon, 28 Jul 2025 17:30:01 +0000

The Problem: Repetitive Detailed Prompting

Every time I start a new task in Claude Code / Cursor, I type a detailed prompt to guide the AI through an internal monologue before proceeding. For example:

"You will generate an internal monologue of 200 numbered lines where two thinkers debate the approach:

Pragmatic focuses on functionality and efficiency
Creative on innovation and elegance
Follow these rules: exactly 200 lines, each starting with [Pragmatic] or [Creative]
Be specific about code without abstractions
Reflect and question without solutions
Mention files/functions/variables
Consider edge cases/performance/maintainability/user experience
Debate simplicity vs functionality
Question decisions, no repeats, end without conclusion
Then address the task: [actual task here]."

Typing this repeatedly 20+ times a day wastes time and disrupts focus.

As someone researching practical AI applications, we can fix that.

// Before: 200+ word prompt every time
// After: "internal monologue 200 lines - implement auth system"

Enter MCPs: The Missing Link

Model Context Protocols (MCPs) allow extending AI agents with custom tools. While common examples include fetching data, web browsing, or integrating with Slack, I used it in a novel way to automate my repetitive prompt.

From Repetition to Automation

I built an MCP server in my Remix app (essentially the same as plain Node.js) that generates these monologues on demand. Now, Claude detects the trigger and handles it automatically.

Here's a glimpse of what it generates:

1. [Pragmatic] We need to implement auth - start with basic JWT in middleware.js
2. [Creative] But what about OAuth? Users expect social login nowadays...
3. [Pragmatic] OAuth adds complexity - first nail down password flow, then extend
...

The difference:

Before: Type the full detailed prompt each time, then describe the task.
After: Simply say "internal monologue 200 lines about X - [task]", and Claude generates the monologue via the tool, then proceeds.

Time saved: ~2 minutes per task

Characters typed: 300+ → 40

Building Your Own Monologue MCP

Here's how to implement it in a Node.js server (adaptable from my Remix example).

Step 1: Install Dependencies

npm install @modelcontextprotocol/sdk zod @anthropic-ai/sdk

Step 2: Create the MCP Server Handler

Create app/lib/mcp-server.ts:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import { z } from "zod";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export function createMCPServer() {
  const server = new Server(
    {
      name: "monologue-mcp",
      version: "1.0.0",
    },
    {
      capabilities: {
        tools: {},
      },
    }
  );

  // Define the monologue tool
  server.setRequestHandler("tools/list", async () => ({
    tools: [
      {
        name: "generate-monologue",
        description:
          "Generate a reflective internal monologue in the style of Pragmatic vs Creative thinker",
        inputSchema: {
          type: "object",
          properties: {
            lines: {
              type: "number",
              description: "Number of lines in the monologue (default: 100)",
              default: 100,
            },
            context: {
              type: "string",
              description: "Current conversation context",
            },
            task: {
              type: "string",
              description: "Description of the task to perform",
            },
          },
          required: ["task"],
        },
      },
    ],
  }));

  // The actual tool implementation
  server.setRequestHandler("tools/call", async (request) => {
    if (request.params.name === "generate-monologue") {
      const ArgsSchema = z.object({
        lines: z.number().int().min(1).max(500).default(100),
        context: z.string().max(2000).optional().default(""),
        task: z.string().min(1).max(1000),
      });

      const { lines, context, task } = ArgsSchema.parse(
        request.params.arguments
      );

      try {
        const systemPrompt = `You are two thinkers having an internal dialogue about programming.
Pragmatic is focused on functionality and efficiency.
Creative is obsessive about innovation and elegance.

STRICT RULES:
1. Generate EXACTLY ${lines} numbered lines
2. Each line must start with [Pragmatic] or [Creative]
3. NO abstractions - be specific about the code
4. NO complete solutions - REFLECT and QUESTION
5. Mention specific files, functions, variables when relevant
6. Think about: edge cases, performance, maintainability, user experience
7. Debate simplicity vs functionality
8. Question every technical decision
9. NO repeated ideas - each line must add new value
10. End without a definitive conclusion - it's reflection, not decision`;

        const userPrompt = `${
          context ? `Previous context:\n${context}\n\n` : ""
        }Current task: ${task}

Generate an internal monologue of EXACTLY ${lines} numbered lines where the two thinkers debate the best way to approach this task.`;

        const response = await anthropic.messages.create({
          model: "claude-opus-4-20250514",
          max_tokens: 32000,
          temperature: 1,
          system: systemPrompt,
          messages: [
            {
              role: "user",
              content: userPrompt,
            },
          ],
        });

        const monologue = response.content[0].text;

        return {
          content: [
            {
              type: "text",
              text: monologue,
            },
          ],
        };
      } catch (error: any) {
        return {
          content: [
            {
              type: "text",
              text: `Error generating monologue: ${error.message}`,
            },
          ],
          isError: true,
        };
      }
    }

    throw new Error(`Unknown tool: ${request.params.name}`);
  });

  return server;
}

Step 3: Create the API Route

Create app/routes/api.mcp.ts:

The MCP server needs to be exposed as an HTTP endpoint. We use Bearer authentication to secure it. Only Claude (or other authorized clients) with the correct API key can access your server. This prevents random people from using your tools.

import type { LoaderFunctionArgs } from "@remix-run/node";
import { createMCPServer } from "~/lib/mcp-server";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

// SSE (Server Sent Events) keeps an open connection between Claude and your server
// This allows Claude to call your tools in real time without polling

// Simple auth check
function verifyAuth(request: Request): boolean {
  const authHeader = request.headers.get("Authorization");
  const expectedKey = process.env.MCP_API_KEY || "your-secret-key";
  return authHeader === `Bearer ${expectedKey}`;
}

export async function loader({ request }: LoaderFunctionArgs) {
  if (!verifyAuth(request)) {
    return new Response("Unauthorized", { status: 401 });
  }

  const responseHeaders = new Headers({
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
  });

  const server = createMCPServer();

  const transport = new SSEServerTransport({
    endpoint: "/api/mcp",
    requestHeaders: Object.fromEntries(request.headers.entries()),
    responseHeaders: Object.fromEntries(responseHeaders.entries()),
  });

  const stream = new ReadableStream({
    async start(controller) {
      try {
        await server.connect(transport);

        // Keep connection alive (SSE connections timeout after 30 seconds of silence)
        const keepAlive = setInterval(() => {
          controller.enqueue(new TextEncoder().encode(": keepalive\n\n"));
        }, 30000);

        request.signal.addEventListener("abort", () => {
          clearInterval(keepAlive);
          controller.close();
        });
      } catch (error) {
        controller.error(error);
      }
    },
  });

  return new Response(stream, {
    headers: responseHeaders,
  });
}

Step 4: Configure Environment Variables

Add to your .env:

ANTHROPIC_API_KEY=your-anthropic-api-key
MCP_API_KEY=a-secret-key-for-your-mcp

The ANTHROPIC_API_KEY lets your server call Claude's API to generate monologues. The MCP_API_KEY is your own secret, it's what Claude will use to authenticate with your server.

Step 5: Deploy and Connect

Deploy your changes (I use Vercel, but any platform works):

git add .
git commit -m "Add MCP server for internal monologues"
git push

Then connect from Claude:

claude mcp add --transport sse monologue https://yourdomain.com/api/mcp --header "Authorization: Bearer your-secret-key"

The sse transport tells Claude to use Server Sent Events (the streaming connection type we set up). Replace your-secret-key with the same MCP_API_KEY from your .env file.

How It Works in Practice

Now, when working in Claude:

> internal monologue 150 lines - design user experience for login flow

Claude detects the phrase, calls the MCP tool, generates the detailed monologue (e.g., a debate on intuitive interfaces vs secure processes, navigation logic, etc.), and uses it to design the feature thoughtfully.

A sample monologue excerpt:

1. [Creative] Login flow should be innovative and seamless – perhaps biometric integration for delight?
2. [Pragmatic] Biometrics add complexity; focus on reliable password handling in auth.js first.
3. [Creative] But user experience suffers with forms – question if we can animate transitions smoothly.
4. [Pragmatic] Animations might impact performance on mobile; consider edge cases in responsive design.
...

Why This Matters

This MCP setup boosts programming efficiency by leveraging AI tools for consistent planning and productivity gains, while experimenting with a non typical application to explore MCPs more creatively and deeply.

What's Next?

One could build other creative tools, such as one that fetches and analyzes server logs directly, or another that integrates with external APIs for real time data checks.

Your Turn

What repetitive tasks do you deal with in your daily work? Maybe you can create an MCP. The code is ready to adapt and build something.

Questions? Leave a comment below and I'll be happy to help!

A2A - Understanding the Basics and Building Multi-Agent Flight Management System

Eze Quiroga — Wed, 09 Jul 2025 03:33:15 +0000

🌟 Introduction

Continuing with the context I shared in the previous article -MCP - Understanding the Basics and Building a Research Paper Management Chatbot-, where I spotted the fact there's been a growing need for a standard way to enable communication between agents and give them richer context to handle complex tasks through natural language, it's time to explore how to communicate agents or even complete agentic systems in a standard way.

That's where Google's A2A (Agent-to-Agent) protocol comes in.
Announced by Google on April 9, 2025, this emerging protocol standardizes how AI agents communicate with each other, enabling them to share context, delegate tasks, and collaborate on complex objectives that require multiple specialized capabilities.

In this post, we'll walk through building a command-line multi-agent system using the A2A protocol. We'll learn how to:

Create A2A agents with their cards and skills
Configure how agents will return information
Use a centralized LangChain ReAct agent to call A2A agents

By the end, our chatbot will be able to:

(employee_flight_request_agent) Know the status of corporate flight orders (pending purchase, purchased, and associated with a specific person)
(airport_knowledge_base_agent) Obtain information about airports and cities
(flight_search_agent) Search for real flight information departing from a specific airport
Recommend airports for flights pending acquisition

Here's how we'll break it down:

Context
Local environment setup
What is A2A?
Core components
Communicating agents
Building A2A Agents
Our chatbot
Running our chatbot
Key features
Final thoughts
Resources

Let's get started! 🚀

Important note: Only the most relevant function signatures and docstrings are shown in this post. You can find the full implementation in ezequiroga/a2a-bases.

🤝 Context

The main objective of the project is to recommend departure airports for corporate flights. For this, we will create a chatbot and three A2A agents:

employee_flight_request_agent: Manages employee flight requests and booking status using an internal database. It returns results immediately: receives requests → processes → returns results.
airport_knowledge_base_agent: Acts as a knowledge database that provides airport information and city-airport mappings. Since the main purpose of this article is to explore A2A, this agent uses fuzzy matching to retrieve information. It uses streaming to return its results.
flight_search_agent: Performs real-time flight search using external aviation data from the Aviation Stack API. This agent uses a ReAct Agent from LangChain to create filters for the tool that interacts with the Aviation Stack API. It responds to requests by sending push notifications.

These three agents will be called through the chatbot, which uses a ReAct Agent from LangChain to interact with the user and decide which agent should be called.

Each agent uses a different communication method with our chatbot, so our chatbot needs to adapt to each of them.

🛠️ Local environment

🐍 Python 3.13.5

Install the required packages from requirements.txt using the uv add -r requirements.txt command or pip install -r requirements.txt.

IMPORTANT NOTE: The A2A protocol library must be installed using UV to avoid installation errors. This is the recommended approach according to the official A2A documentation. Using pip may result in dependency conflicts or incomplete installations.

Pro tip: Use Python virtual environments for cleaner dependency management.

🤔 What Is Agent2Agent (A2A)?

Let's explore what the A2A protocol is and how it enables seamless agent-to-agent communication. For more details, check out the Resources section at the end.

The A2A protocol was created by Google with the goal of standardizing and simplifying both communication and interoperability between AI Agents or even complete Agentic Systems.

As the official documentation states, A2A's key goals are:

Interoperability: Bridge the communication gap between disparate agentic systems.
Collaboration: Enable agents to delegate tasks, exchange context, and work together on complex user requests.
Discovery: Allow agents to dynamically find and understand the capabilities of other agents.
Flexibility: Support various interaction modes including synchronous request/response, streaming for real-time updates, and asynchronous push notifications for long-running tasks.
Security: Facilitate secure communication patterns suitable for enterprise environments, relying on standard web security practices.
Asynchronicity: Natively support long-running tasks and interactions that may involve human-in-the-loop scenarios.

The communication is based on HTTP(S) as the transport protocol and defines that each server exposes its services through a URL included in its AgentCard. All data exchange is based on JSON-RPC 2.0, ensuring that requests and responses follow a consistent and standard format, always with Content-Type: application/json.

And, for real-time updates, A2A supports streaming using Server-Sent Events (SSE). In these cases, the server returns continuous events with embedded JSON-RPC responses, allowing agents to maintain open communication flows for long-duration messages or tasks.

Official SDK

The official SDK allows us to abstract away from writing JSON code by using classes and methods that facilitate communication. The recommended way to install the SDK is using UV by running the following:

uv add a2a-sdk

Important note: You need to initialize the uv project since our A2A servers will run using uv

🧩 Core Components

A2A communication is built around several key components that define the message structure required for proper agent interaction.

A2A Client: An application or agent that initiates requests to an A2A Server on behalf of a user or another system.
A2A Server (Remote Agent): An agent or agentic system that exposes an A2A-compliant HTTP endpoint, processing tasks and providing responses.
Agent Card: A JSON metadata document published by an A2A Server, describing its identity, capabilities, skills, service endpoint, and authentication requirements.
Task: The fundamental unit of work managed by A2A, identified by a unique ID. Tasks are stateful and progress through a defined lifecycle.
Message: A communication turn between a client and a remote agent, having a role ("user" or "agent") and containing one or more Parts.
Part: The smallest unit of content within a Message or Artifact (e.g., TextPart, FilePart, DataPart).
Artifact: An output (e.g., a document, image, structured data) generated by the agent as a result of a task, composed of Parts.

Important Note: The protocol is based on JSON-RPC 2.0, which means all messages are sent in JSON format. To simplify development, we will use the official SDK.

📡 Communicating Agents

A2A specifies three different communication patterns for A2A Servers to interact with A2A Clients:

Standard HTTP(S) Communication: The client sends a request and the server sends a response, completing the standard HTTP(S) protocol cycle
Streaming (SSE): Real-time, incremental updates for tasks (status changes, artifact chunks) delivered via Server-Sent Events
Push Notifications: Asynchronous task updates delivered via server-initiated HTTP POST requests to a client-provided webhook URL, for long-running or disconnected scenarios

You can see a communication sequence diagram in A2A Request Lifecycle and dive deeper into communication methods in Streaming & Asynchronous Operations in A2A.

The Agent's communication method is defined in its Agent Card. Our system has three A2A agents, each using a different communication approach. Let's go ahead and start creating our agents.

🏗️ Building A2A Agents

The first step in creating an A2A agent is defining its Agent Card. This card is essential as it describes the server's identity, capabilities, skills, service endpoint URL, and authentication requirements. Clients use the information in the Agent Card to understand how to interact with the agent.

Agent Card

As described previously, the Agent Card is a JSON metadata document published by the A2A Server, describing its identity, capabilities, skills, service endpoint, authentication and how clients should interact with it.

The recommended location for the Agent Card, following the well-known URI strategy, is http(s)://{server_domain}/.well-known/agent.json. Using the official SDK, the Agent Card will be available at that path automatically. Below, you can see the Agent Cards for each of our A2A Agents.

Agent Card: Employee Flight Request -> This card defines that our agent will respond to every request immediately. It also specifies that the agent has three skills: list_pending_requests_skill, list_booked_requests_skill and check_employee_request_skill. The protocol does not specify how the agent knows which skill should be performed upon a request - it's the agent's responsibility to determine which skill to execute.

public_agent_card = AgentCard(
    name='Employee Flight Request Management Agent',
    description='Agent for managing and checking employee flight requests and bookings',
    url='http://localhost:9992/',
    version='1.0.0',
    defaultInputModes=['text'],
    defaultOutputModes=['text'],
    capabilities=AgentCapabilities(streaming=False),
    skills=[
        list_pending_requests_skill,
        list_booked_requests_skill,
        check_employee_request_skill
    ],
    supportsAuthenticatedExtendedCard=False,
)

Agent Card: Airport Knowledge Base -> Here, the card specifies that the agent will stream the response to the client using streaming=True.

public_agent_card = AgentCard(
    name='Airport Knowledge Base Agent',
    description='Knowledge base agent for retrieving correct airport names and city-airport mappings',
    url='http://localhost:9991/',
    version='1.0.0',
    defaultInputModes=['text'],
    defaultOutputModes=['text'],
    capabilities=AgentCapabilities(streaming=True),
    skills=[airport_knowledge_skill],
)

Agent Card: Flight Search -> This agent, as its card describes by pushNotifications=True, will send push notifications to the clients.

public_agent_card = AgentCard(
    name='Flight Search Agent',
    description='Real-time flight search agent with push notification capabilities for aviation data',
    url='http://localhost:9993/',
    version='1.0.0',
    defaultInputModes=['text'],
    defaultOutputModes=['text'],
    capabilities=AgentCapabilities(streaming=True, pushNotifications=True),
    skills=[flight_search_skill],
    supportsAuthenticatedExtendedCard=False,
)

NOTE: Since the main purpose of this article and the project that implements the code explained here is to demonstrate the A2A Protocol, our chatbot knows beforehand how each agent will send the responses. However, in a real scenario, the client may need to implement a way to handle communications based on the agent cards.

Agent Skills

An Agent Skill is a specific capability, function, or area of expertise the agent can perform or address. An agent can define more than one skill - as our Employee Flight Request Management Agent does - in its card. Nevertheless, the protocol says nothing about how the agent knows which skill the user is trying to execute. Thus, it is the responsibility of the agent to determine which skill to perform based on the client's message.

Below is the definition of one skill used by the Employee Flight Request Management Agent - the other skills are defined in a similar fashion.

list_pending_requests_skill = AgentSkill(
    id='list_pending_requests',
    name='List Pending Flight Requests',
    description='List all employee flight requests that are not yet booked',
    tags=['flight', 'requests', 'pending', 'left', 'available', 'not booked', 'employee'],
    examples=[
        'list pending flight requests',
        'show pending requests',
        'which flights are not booked',
        'display remaining requests'
    ],
)

Agent Executor

The Agent Executor is the central component that handles the processing logic of A2A agents and is responsible for processing incoming requests and generating corresponding responses. The SDK provides an abstract base class a2a.server.agent_execution.AgentExecutor that we must implement to create our agent. This class defines two main methods:

async def execute(self, context: RequestContext, event_queue: EventQueue): Handles incoming requests that expect a response or a stream of events.
async def cancel(self, context: RequestContext, event_queue: EventQueue): Handles requests to cancel an ongoing task.

The RequestContext provides information about the incoming request, and the EventQueue is used to send events back to the client.

This is the Agent Executor implementation for our Employee Flight Request Management Agent:

class EmployeeFlightRequestAgentExecutor(AgentExecutor):
    """Employee flight request management agent executor."""

    def __init__(self):
        self.agent = EmployeeFlightRequestAgent()

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        query = context.get_user_input()

        response = await self.agent.invoke(query)

        await event_queue.enqueue_event(new_agent_text_message(response))

    async def cancel(
        self, context: RequestContext, event_queue: EventQueue
    ) -> None:
        await event_queue.enqueue_event(new_agent_text_message("❌ Flight request operation cancelled"))

This line response = await self.agent.invoke(query) calls and executes the actual logic of our agent, querying the mocked database and returning the data.

Notice the line await event_queue.enqueue_event(new_agent_text_message(response)). This is really important because it's how the protocol allows the server to respond to the clients. The event_queue.enqueue_event is the way to return messages even if stream is False in the Agent Card.

Creating and sending messages

In this section we will explore how to create messages and send them to the clients. In the section Our chatbot we will describe how it handles each kind of communication.

The simplest way to create a message to send to an A2A Client is using a2a.utils.new_agent_text_message(text: str, context_id: str | None = None, task_id: str | None = None) -> Message:). This function returns the following object:

return Message(
    role=Role.agent,
    parts=[Part(root=TextPart(text=text))],
    messageId=str(uuid.uuid4()),
    taskId=task_id,
    contextId=context_id,
)

Our Employee Flight Request Management Agent uses this method to create messages in response to the client. The created message is sent using the event_queue.enqueue_event method. See the code below.

class EmployeeFlightRequestAgentExecutor(AgentExecutor):

    ...

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        query = context.get_user_input()

        response = await self.agent.invoke(query)

        await event_queue.enqueue_event(new_agent_text_message(response))

Our next agent, Airport Knowledge Base, streams messages to the clients. To achieve this, we need to use another class provided by the SDK: a2a.server.tasks.TaskUpdater. This class allows agents to publish updates to a task's event queue. Based on this, the messages to stream must contain the task.id and task.contextId.

This is the Agent Executor for that agent:

class AirportKnowledgeBaseAgentExecutor(AgentExecutor):
    """Airport knowledge base agent executor."""

    def __init__(self):
        self.agent = AirportKnowledgeBaseAgent()

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        query = context.get_user_input()
        task = context.current_task
        if not task:
            task = new_task(context.message)

        updater = TaskUpdater(event_queue, task.id, task.contextId)
        await self.agent.invoke(task, updater, query)

Note the creation of the TaskUpdater instance: it takes the event_queue, the task id and the context id from the task. Then, within the method self.agent.invoke(...) we use the updater object to stream messages as follows:

async def invoke(self, task: Task, updater: TaskUpdater, query: str = None) -> None:
    """
    Retrieve airport information using fuzzy matching by name and municipality.

    Args:
        context: Request context
        event_queue: Event queue for streaming messages
        query: Search string (city or airport name)

    Returns:
        String with top 5 airport names and top 5 cities with their airports
    """
    if self.airport_knowledge.empty:
        await updater.update_status(
            TaskState.failed,
            new_agent_text_message(
                "Airport knowledge base not loaded. Please check the database files.",
                task.contextId,
                task.id,
            ),
            final=True
        )
        return

    await updater.update_status(
        TaskState.working,
        new_agent_text_message(
            "📚 Accessing airport knowledge base...",
            task.contextId,
            task.id,
        ),
    )

    ...

    part = TextPart(text=result_lines)
    message = Message(
        role=Role.agent,
        parts=[part],
        messageId=str(uuid.uuid4()),
    )
    await updater.complete(message=message)

Throughout this code we can also introduce some useful classes from the SDK (package a2a.types):

TaskState: enum representing the possible states of a Task
Role: enum representing a message sender's role
TextPart: represents a text segment within parts
Message: class that represents a single message exchanged between user and agent.

So far we've seen how agents respond in a single timeline: from receiving the request until sending their response, either through a single message or by streaming multiple messages until completing the cycle.

But if a task could take a long time to finish, it's not a good idea to make the client wait until the end while keeping a connection alive. For these cases, we can use push notifications. The last agent we will create does exactly this.

As a requirement imposed by the protocol, a client that wants to receive push notifications should explicitly specify the endpoint enabled for that purpose. We will see how to do that in the section Our chatbot. Here, we describe how the push notification process works.

In the section Building A2A Servers we will see that we need to use a request_handler for creating an A2A Server. The SDK provides us with the following implementation: a2a.server.request_handlers.DefaultRequestHandler. At the time this article was written (Jul 2025), that handler does not properly manage push notifications. Therefore, you can extend that class and override the methods within it. The class CustomRequestHandler does exactly that by overriding only one method: on_message_send_stream.

These are the most relevant parts of the new implementation

class CustomRequestHandler(DefaultRequestHandler):
    """Custom request handler that extends DefaultRequestHandler.

    This handler maintains all default functionality while providing
    custom implementation for the streaming message send method.
    """

    async def on_message_send_stream(
        self,
        params: MessageSendParams,
        context: ServerCallContext | None = None,
    ) -> AsyncGenerator[Event]:
        """Custom handler for 'message/stream' (streaming).

        Starts the agent execution and yields events as they are produced
        by the agent.
        """
        task_manager = TaskManager(
            task_id=params.message.taskId,
            context_id=params.message.contextId,
            task_store=self.task_store,
            initial_message=params.message,
        )
        # Start new code #
        task = Task(
            id=params.message.taskId,
            contextId=params.message.contextId,
            status=TaskStatus(
                state=TaskState.submitted,
            )
        )
        task = await task_manager.save_task_event(task)
        task: Task | None = await task_manager.get_task()
        # End new code #

        if task:
            task = task_manager.update_with_message(params.message, task)

            if self.should_add_push_info(params):
                assert isinstance(self._push_notifier, PushNotifier)
                assert isinstance(
                    params.configuration, MessageSendConfiguration
                )
                assert isinstance(
                    params.configuration.pushNotificationConfig,
                    PushNotificationConfig,
                )
                await self._push_notifier.set_info(
                    task.id, params.configuration.pushNotificationConfig
                )
        else:
            queue = EventQueue()

        ...

        try:
            ...
            async for event in result_aggregator.consume_and_emit(consumer):
                ...
                if self._push_notifier and task_id:
                    latest_task = await result_aggregator.current_result
                    if isinstance(latest_task, Task):
                        await self._push_notifier.send_notification(latest_task)
                yield event
        except Exception as e:
            print(f"❌ {e}")
        finally:
            await self._cleanup_producer(producer_task, task_id)

Moreover, to enable the ability to send push notifications in our agent, we need to add push_notifier=InMemoryPushNotifier(httpx_client=httpx.AsyncClient()) when creating the request_handler for building the A2A Server. This is shown again in the section Running A2A servers.

That's all. If pushNotifications=True in the Agent Card, the PushNotifier is set in the request_handler and the client provides the push notification endpoint, the SDK will automatically send messages to the provided endpoint each time a Task changes its status. It's important to mention that an instance of the class Task is being pushed. This is relevant because the endpoint our chatbot exposes for listening to notifications will receive that object as JSON and should be able to parse it.

Because of this, the Agent Executor of our agent simply sends messages using the updater as follows:

class FlightSearchAgentExecutor(AgentExecutor):
    """Flight search agent executor with ReAct capabilities."""

    def __init__(self):
        self.agent = FlightSearchAgent()

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -> None:
        """Execute flight search request."""
        query = context.get_user_input()

        updater = TaskUpdater(event_queue, context.current_task.id, context.current_task.contextId)
        await updater.update_status(
            TaskState.submitted,
            new_agent_text_message(
                "🤖 Flight Search Agent activated...",
                context.current_task.contextId,
                context.current_task.id
            )
        )

        message = await self.agent.invoke(context.current_task.id, context.current_task.contextId, query)

        await updater.update_status(TaskState.working, message)
        await updater.update_status(TaskState.completed)

The method self.agent.invoke returns the following object:

push_notification_payload = {
    "flights": final_response,
    "source": "flight_search_agent",
    "metadata": {"task_id": task_id, "context_id": context_id}
}

part = TextPart(text=json.dumps(push_notification_payload))
message = Message(
    role=Role.agent,
    parts=[part],
    messageId=str(uuid.uuid4()),
    taskId=task_id,
    contextId=context_id,
)
return message

Building A2A Servers

At this point we have already achieved:

1️⃣ Creating agent cards specifying their endpoint (url: where the A2A service can be reached), skills (skills) and capabilities (capabilities: communication methods) among other properties

2️⃣ Adding agent skills to the agent cards

3️⃣ Implementing the agent executor for each agent

4️⃣ Composing and sending messages depending on the agent's communication method

Now, it's time to build the A2A servers. For this purpose, the SDK provides us with the class a2a.server.apps.A2AStarletteApplication.

To create an A2A Server, it's mandatory to use an AgentCard, a RequestHandler (to route incoming A2A RPC calls to the appropriate methods on your executor), an AgentExecutor (to execute the core logic of how agents process requests and generate responses) and a TaskStore (to manage the lifecycle of tasks).

As can be seen in the code below, the SDK provides useful implementations of these classes.

request_handler = DefaultRequestHandler(
    agent_executor=EmployeeFlightRequestAgentExecutor(),
    task_store=InMemoryTaskStore(),
)

server = A2AStarletteApplication(
    agent_card=public_agent_card,
    http_handler=request_handler,
)

uvicorn.run(server.build(), host='0.0.0.0', port=9992)

Note: Since Starlette Application and uvicorn are beyond the scope of this article, if you need more information, you can read about them in Starlette and uvicorn respectively.

Running A2A servers

The recommended directory structure for the server is as follows:

├── name_of_the_agent/
│   ├── __main__.py -> contains AgentSkill, AgentCard, CustomRequestHandler, A2AStarletteApplication and the line to start the uvicorn server
│   └── agent_executor.py -> the Agent Executor implementation with the actual logic for processing incoming requests and generating responses
│   └── ...
│   └── subdirectories/
│       └── ...

In the previous sections we already showed how to implement the Agent Executor. Below, you can see an example of the __main__.py file:

# ...IMPORTS...

if __name__ == '__main__':

    flight_search_skill = AgentSkill(
        ...
    )

    public_agent_card = AgentCard(
        ...
    )

    request_handler = CustomRequestHandler(
        agent_executor=FlightSearchAgentExecutor(),
        task_store=InMemoryTaskStore(),
        push_notifier=InMemoryPushNotifier(httpx_client=httpx.AsyncClient()) # ONLY needed if the Agent will send push notifications
    )

    server = A2AStarletteApplication(
        agent_card=public_agent_card,
        http_handler=request_handler,
    )

    uvicorn.run(server.build(), host='0.0.0.0', port=9993)

With all of that in place, we can run our A2A Server by:

cd name_of_the_agent/
uv run . --host 0.0.0.0

Important Note: To run the server this way, you need to initialize a uv project within the agent's folder. You can find more information in UV Projects.

🤖 Our chatbot

The final piece of our AI system is our chatbot: the entry point for user interaction. Our chatbot has five important components that we'll analyze below and defines the main function that contains the logic to allow users to enter prompts and display responses from our A2A Agents.

The *Tool(BaseTool) classes implement langchain.tools.BaseTool since they are tools that our LangChain Agent can invoke. They all override the methods def _run(self, query: str) -> str and async def _arun(self, query: str) -> str with the actual logic of the tool.

Note: The chat_agent.py file needs to be refactored to move the classes declared in it to independent files.

A2AAgentRegistry

A mock registry that manages A2A agent information, including their capabilities, base URLs, and initialized clients for communication.

To obtain the information present in the Agent Cards, we use a2a.client.A2ACardResolver which automatically calls the endpoint http(s)://{server_domain}/.well-known/agent.json and initiate communication with our A2A Agents using a2a.client.A2AClient.

class A2AAgentRegistry:
    """Mock registry for A2A agents. Later will be replaced with real discovery."""

    def __init__(self):
        self.agents = {
            "airport_knowledge_base": {
                ...
                "base_url": "http://localhost:9991",
                ..
            },
            "employee_flight_requests": {
                ...
                "base_url": "http://localhost:9992",
                ..
            },
            "flight_search": {
                ...
                "base_url": "http://localhost:9993",
                ...
            }
        }

    async def initialize_agents(self, httpx_client: httpx.AsyncClient):
        """Initialize A2A clients for all registered agents."""
        for _, agent_info in self.agents.items():
            try:
                resolver = A2ACardResolver(
                    httpx_client=httpx_client,
                    base_url=agent_info["base_url"]
                )

                try:
                    card: AgentCard = await resolver.get_agent_card()
                    client = A2AClient(httpx_client=httpx_client, agent_card=card)

                    agent_info["card"] = card
                    agent_info["client"] = client
                    print(f"✅ Initialized {agent_info['name']} at {agent_info['base_url']}")
                    print(f"   📝 Description: {agent_info['description']}")

                except Exception as e:
                    print(f"⚠️  Could not connect to {agent_info['name']} at {agent_info['base_url']}: {e}")
                    agent_info["card"] = None
                    agent_info["client"] = None

            except Exception as e:
                print(f"❌ Failed to initialize {agent_info['name']}: {e}")

    def get_agent(self, agent_id: str) -> Optional[Dict[str, Any]]:
        """Get agent info by ID."""
        return self.agents.get(agent_id)

    def list_available_agents(self) -> List[str]:
        """List all agents that are available (have active clients)."""
        return [
            agent_id for agent_id, info in self.agents.items() 
            if info["client"] is not None
        ]

EmployeeFlightRequestTool

A LangChain tool that checks the status of employee flight requests and booking information by communicating with the employee flight request agent. Our A2A Agent Employee Flight Request Management Agent uses capabilities=AgentCapabilities(streaming=False), so the message is sent and the response is awaited.

class EmployeeFlightRequestTool(BaseTool):
    """
    ...
    """

    ...

    def __init__(self, agent_registry: A2AAgentRegistry):
        super().__init__(agent_registry=agent_registry)

    async def _arun(self, query: str) -> str:
        """Async implementation to call the employee flight request agent."""
        agent_info = self.agent_registry.get_agent("employee_flight_requests")

        if not agent_info or not agent_info["client"]:
            return "❌ Employee flight request agent is not available. Please check if the service is running."

        try:
            part = TextPart(text=query)
            message = Message(
                role=Role.user,
                parts=[part],
                messageId=str(uuid4()),
            )

            send_message_payload = MessageSendParams(message=message)

            request = SendMessageRequest(
                id=str(uuid4()), 
                params=send_message_payload,
            )

            print(f"\n📋 Checking flight requests for: {query}")
            client = agent_info["client"]
            return await client.send_message(request)

        except Exception as e:
            return f"❌ Error calling employee flight request agent: {str(e)}"
    ...

AirportKnowledgeTool

A LangChain tool that retrieves airport information from the knowledge base agent when users ask about airport names or airports in specific cities. Our A2A Agent Airport Knowledge Base Agent uses capabilities=AgentCapabilities(streaming=True), so this tool must receive a stream of messages.

class AirportKnowledgeTool(BaseTool):
    """
    ...
    """

    ...

    def __init__(self, agent_registry: A2AAgentRegistry):
        super().__init__(agent_registry=agent_registry)

    async def _arun(self, query: str) -> str:
        """Async implementation to call the airport knowledge base agent."""
        agent_info = self.agent_registry.get_agent("airport_knowledge_base")

        if not agent_info or not agent_info["client"]:
            return "❌ Airport knowledge base agent is not available. Please check if the service is running."

        try:
            part = TextPart(text=query)
            message = Message(
                role=Role.user,
                parts=[part],
                messageId=str(uuid4()),
            )

            streaming_request = SendStreamingMessageRequest(
                id=str(uuid4()), 
                params=MessageSendParams(message=message)
            )

            client = agent_info["client"]
            stream_response = client.send_message_streaming(streaming_request)

            full_response = ""
            print(f"\n📚 Looking up airport information for: {query}")

            async for chunk in stream_response:
                json_chunk = chunk.model_dump(mode='json', exclude_none=True)
                if json_chunk['result']['status']['state'] == TaskState.completed:
                    full_response = f"\n✅ Knowledge base lookup completed\n{json_chunk['result']['status']['message']['parts'][0]['text']}\n"
                    break
                else:
                    print(f"📨 {json_chunk['result']['status']['message']['parts'][0]['text']}\n")

            return full_response if full_response else "✅ Knowledge base lookup completed - check the streaming output above."

        except Exception as e:
            return f"❌ Error calling airport knowledge base agent: {str(e)}"
    ...

FlightSearchTool

A LangChain tool that searches for scheduled flights using Aviation Stack API through the flight search agent and handles results via push notifications since our A2A Agent Flight Search Agent uses capabilities=AgentCapabilities(streaming=True, pushNotifications=True).

It's important to note how the message is created when using a2a.types.SendStreamingMessageRequest since in this case we need to tell our A2A Agent which endpoint we make available to receive the push notifications.

class FlightSearchTool(BaseTool):
    """
    ...
    """

    ...
    flight_search_callback_url: str = f"http://localhost:{HTTP_SERVER_PORT}{FLIGHTS_ENDPOINT_PATH}" # TODO: DO NOT hardcode the callback URL

    def __init__(self, agent_registry: A2AAgentRegistry):
        super().__init__(agent_registry=agent_registry)

    async def _arun(self, query: str) -> str:
        """Async implementation to call the flight search agent."""
        agent_info = self.agent_registry.get_agent("flight_search")

        if not agent_info or not agent_info["client"]:
            return "❌ Flight search agent is not available. Please check if the service is running."

        async def async_search():
            """Execute flight search asynchronously."""
            try:
                client: A2AClient = agent_info["client"]

                part = TextPart(text=query)
                message = Message(
                    role=Role.user,
                    parts=[part],
                    messageId=str(uuid4()),
                    contextId=str(uuid4()),
                    taskId=str(uuid4())
                )

                request = SendStreamingMessageRequest(
                    id=str(uuid4()),
                    params=MessageSendParams(
                        message=message,
                        configuration=MessageSendConfiguration(
                            acceptedOutputModes=["text"],
                            pushNotificationConfig=PushNotificationConfig(
                                url=self.flight_search_callback_url
                            )
                        )
                    )
                )

                response = client.send_message_streaming(request=request)

                async for chunk in response:
                    pass # we don't care about the stream since messages will be received through the exposed endpoint

            except Exception as e:
                print(f"❌ Error in background flight search: {str(e)}")

        asyncio.create_task(async_search()) # here we decouple this tool's execution from the LangChain flow

        print(f"🛫 Flight search initiated in background for: {query}")

        return "✅ Flight search initiated - results will be sent via push notification once completed"

ReactChatAgent

A LangGraph 'ReAct' Agent that orchestrates interactions between the user and our A2A agents through specialized tools. The most notable features are:

Creates a FastAPI server that exposes an endpoint to receive push notifications
Uses langgraph.checkpoint.memory.MemorySaver so our agent has short-term memory
Defines the setup_http_endpoints(...) method to listen and process push notifications sent by the A2A Agent
Uses the langgraph.prebuilt.create_react_agent function -ReAct Agents are deprecated so this function creates a Graph that calls tools in a loop until a stopping condition is met- to create a Compiled Graph from LangChain that acts as the "brain" of our chat
Defines the chat(self, user_input: str, thread_id: str = "default") -> str: method that makes the call to the LLM

class ReactChatAgent:
    """LangGraph ReAct agent that can interact with A2A agents through tools and receive external messages via HTTP."""

    def __init__(self):
        self.agent_registry = A2AAgentRegistry()

        api_key = os.getenv("ANTHROPIC_API_KEY")
        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY environment variable must be set")

        self.model = ChatAnthropic(
            model="claude-3-5-sonnet-20241022",
            temperature=0,
            api_key=api_key
        )

        self.memory = MemorySaver()

        self.agent_graph = None

        self.external_message_queue = Queue()

        self.app = FastAPI(title="ReAct Chat Agent API", version="1.0.0")
        self.setup_http_endpoints()

    def setup_http_endpoints(self):
        """Setup HTTP endpoints for receiving external messages."""

        @self.app.post(FLIGHTS_ENDPOINT_PATH)
        async def receive_flight_findings(flight_finding: Task):
            """Receive flight findings and add them to the message queue."""
            try:
                # full implementation in the GitHub repository

            except Exception as e:
                raise HTTPException(status_code=500, detail=f"Error processing flight findings: {str(e)}")
        ...

    async def initialize(self, httpx_client: httpx.AsyncClient):
        """Initialize the agent and its tools."""
        print("🤖 Initializing LangGraph ReAct Chat Agent with Anthropic Claude...")

        await self.agent_registry.initialize_agents(httpx_client)

        tools = [
            AirportKnowledgeTool(self.agent_registry),
            EmployeeFlightRequestTool(self.agent_registry),
            FlightSearchTool(self.agent_registry)
        ]

        system_prompt = f"""You are a helpful assistant that manages employee flight requests in a corporate environment and can search for scheduled flights.
        ...
        """

        self.agent_graph = create_react_agent(
            model=self.model,
            tools=tools,
            checkpointer=self.memory,
            prompt=system_prompt
        )

        available_agents = self.agent_registry.list_available_agents()
        print(f"✅ LangGraph ReAct Agent initialized with {len(available_agents)} available A2A agents: {available_agents}")
        print("🧠 Using Anthropic Claude as the reasoning engine")
        print(f"📡 HTTP endpoint available at: http://localhost:{HTTP_SERVER_PORT}{FLIGHTS_ENDPOINT_PATH}")

    async def process_external_message(self, external_msg: InternalMessage, thread_id: str | None = None) -> str:
        """Process an external message and add it to agent memory."""
        # full implementation in the GitHub repository

    async def chat(self, user_input: str, thread_id: str = "default") -> str:
        ...
        try:
            config = {"configurable": {"thread_id": thread_id}}

            messages = [("user", user_input)]

            last_message = ""
            async for chunk in self.agent_graph.astream(
                {"messages": messages},
                config=config
            ):
                print(chunk)
                last_message = chunk

            return last_message if last_message else "Response completed - check the output above."

        except Exception as e:
            return f"❌ Error processing request: {str(e)}"

`main` function

When the function starts, it creates and initializes our ReactChatAgent() which in turn calls the Agent Registry and creates our A2A Clients to communicate with our A2A Servers (A2A Agents).

It also runs the FastAPI server in a separate execution thread, which allows listening to push notifications as they arrive.

The code inside the while loop is a bit complex since it processes the message queue from push notifications, interleaved with responses to user prompts, as they arrive.

async def main():
    """Main CLI loop for the chat agent with HTTP endpoint integration."""
    print("🚀 Starting LangGraph ReAct Chat Agent with A2A Integration")
    print("🧠 Powered by Anthropic Claude")
    print("📡 HTTP API Server Enabled")
    print("=" * 60)

    logging.basicConfig(level=logging.WARNING)

    async with httpx.AsyncClient() as httpx_client:
        agent = ReactChatAgent()
        await agent.initialize(httpx_client)

        print(f"🌐 Starting HTTP server on port {HTTP_SERVER_PORT}...")
        http_thread = threading.Thread(target=run_http_server, args=(agent,))
        http_thread.daemon = True
        http_thread.start()

        print("\n💬 Chat Agent Ready! (Type 'quit' to exit)")
        ...

        thread_id = "console_session_" + str(uuid4())[:8]
        prompt_shown = False
        should_exit = False

        while not should_exit:
            try:
                while not agent.external_message_queue.empty():
                    external_msg = agent.external_message_queue.get()
                    await agent.process_external_message(external_msg)
                    prompt_shown = False

                if not prompt_shown:
                    print("\n👤 You: ", end="", flush=True)
                    prompt_shown = True

                if sys.stdin in select.select([sys.stdin], [], [], 0.1)[0]:
                    user_input = input().strip()
                    prompt_shown = False

                    if user_input.lower() in ['quit', 'exit', 'bye']:
                        try:
                            loop = asyncio.get_event_loop()
                            if loop.is_closed():
                                print("⚠️  Event loop is closed - cleaning up gracefully...")
                            else:
                                print("✅ Event loop is healthy")
                        except RuntimeError:
                            print("ℹ️  No event loop available in current context")

                        print("👋 Goodbye!")
                        should_exit = True

                    if user_input:
                        print("🤖 LLM: ...", end="\n", flush=True)
                        response = await agent.chat(user_input, thread_id)

                        if isinstance(response, dict):
                            for chunk_type, chunk_data in response.items():
                                if isinstance(chunk_data, dict) and 'messages' in chunk_data:
                                    for message in chunk_data['messages']:
                                        if hasattr(message, 'content'):
                                            print(f"\n**** 🤖 Agent pretty print *****\n{message.content}\n" + "*" * 31)
                                            break

            except KeyboardInterrupt:
                print("\n👋 Goodbye!")
                should_exit = True
            except Exception as e:
                ...

        sys.exit(0)

Finally, to run our chatbot we execute:

python3 chat_agent.py

🚀 Running our chatbot

It's time to run our chatbot. But first, we need to start each of our A2A Agents.

# Start the airport knowledge base agent
cd airport_knowledge_base_agent/
uv run . --host 0.0.0.0 &

---
✅ Initialized flight request database with 10 records
INFO:     Started server process [53208]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9992 (Press CTRL+C to quit)
INFO:     127.0.0.1:60396 - "GET /.well-known/agent.json HTTP/1.1" 200 OK  # This shows that our chatbot called the agent card

# Start the employee flight requests agent 
cd ../employee_flight_requests_agent/
uv run . --host 0.0.0.0 &

---
✅ Loaded airport knowledge base: 8467 airports from 235 countries
INFO:     Started server process [53217]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9991 (Press CTRL+C to quit)
INFO:     127.0.0.1:60394 - "GET /.well-known/agent.json HTTP/1.1" 200 OK  # This shows that our chatbot called the agent card

# Start the flight search agent
cd ../flight_search_agent/
uv run . --host 0.0.0.0 &

---
✅ Initialized Flight Search ReAct Agent with Aviation Stack API
INFO:     Started server process [53228]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9993 (Press CTRL+C to quit)
INFO:     127.0.0.1:60398 - "GET /.well-known/agent.json HTTP/1.1" 200 OK  # This shows that our chatbot called the agent card

Now, we can run our chatbot:

python chat_agent.py

---
🚀 Starting LangGraph ReAct Chat Agent with A2A Integration
🧠 Powered by Anthropic Claude
📡 HTTP API Server Enabled
============================================================
🤖 Initializing LangGraph ReAct Chat Agent with Anthropic Claude...
✅ Initialized Airport Knowledge Base Agent at http://localhost:9991
   📝 Description: Knowledge base for airport information and city-airport mappings
✅ Initialized Employee Flight Request Agent at http://localhost:9992
   📝 Description: Check employee flight requests and booking status
✅ Initialized Flight Search Agent at http://localhost:9993
   📝 Description: Scheduled flight search using Aviation Stack
✅ LangGraph ReAct Agent initialized with 3 available A2A agents: ['airport_knowledge_base', 'employee_flight_requests', 'flight_search']
🧠 Using Anthropic Claude as the reasoning engine
📡 HTTP endpoint available at: http://localhost:9990/api/flights-findings
🌐 Starting HTTP server on port 9990...

💬 Chat Agent Ready! (Type 'quit' to exit)
You can ask about:
  - Airport knowledge base: 'find airports in Madrid'
  - Airport information: 'what airports are in Tokyo'
  - Employee flight requests: 'check pending flight requests'
  - Employee status: 'check John Smith flight request'
  - Flight search: 'search flights from AEP on 2025-11-20'
  - Real-time flights: 'find flights from JFK to LAX on 2025-12-01'

📡 HTTP Endpoints available:
  - POST http://localhost:9990/api/flights-findings
  - GET  http://localhost:9990/api/status
------------------------------------------------------------

👤 You:

First, we ask our agent "What are the pending flight requests?". Our agent will use the employee_flight_requests tool to perform the search and display the results.

👤 You: What are the pending flight requests?
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content=[{'text': "I'll help you check the pending flight requests using the employee_flight_requests tool."...
...
**** 🤖 Agent pretty print *****
Based on the results, there are 5 pending flight requests awaiting booking:

1. Robert Johnson: New York to Los Angeles (Dec 1, 2025)
2. Anna Thompson: London to Dublin (Oct 5, 2025)
3. Sophie Martin: Paris to Rome (July 12, 2025)
4. Elena Popov: Berlin to Amsterdam (Nov 18, 2025)
5. Lisa Anderson: Sydney to Melbourne (Aug 25, 2025)

All these requests are currently in "Awaiting booking" status. Would you like more specific information about any of these requests or would you like to check the status of booked flights as well?
*******************************

Since our agent has memory, we can ask: "Which airports can Anna depart from?". In this case, we see that it uses the airport_knowledge_base tool.

👤 You: Which airports can Anna depart from?
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content=[{'text': "Since Anna's request is for a flight from London, I'll use the airport_knowledge_base tool to find the available airports in London."...
...
**** 🤖 Agent pretty print *****
For Anna's flight from London to Dublin, she can depart from any of these major London airports in the UK:

1. London Heathrow Airport (LHR) - The largest and most well-connected airport
2. London Gatwick Airport (LGW) - Second largest airport
3. London Stansted Airport (STN) - Major hub for low-cost carriers
4. London City Airport (LCY) - Convenient for business travelers, located in the city
5. London Biggin Hill Airport (BQH) - Smaller airport primarily for private aviation
6. RAF Northolt (NHT) - Military airport with limited civilian use

The most commonly used airports for commercial flights to Dublin would be Heathrow (LHR), Gatwick (LGW), or Stansted (STN). Would you like me to search for specific flights from any of these airports to Dublin for Anna's travel date (October 5, 2025)?
*******************************

As the final step, we ask our agent: "Yes, what flights are available from Heathrow?". As we can see, this time it uses the flight_search tool.

👤 You: "Yes, what flights are available from Heathrow?"
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content=[{'text': "I'll search for flights from London Heathrow (LHR) to Dublin (DUB) for Anna's travel date of October 5, 2025, using the flight_search tool."...
...
**** 🤖 Agent pretty print *****
I've initiated the flight search from London Heathrow (LHR) to Dublin (DUB) for October 5, 2025. The search has been started and the results will be sent via push notifications. Once we receive the results, you'll be able to see all available flights for that route and date, including:
- Flight numbers
- Departure and arrival times
- Airlines
- Aircraft types
- Terminal information

Please wait for the push notification with the detailed flight results, and then we can help select the most suitable flight for Anna's travel.
*******************************

Since the A2A Agent for flight search sends push notifications, our agent displays the messages as they arrive.

👤 You: 
🔍 External message

------------------------------------------------------------
🤖 Processing external message...
{'agent': {'messages': [AIMessage(content="I'll help summarize the flight findings received. ...
...
**** 🤖 Agent Response to External Message pretty print *****
I'll help summarize the flight findings received. These are flights departing from London Heathrow (LHR) Terminal 2 at 06:00. Here's a breakdown of the available routes:

1. London (LHR) to Zurich (ZRH):
- Swiss/Air Canada codeshare flight LX345/AC6756
- Departure: 06:00, Terminal 2, Gate A18
- Arrival: 08:40, Terminal 2
- Aircraft: Airbus A220-100

2. London (LHR) to Vienna (VIE):
- Austrian Airlines flight OS458 (codeshared by Air Canada, ANA, and Asiana)
- Departure: 06:00, Terminal 2
- Arrival: 09:10, Terminal 3
- Aircraft: Airbus A320-271N

3. London (LHR) to Lisbon (LIS):
- TAP Air Portugal flight TP1363 (codeshared by Air Canada, Azul, Air India, and Azores Airlines)
- Departure: 06:00, Terminal 2, Gate A17
- Arrival: 08:45, Terminal 1
- Aircraft: Airbus A320-251N

All flights depart at the same time (06:00) from different gates at Terminal 2. These are primarily operated by European carriers with various codeshare agreements with other airlines.
*************************************************************

We've achieved it! We've successfully searched for available flights for one of the requested trips. To exit our chatbot, we type quit.

👤 You: quit
✅ Event loop is healthy
👋 Goodbye!
🤖 LLM: ...
{'agent': {'messages': [AIMessage(content='Goodbye!...
...
**** 🤖 Agent pretty print *****
Goodbye! Let me know if you need any further assistance with flight requests, bookings, or airport information in the future.
*******************************

Note: If you're interested, you can see the entire output of the chatbot as well as the output of each of our agents in this document 👉 chatbot_responses.md.

✅ Key Features

Multi-Agent Communication: Seamless coordination between specialized A2A agents using different communication patterns (standard HTTP, streaming, and push notifications)
Protocol Standardization: Built on Google's A2A protocol ensuring interoperability and scalability across different agentic systems
Real-time Flight Data: Integration with Aviation Stack API for live flight information and airport recommendations
Smart Agent Orchestration: LangGraph ReAct Agent that intelligently routes user requests to the appropriate A2A agents
Flexible Communication Methods: Demonstrates all three A2A communication patterns in a single system
Corporate Flight Management: Complete workflow for managing employee flight requests from pending to booked status
Interactive Chat Interface: Command-line interface powered by Anthropic Claude for natural language interactions
Push Notification Support: Asynchronous task handling for long-running operations without blocking the user experience

💬 Final thoughts

This post demonstrates how the A2A protocol can be used to build sophisticated multi-agent systems that coordinate and collaborate effectively. By standardizing agent-to-agent communication, A2A opens up new possibilities for creating complex AI workflows where specialized agents can work together seamlessly.

The flight management system we built showcases the power of combining different communication patterns within a single application. From immediate responses for flight request status to streaming airport information and asynchronous flight searches, each agent operates optimally according to its specific requirements.

As AI systems continue to evolve toward more distributed and specialized architectures, protocols like A2A will become increasingly important for enabling the next generation of collaborative AI applications.

📚 Resources

Full code of this post 👉 ezequiroga/a2a-bases

Google A2A Official documentation 👉 A2A Protocol

Google A2A Protocol JSON Specification 👉 A2A Protocol Specification

A2A Protocol Documentation 👉 A2A Protocol Documentation

Agent2Agent (A2A) Python SDK Tutorial 👉 A2A Protocol Documentation

Google GitHub SDK examples repository 👉 A2A Python SDK

Google Python SDK Reference 👉 Python SDK Reference

Agent2Agent (A2A) Samples 👉 A2A Samples

Securing Microservices with JWT Validation at the Nginx Proxy Layer

martinfernandezcx — Wed, 28 May 2025 20:16:43 +0000

In a microservices architecture, separating concerns is critical for maintainability, scalability, and security. One key decision when building APIs is how and where to handle authentication. A common pattern is to delegate authentication to a dedicated authentication microservice, which issues tokens (e.g., JWTs), and use those tokens to access protected resources on independent backend APIs. When working on an infrastructure change, we faced the challenge of either integrating authentication in the Node.js backend (without the proper libraries) or maintaining a single backend solely for authorization.

The options we considered were:

Having the Go backend validate the token and proxy to the Node.js backend over authenticated routes. (We tried this, but the Go proxy became messy and difficult to maintain.)
Performing authentication in Node.js (infrastructure restrictions led us to abandon this approach.)
Implementing a different authentication method using the existing infrastructure

And this third one is what we came up with after investigating.

This post demonstrates how to validate JWT tokens directly in Nginx before routing requests to your protected Node.js API, centralizing authorization enforcement at the gateway layer.
This keeps the authentication within the infrastructure boundaries and allows us to simplify both the Go backend and the Node.js backend by relying on the NGINX layer.

Why JWT at the Proxy?

Decouples concerns: Authentication logic doesn't pollute your API code.
Consistent enforcement: All routes must pass the same token checks before hitting backend services.
Performance: Nginx (especially via OpenResty) is efficient and fast at handling token validation.

Options for JWT Validation

Validate JWT in each backend service
- Pros: Full control per service.
- Cons: Repeated logic, potential for inconsistency.
Use Nginx with a third-party JWT module
- Commercial option with NGINX Plus.
Use OpenResty (Nginx + Lua) with lua-resty-jwt
- Open-source, flexible, and efficient.

OpenResty + Lua

We use OpenResty and the lua-resty-jwt library to inspect JWTs in the Nginx layer. If valid, we forward requests to the backend. Otherwise, Nginx returns a 401 response.

Architecture

auth-api: issues JWTs via login endpoint.
node-api: protected and public routes.
nginx: gateway with Lua-based JWT validation.

Security Considerations

Some of these concerns were left out of this POC but we would like to mention for a proper production implementation. Please read through and evaluate wether it fits to your scenario or not.

Protection Against Common Attacks

Replay Attacks
- Implement token expiration (exp claim)
- Use short-lived tokens (15-60 minutes)
- Consider implementing a token blacklist for revoked tokens
- Use nonce values in token claims
- Implement request timestamp validation
Token Theft Prevention
- Always use HTTPS for token transmission
- Implement secure cookie attributes (HttpOnly, Secure, SameSite)
- Use token binding to prevent token reuse
- Implement rate limiting on authentication endpoints
- Monitor for suspicious patterns (multiple failed validations)

Token Expiration Best Practices

Short-lived Access Tokens
- Set expiration time between 15-60 minutes
- Use refresh tokens for longer sessions
- Implement sliding expiration for active users
Refresh Token Strategy
- Longer expiration (days/weeks)
- Store refresh tokens securely
- Implement refresh token rotation
- Maintain a refresh token family tree
Expiration Implementation

   -- Example of expiration check in Lua
   local jwt = require "resty.jwt"
   local validators = require "resty.jwt-validators"

   validators.set_system_leeway(0) -- Strict time validation
   validators.register_validator("exp", validators.opt_is_not_expired())

Grace Period Considerations
- Implement a small grace period (30 seconds) for clock skew
- Handle token expiration gracefully
- Provide clear error messages for expired tokens
- Implement automatic token refresh when possible

Project Layout

You can find the full source here:

GitHub Repo: martinfernandezcx/NGINXAUTH

How It Works

Client logs in via /api/auth/login, receives JWT.
Client sends Authorization: Bearer <token> on protected requests.
Nginx runs a Lua script to:
- Check token structure.
- Validate signature and expiration.
- Inject user ID into a request header.
Validated requests reach the Node.js service with identity attached.

Testing with Postman

The project includes a comprehensive Postman test suite to verify the JWT authentication flow and API endpoints. The test suite covers authentication, public routes, and protected routes with various scenarios.

Test Suite Structure

The Postman collection (postman/jwt-nginx-auth-tests.json) includes:

Authentication Tests
- Login endpoint validation
- Token format verification
- Automatic token storage for subsequent requests
Public Endpoint Tests
- Access to public routes
- Response format validation
Protected Endpoint Tests
- Access without token (401)
- Access with invalid token (401)
- Access with valid token (200)
- Response payload validation

Running the Tests

Prerequisites
- Install Postman
- Start the application:
```
 docker-compose up --build
```
Import the Collection
- Open Postman
- Click "Import" button
- Select the postman/jwt-nginx-auth-tests.json file
- select the postman\environment.json file
- The collection will be imported with all test cases
Run the Tests
- Select the "JWT Nginx Auth Tests" collection
- Click the "Run" button
- Postman will execute all tests in sequence
- View test results in the Postman console
Test Flow
- Tests run in a specific order to ensure proper token handling
- Login test stores the token for subsequent requests
- Protected route tests verify token validation
- Each test includes assertions for status codes and response formats

Test Cases break down

Login Test

   pm.test("Status code is 200", function () {
       pm.response.to.have.status(200);
   });
   pm.test("Response has token", function () {
       var jsonData = pm.response.json();
       pm.expect(jsonData).to.have.property('token');
   });

Protected Route Test

   pm.test("Status code is 200", function () {
       pm.response.to.have.status(200);
   });
   pm.test("Response contains protected data", function () {
       var jsonData = pm.response.json();
       pm.expect(jsonData).to.have.property('message');
   });

Environment Variables

The test suite uses Postman environment variables:

auth_token: Automatically set after successful login
Used in subsequent requests to protected routes

Continuous Integration

The Postman collection can be integrated into CI/CD pipelines using:

Newman CLI tool
Postman's CI/CD integrations
Custom test runners

Example Newman command:

newman run postman/jwt-nginx-auth-tests.json -e postman/environment.json

Running the tests

To run the tests you can use npm run test:postman:cli, or import both files on postman and run it there as mentioned above.

Conclusion

Centralizing JWT validation in the proxy simplifies backend services, enforces uniform security, and keeps authentication logic out of each microservice. This pattern is ideal for architectures using distinct auth and business logic APIs.

In contrast, validating tokens in the Node.js API itself might allow greater control over roles or context-based access logic but at the cost of duplication and potential inconsistency.

OpenResty strikes a solid balance between performance, flexibility, and maintainability in JWT-based authentication.

Apendix-A: Problems Found and Solutions

During the implementation of this JWT authentication system, we encountered several issues that required specific solutions:

OpenResty Dependencies
- Problem: Missing Perl and curl in the OpenResty Alpine image
- Solution: Added required packages in Dockerfile:
```
 RUN apk add --no-cache perl curl
```
Nginx User Configuration
- Problem: Missing nginx user in the container
- Solution: Created nginx user and group:
```
 RUN addgroup -S nginx && adduser -S -G nginx nginx
```
MIME Types Configuration
- Problem: Missing mime.types file in OpenResty Alpine image
- Solution: Created custom mime.types file and copied it to the correct location:
```
 COPY mime.types /etc/nginx/mime.types
```

Lua Package Path

Problem: Lua package path directive in wrong context
Solution: Moved lua_package_path to http context in nginx.conf:

 http {
     lua_package_path "/usr/local/openresty/lualib/?.lua;;";
     lua_package_cpath "/usr/local/openresty/lualib/?.so;;";
 }

Log Directory Permissions
- Problem: Nginx couldn't write to log directory
- Solution: Created log directory and set proper permissions:
```
 RUN mkdir -p /var/log/nginx && \
     chown -R nginx:nginx /var/log/nginx
```
Unit tests and routes issues
- Problem: Postman tests were failing with 404 on /protected
- Solution: Changed auth-api/index.js /login route and node-api/index.js /protected to /user

These solutions ensure proper functionality of the JWT authentication system while maintaining security and following best practices for containerized applications.

MCP - Understanding the Basics and Building a Research Paper Management Chatbot

Eze Quiroga — Fri, 23 May 2025 18:01:52 +0000

🌟 Introduction

Since early 2024, the use of AI Agents that can make autonomous decisions and leverage tools to respond to user prompts has grown rapidly. As these systems evolve, there's been a growing need for a standard way to enable communication between agents and give them richer context to handle more complex tasks through natural language.

That's where the Model Context Protocol (MCP) comes in. Announced by Anthropic on November 25, 2024, MCP is an open-source protocol that standardizes how large language models (LLMs) interact with external tools and data sources.

In this post, we'll walk through building a command-line chatbot that manages academic papers using MCP. We'll learn how to:

Create an MCP server that exposes tools, resources, and prompt templates
Connect to third-party MCP servers
Build a custom MCP client to interact with those servers

By the end, our chatbot will be able to:

Search for academic papers on arXiv
Organize articles by research topic
Access metadata about saved papers
Pull information from any URL using a third-party MCP server
Generate documents using content retrieved from external sources

Here's how we'll break it down:

Local environment setup
What is MCP?
Building the MCP server
Using third-party MCP servers
Creating the host and client
Key features
Final thoughts
Resources

Let's get started! 🚀

Important note: Only the most relevant function signatures and docstrings are shown in this post. You can find the full implementation in this GitHub repository.

🛠️ Local environment

🐍 Python 3.12.8

⬢ Node v22.13.0

Run the following command to install the required Python packages:

pip3 install dotenv             # Loads ANTHROPIC_API_KEY from a .env file
pip3 install anthropic          # Client for Anthropic's Sonnet model
pip3 install mcp                # Core package for MCP servers and clients
pip3 install arxiv              # Used for querying arXiv articles
pip3 install mcp_server_fetch   # Third-party party MPC server

Pro tip: Use Python virtual environments and Node Version Manager (NVM) for cleaner dependency management.

🤔 What Is MCP?

Let's briefly cover what MCP is and how it works. For more details, check out the Resources section at the end.

MCP (Model Context Protocol) is an open protocol designed to streamline the way LLMs connect to tools and data sources. It follows a client-server architecture where each MCP client maintains a direct, one-to-one connection with each server it talks to.

Here's the breakdown:

Host: The application that embeds the language model (e.g., Claude Desktop or a code editor)
Client: A component inside the host that manages the connection to one MCP server by invoking tools, querying for resources and interpolating prompts.
Server: Provides context to the LLM via three primitives:

Tools – functions that can be invoked by the client. These tools allow for retrieving, searching, sending messages, updating database records are usually meant for data that might require something like a Post request or some kind of modification.
Resources – similar to a Get request. They are read-only data or context that's exposed by the server.similar to a Get request. They are read-only data or context that's exposed by the server.
Prompt templates – predefined templates that live on the server to remove the burden of prompt engineering from users by providing optimized prompts for common tasks.

How They Communicate?

The communication between client and server follows a specific lifecycle. First, there's an initialization process where the client sends a request, the server responds, and sends a confirmation notification. After initialization, both parties can freely exchange messages and notifications.

To enable this communication, MCP provides different transport mechanisms that handle the actual data flow between client and server:

Standard I/O for local servers - The client launches the server as a subprocess and communicates through stdin/stdout
HTTP with Server-Sent Events for remote servers - Maintains stateful connections between requests
Streamable HTTP (recommended) - A newer transport that flexibly supports both stateful and stateless connections

As this article focuses on understanding the basics of MCP, we'll use the stdio transport mechanism in a local environment.

Let's code 🚀

🖥️ Building the MCP Server

We'll use the FastMCP framework to build our own MCP server in research_server.py. FastMCP offers handy decorators to expose:

Tools via @mcp.tool()
Resources via @mcp.resource()
Prompt templates via @mcp.prompt()

1. Define the Server

from mcp.server.fastmcp import FastMCP

# Create the server
mcp = FastMCP("research")

Tools

Once the server is defined, we can start defining primitives. Let's start by defining two tools using @mcp.tool():

1- search_papers: Searches arXiv for articles on a specific topic:

@mcp.tool()
def search_papers(topic: str, max_results: int = 5) -> List[str]:
    """
    Search for papers on arXiv based on a topic and store their information.

    Args:
        topic: The topic to search for
        max_results: Maximum number of results to retrieve (default: 5)

    Returns:
        List of paper IDs found in the search
    """
    # Implementation details...

2- extract_info: Retrieves metadata for a specific paper;

@mcp.tool()
def extract_info(paper_id: str) -> str:
    """
    Search for information about a specific paper across all topic directories.

    Args:
        paper_id: The ID of the paper to look for

    Returns:
        JSON string with paper information if found, error message if not found
    """
    # Implementation details...

Resources

Next, we need to define how users can access all available research topics and retrieve articles for a specific topic. For this, we expose two resources using @mcp.resource(...):

1- A list of available research topics:

@mcp.resource("papers://folders")
def get_available_folders() -> str:
    """
    List all available topic folders in the papers directory.

    This resource provides a simple list of all available topic folders.
    """
    # Implementation details...

2- Articles stored under a given topic:

@mcp.resource("papers://{topic}")
def get_topic_papers(topic: str) -> str:
    """
    Get detailed information about papers on a specific topic.

    Args:
        topic: The research topic to retrieve papers for
    """
    # Implementation details...

Prompt templates

To reduce the need for manual prompt engineering, we can define prompt templates. These are listed by the MCP client, interpolated with user input, and then sent to the LLM.

To expose prompt templates, we must use @mcp.prompt(). In our case, we will create just one prompt template, which will perform the search for articles in arXiv:

@mcp.prompt()
def generate_search_prompt(topic: str, num_papers: int = 5) -> str:
    """Generate a prompt for Claude to find and discuss academic papers on a specific topic."""
    # Implementation details...

Testing our server

There's a great way to test our server using the Model Context Protocol Inspector, a handy tool built to explore MCP servers. As it's written in Type Script, we need to use npx:

The fisrt step is to install the last npm version:

npm install -g npm@latest

Then, run in the command line:

npx @modelcontextprotocol/inspector python3 research_server.py

Once the server is up and running, the URI is displayed in the console. By clicking it, the browser will show you this:

Click the "Connect" button to start interacting with the server. You can then test the available tools, resources and prompts through the inspector interface.

🔌 Using Third-Party MCP Servers

Now that we've built our own server, we can also use third-party MCP servers.

In this case, we'll use the following two servers provided by Anthropic:

Filesystem MCP Server -> handles file operations
Fetch MCP Server -> fetches information from URIs

Both MCP servers have been developed by Anthropic and are used from their official GitHub.

For our host to create the MCP clients and connect to these servers, we need to create the server_config.json file.

In this file, we'll define the three servers we want to connect to. For each one, we must specify how they should be run. This information can be found in each server's documentation. In our case, our .json would look like this:

{
    "mcpServers": {

        //Third-Party MCP server developed in TypeScript
        "filesystem": {
            "command": "npx",
            "args": [
                "-y",
                "@modelcontextprotocol/server-filesystem",
                "."
            ]
        },

        //Our MCP server developed in Python
        "research": {
            "command": "python3",
            "args": ["research_server.py"]
        },

        //Third-Party MCP server developed in Python
        "fetch": {
            "command": "python3",
            "args": ["-m", "mcp_server_fetch"]
        }
    }
}

👨‍💻 Creating the Host and Clients

Our chatbot will serve as the host and create MCP clients to connect with each server. We'll define a class called MCP_ChatBot, which will:

Load and connect to the configured MCP servers
Establish individual server connections
Manage available tools and sessions
Handle user queries
Gracefully shut down connections

The connect_to_servers() method will load the server configurations and establish connections to all servers, while connect_to_server() handles connecting to individual servers and registering their available tools.

Additionally, it will contain several methods:

chat_loop(...) manages the command-line UI and allows users to enter their prompts
process_query(...) processes prompts using Anthropic's Sonnet as the LLM
cleanup(...) closes all client connections to servers when the user ends the chat

Our host mcp_chatbot.py therefore has the following structure:

class MCP_ChatBot:
    def __init__(self):
        self.sessions: List[ClientSession] = [] # Managing the MCP's servers sessions
        self.exit_stack = AsyncExitStack()
        self.anthropic = Anthropic()
        self.available_tools: List[ToolDefinition] = [] # All the availables tools
        self.tool_to_session: Dict[str, ClientSession] = {} # Relations between tools and servers
    ...

    async def connect_to_server(self, server_name: str, server_config: dict):
        try:
            server_params = StdioServerParameters(**server_config)
            stdio_transport = await self.exit_stack.enter_async_context(
                stdio_client(server_params)
            )
            read, write = stdio_transport
            session = await self.exit_stack.enter_async_context(
                ClientSession(read, write)
            )
            await session.initialize()
            self.sessions.append(session)

            # List available tools for this session
            response = await session.list_tools()
            tools = response.tools
            print(f"\nConnected to {server_name} with tools:", [t.name for t in tools])

            for tool in tools:
                self.tool_to_session[tool.name] = session
                self.available_tools.append({
                    "name": tool.name,
                    "description": tool.description,
                    "input_schema": tool.inputSchema
                })
        except Exception as e:
            print(f"Failed to connect to {server_name}: {e}")

    async async def connect_to_servers(self):
        """Connect to all configured MCP servers."""
        try:
            # Here we are loading the servers out host must connect with
            with open("server_config.json", "r") as file:
                data = json.load(file)

            servers = data.get("mcpServers", {})

            for server_name, server_config in servers.items():
                # This will create a client/session for each server
                await self.connect_to_server(server_name, server_config)
        except Exception as e:
            print(f"Error loading server configuration: {e}")
            raise

    async def process_query(self, query):...

    async def chat_loop(self):...

    async def cleanup(self):...

    ...

async def main():
    chatbot = MCP_ChatBot()
    try:
        await chatbot.connect_to_servers()
        await chatbot.chat_loop()
    finally:
        await chatbot.cleanup()

if __name__ == "__main__":
    asyncio.run(main())

This way, our host (mcp_chatbot.py):

Creates as many clients as servers are defined in the servers_config.json file
Stores each of these connections
Lists the available tools in each server and stores them
Processes user queries
Manages the chat interface
Cleans up connections when the user ends execution

⚙️ Running the Chatbot

To launch the chatbot, simply run:

python3 mcp_chatbot.py

You should see the chatbot connect to all three servers.

Then, try a query like: search for two articles about deep learning and provide your summary of both

Behind the scenes, it will:

Call tool search_papers with args {'topic': 'deep learning', 'max_results': 2}
Call tool extract_info with args {'paper_id': '1805.08355v1'} and tool extract_info with args {'paper_id': '1806.01756v1'}

The .json created by running this prompt looks like:

{
  "1805.08355v1": {
    "title": "Opening the black box of deep learning",
    "authors": ["Dian Lei", "Xiaoxiao Chen", "Jianfei Zhao"],
    "summary": "The great success of deep learning ...",
    "pdf_url": "http://arxiv.org/pdf/1805.08355v1",
    "published": "2018-05-22"
  },
  "1806.01756v1": {
    "title": "Concept-Oriented Deep Learning",
    "authors": ["Daniel T Chang"],
    "summary": "Concepts are the foundation of human deep learning...",
    "pdf_url": "http://arxiv.org/pdf/1806.01756v1",
    "published": "2018-06-05"
  }
}

The second try fetches information from a particular URI: look into https://deeplearning.ai, extract one relevant concept and research articles about it

This time, the first step is:

Calling tool fetch with args {'url': 'https://deeplearning.ai'}

Then, as we haven't specifyed the number of articles to look for, it uses the default value of 5:

Calling tool search_papers with args {'topic': 'Retrieval-Augmented Generation RAG LLM', 'max_results': 5}

Finally, the five calls for extracting information are made:

Calling tool extract_info with args {'paper_id': '2409.01666v1'}
Calling tool extract_info with args {'paper_id': '2501.00353v1'}
Calling tool extract_info with args {'paper_id': '2407.21059v1'}
Calling tool extract_info with args {'paper_id': '2501.05249v1'}
Calling tool extract_info with args {'paper_id': '2504.08758v1'}

The final response of this try would look like this:

Based on my research of DeepLearning.ai...

# Retrieval-Augmented Generation (RAG) in Modern AI

## What is RAG?
Retrieval-Augmented Generation (RAG)...

## Why RAG Matters
According to recent research...

## Applications of RAG
RAG has proven particularly valuable in:...

## Future Directions
The modular approach to RAG systems suggests...

The .json created by running this prompt looks similar to the previous one but contains five documents:

{
  "2409.01666v1": {
    "title": "In Defense of RAG...",
    "authors": ["Tan Yu", "Anbang Xu", "Rama Akkiraju"],
    "summary": "...",
    "pdf_url": "http://arxiv.org/pdf/2409.01666v1",
    "published": "2024-09-03"
  },
  "2501.00353v1": {...},
  "2407.21059v1": {...},
  "2501.05249v1": {...},
  "2504.08758v1": {...}
}

For finishig the chatbot, just type quit.

Awesome! We managed to connect and use many MCP servers from our host, creating MCP clients and maintaining 1:1 sessions between clients and servers.

✅ Key Features

Organized by Topic: Easily browse articles by research theme
Persistent Storage: Articles metadata are saved locally as JSON
Interactive Chat UI: Simple and effective CLI-based interface
Smart Summaries: Summarizations powered by Claude
Tool & Resource Management: Clean separation between read-only data and actions

💬 Final thoughts

This post demonstrates how MCP can be used to build AI applications that interact with external data sources. By standardizing how AI applications connect with tools and data, MCP makes it easier to build and maintain complex AI systems.

📚 Resources

Full code of this post 👉 ezequiroga/mcp-bases

MCP servers 👉 Model Context Protocol servers

Official MCP documentation 👉 MCP Documentation

Anthropic article introducing MCP 👉 Introducing the Model Context Protocol

The origins of MCP, explained by Mike Krieger 👉 Anthropic CPO Mike Krieger: Building AI Products From the Bottom Up

MCP short course 👉 MCP: Build Rich-Context AI Apps with Anthropic