<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sunil Kumar Dash</title>
    <description>The latest articles on Forem by Sunil Kumar Dash (@sunilkumrdash).</description>
    <link>https://forem.com/sunilkumrdash</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F898740%2Fbe1827e3-0e8a-40dc-8b74-90c2913aa39e.jpg</url>
      <title>Forem: Sunil Kumar Dash</title>
      <link>https://forem.com/sunilkumrdash</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sunilkumrdash"/>
    <language>en</language>
    <item>
      <title>I rebuilt OpenClaw from scratch without the security flaws</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Mon, 16 Feb 2026 10:33:21 +0000</pubDate>
      <link>https://forem.com/composiodev/i-rebuilt-openclaw-from-scratch-without-the-security-flaws-2mle</link>
      <guid>https://forem.com/composiodev/i-rebuilt-openclaw-from-scratch-without-the-security-flaws-2mle</guid>
      <description>&lt;p&gt;OpenClaw launched with great fanfare, and I was curious whether you could truly "vibe code" the entire project on your own, especially since the original creator built it with Codex. We're in the era of "build it yourself instead of setting it up" and I wanted to take that philosophy a step further by recreating it from scratch.&lt;/p&gt;

&lt;p&gt;This is the story of how I rebuilt OpenClaw using modern coding agent SDKs, tackled integration challenges across multiple messaging platforms, and deployed it securely in production, all while avoiding the security pitfalls of the original.&lt;/p&gt;

&lt;p&gt;Check out the repository here: &lt;a href="https://github.com/ComposioHQ/secure-openclaw" rel="noopener noreferrer"&gt;Secure OpenClaw&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Research &amp;amp; Planning
&lt;/h2&gt;

&lt;p&gt;The first thing I did was use GPT Pro mode to research the entire codebase and explain all the features and tools used. The Pro model excels at these broad tasks that require processing large amounts of information in a single shot. It gave me a detailed product spec on how OpenClaw works and what it uses for each functionality.&lt;/p&gt;

&lt;p&gt;I decided to use coding agent SDKs because they represent the first real use cases people have had with LLMs beyond writing. Claude provides the Claude Agent SDK, and OpenCode provides a similar SDK. These SDKs natively provide access to tools like read, write, bash, edit, and support for skills and MCP (Model Context Protocol).&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;I wanted to set up two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terminal mode&lt;/strong&gt;: For direct interaction and development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway mode&lt;/strong&gt;: For 24/7 operation, listening to WhatsApp, Telegram, Signal, iMessage, and other messaging apps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gateway architecture is what makes OpenClaw powerful: it runs continuously in the background, monitoring multiple communication channels and responding autonomously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Messaging Platform Integrations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  WhatsApp
&lt;/h3&gt;

&lt;p&gt;WhatsApp integration uses a library called &lt;a href="https://github.com/WhiskeySockets/Baileys" rel="noopener noreferrer"&gt;Baileys&lt;/a&gt; to establish a WhatsApp Web connection. Here's how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Baileys connects to WhatsApp Web's WebSocket&lt;/li&gt;
&lt;li&gt;When a message arrives, WhatsApp's server pushes it via WebSocket&lt;/li&gt;
&lt;li&gt;Baileys emits a &lt;code&gt;messages.upsert&lt;/code&gt; event with type &lt;code&gt;'notify'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The agent can then process and respond to the message&lt;/li&gt;
&lt;/ul&gt;
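
&lt;p&gt;A minimal sketch of that event handling, assuming Baileys' usual event shape (a &lt;code&gt;type&lt;/code&gt; plus a &lt;code&gt;messages&lt;/code&gt; array); in the real gateway this function would be registered via &lt;code&gt;sock.ev.on('messages.upsert', handler)&lt;/code&gt;:&lt;/p&gt;

```javascript
// Hypothetical handler for Baileys' messages.upsert event.
// In a running gateway it would be wired up with:
//   sock.ev.on("messages.upsert", (evt) =&gt; agent.handle(extractIncoming(evt)))
function extractIncoming(event) {
  // Only 'notify' batches contain genuinely new messages;
  // other types (e.g. history syncs) can be ignored here.
  if (event.type !== "notify") return [];
  return event.messages.filter(function (m) {
    return !m.key.fromMe; // skip messages the agent itself sent
  });
}
```

&lt;p&gt;Filtering out &lt;code&gt;fromMe&lt;/code&gt; messages prevents the agent from replying to its own outgoing messages in a loop.&lt;/p&gt;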

&lt;p&gt;One challenge I encountered was creating the allowlist for WhatsApp numbers. WhatsApp doesn't use phone numbers directly in the WebSocket connection; it uses link IDs. Messages arrive with these IDs, and I needed bidirectional conversion between phone numbers and link IDs. Claude Code initially struggled with building the right mapping, but after some iteration, we got it working correctly.&lt;/p&gt;
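
&lt;p&gt;The mapping can be sketched as a simple bidirectional lookup; the ID formats below are illustrative, not WhatsApp's exact wire format:&lt;/p&gt;

```javascript
// Sketch of the phone-number / link-ID mapping used for the allowlist.
// IDs here are made up for illustration.
function createIdMap() {
  const phoneToLid = new Map();
  const lidToPhone = new Map();
  return {
    link(phone, lid) {
      phoneToLid.set(phone, lid);
      lidToPhone.set(lid, phone);
    },
    // Allowlist check: a message is accepted only if the sender's
    // link ID resolves back to an allowlisted phone number.
    isAllowed(lid, allowlist) {
      const phone = lidToPhone.get(lid);
      return phone !== undefined ? allowlist.includes(phone) : false;
    },
  };
}
```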

&lt;h3&gt;
  
  
  Telegram
&lt;/h3&gt;

&lt;p&gt;Telegram was much more straightforward thanks to its Bot API. The implementation uses &lt;strong&gt;long polling&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Periodically calls Telegram's &lt;code&gt;getUpdates&lt;/code&gt; API&lt;/li&gt;
&lt;li&gt;Waits up to 30 seconds for new messages&lt;/li&gt;
&lt;li&gt;When a message arrives, it immediately returns and calls &lt;code&gt;getUpdates&lt;/code&gt; again&lt;/li&gt;
&lt;li&gt;Emits a &lt;code&gt;message&lt;/code&gt; event for each new message&lt;/li&gt;
&lt;/ul&gt;
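
&lt;p&gt;The steps above can be sketched like this; &lt;code&gt;fetchJson&lt;/code&gt; is a hypothetical stand-in for an HTTP GET that returns parsed JSON, and the offset bookkeeping is what stops &lt;code&gt;getUpdates&lt;/code&gt; from returning the same batch twice:&lt;/p&gt;

```javascript
// Telegram requires offset = highest seen update_id + 1;
// otherwise getUpdates keeps re-delivering the same updates.
function nextOffset(updates, current) {
  let max = current - 1;
  for (const u of updates) {
    if (u.update_id > max) max = u.update_id;
  }
  return max + 1;
}

// One iteration of the long-polling loop (run repeatedly in practice).
async function pollOnce(fetchJson, token, offset) {
  const params = new URLSearchParams({ timeout: "30", offset: String(offset) });
  const url = "https://api.telegram.org/bot" + token + "/getUpdates?" + params;
  const body = await fetchJson(url); // the server holds this open up to 30s
  return { updates: body.result, offset: nextOffset(body.result, offset) };
}
```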

&lt;p&gt;The Bot API is well-documented and significantly easier to set up than WhatsApp.&lt;/p&gt;

&lt;h3&gt;
  
  
  iMessage
&lt;/h3&gt;

&lt;p&gt;iMessage integration was a fascinating unlock. It uses a library called &lt;a href="https://github.com/steipete/imessage-exporter" rel="noopener noreferrer"&gt;imsg&lt;/a&gt;, built by Peter Steinberger himself. The approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads the SQLite database where all iMessages are stored&lt;/li&gt;
&lt;li&gt;Monitors the database using &lt;strong&gt;FSEvents&lt;/strong&gt;, a kernel-level file system monitoring API on macOS&lt;/li&gt;
&lt;li&gt;Detects new messages in real-time as they're written to the database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives the agent access to iMessage without requiring any official API.&lt;/p&gt;
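
&lt;p&gt;The change-detection step can be sketched as follows: when FSEvents signals that the database changed, fetch the rows with a &lt;code&gt;ROWID&lt;/code&gt; above the last one seen. The row shape here just mimics the message table for illustration:&lt;/p&gt;

```javascript
// Given rows ordered by ROWID ascending (as a SQLite query would return
// them), pick out only the messages written since the last check.
function newMessages(rows, lastRowId) {
  const fresh = rows.filter(function (r) {
    return r.ROWID > lastRowId;
  });
  const next = fresh.length > 0 ? fresh[fresh.length - 1].ROWID : lastRowId;
  return { fresh, lastRowId: next };
}
```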




&lt;h2&gt;
  
  
  Tools &amp;amp; Integrations
&lt;/h2&gt;

&lt;p&gt;As they say, an agent is nothing without the tools it uses. I equipped the agent with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read, Write, Edit (file operations)&lt;/li&gt;
&lt;li&gt;Bash (command execution)&lt;/li&gt;
&lt;li&gt;Glob, Grep (file searching)&lt;/li&gt;
&lt;li&gt;TodoWrite (task management)&lt;/li&gt;
&lt;li&gt;Skill (access to predefined workflows)&lt;/li&gt;
&lt;li&gt;AskUserQuestion (user interaction)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Custom Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cron tools for scheduled tasks&lt;/li&gt;
&lt;li&gt;Gateway tools for WhatsApp and Telegram communication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Third-Party Integrations:&lt;/strong&gt; For secure integration with services like Slack, GitHub, Teams, and more, I used &lt;a href="https://composio.dev/" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;. Composio lets you securely connect and use these tools in a sandbox environment while handling all the credentials and authentication.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Docker Setup
&lt;/h3&gt;

&lt;p&gt;I created a Docker setup designed to run in the background on a DigitalOcean droplet. The goal was to make it quickly deployable without too many setup hassles. However, I ran into several issues:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: OOM (Out of Memory) Errors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running on a $6/month instance with 2GB RAM, the container kept crashing. The issue? It tried installing Claude Code and OpenAI's SDK at the same time, exhausting available memory. Once I identified this, I staggered the installations and the problem was resolved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Permission Mode Conflicts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The gateway uses &lt;code&gt;permissionMode: 'bypassPermissions'&lt;/code&gt; so the agent can run autonomously without human approval for each tool call. However, Claude Code refuses to enable this when running as root, a built-in security feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I had to restructure the entire Dockerfile to use a non-root user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create non-root user (Claude Code refuses bypassPermissions as root)
RUN useradd -m -s /bin/bash claw &amp;amp;&amp;amp; chown -R claw:claw /app
USER claw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cascaded into fixing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All file paths (&lt;code&gt;/root/&lt;/code&gt; → &lt;code&gt;/home/claw/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Docker Compose volume mounts&lt;/li&gt;
&lt;li&gt;CLI installation directories&lt;/li&gt;
&lt;li&gt;Workspace permissions&lt;/li&gt;
&lt;/ul&gt;
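
&lt;p&gt;As an illustration, the volume-mount side of that change looked roughly like the following; the service and volume names are examples, not the project's actual config:&lt;/p&gt;

```yaml
# Illustrative compose fragment after the non-root switch.
services:
  claw:
    build: .
    user: claw
    volumes:
      - ./workspace:/home/claw/workspace   # was ./workspace:/root/workspace
      - claw-config:/home/claw/.config     # CLI state survives restarts
volumes:
  claw-config:
```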

&lt;p&gt;The refactoring took several hours but resulted in a much more secure deployment that adheres to best practices.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Modern coding agents are incredibly capable&lt;/strong&gt; - With proper tooling and context, they can rebuild complex systems from scratch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security by design matters&lt;/strong&gt; - The forced non-root user setup, while initially frustrating, led to a more secure architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration complexity varies wildly&lt;/strong&gt; - Telegram took 30 minutes, WhatsApp took hours, iMessage required creative solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource constraints force better architecture&lt;/strong&gt; - The 2GB RAM limitation pushed me to optimize installation and runtime behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation is everything&lt;/strong&gt; - Services with good APIs (like Telegram) are significantly easier to integrate than those requiring reverse engineering&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The rebuilt OpenClaw is now running in production, handling messages across multiple platforms without the security issues that plagued the original. Future improvements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding more messaging platforms (Discord, Slack DMs)&lt;/li&gt;
&lt;li&gt;Implementing better error handling and retry logic&lt;/li&gt;
&lt;li&gt;Creating a web dashboard for monitoring and configuration&lt;/li&gt;
&lt;li&gt;Optimizing memory usage to run on even smaller instances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building this from scratch was an excellent exercise in understanding how modern AI agents work in production. The combination of LLM capabilities, proper tooling, and careful architecture makes it possible to create powerful autonomous systems that were previously extremely difficult to build.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>javascript</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to better your Claude CoWork experience with MCPs</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Mon, 19 Jan 2026 13:08:02 +0000</pubDate>
      <link>https://forem.com/composiodev/how-to-better-your-claude-cowork-experience-with-mcps-3hfp</link>
      <guid>https://forem.com/composiodev/how-to-better-your-claude-cowork-experience-with-mcps-3hfp</guid>
      <description>&lt;p&gt;Right when everyone was busy talking about how good Claude Code is, Anthropic launched Claude CoWork, basically Claude Code with a much less intimidating interface for automating fake email jobs. It can access your local file system, connectors, MCPs, and do almost everything that can be executed through the shell.&lt;/p&gt;

&lt;p&gt;Claude CoWork is currently available as a research preview in the Claude Desktop app as a separate tab for Max subscribers ($100 or $200 per month plans) on macOS, with Windows support planned for the future. &lt;/p&gt;

&lt;p&gt;The tool works by giving users access to a folder on their computer, where it can read, edit, or create files on their behalf. It works inside a local containerised environment by mounting your local folders, which means you can trust that it won’t access folders that you haven’t granted permission to.&lt;/p&gt;

&lt;p&gt;There’s a lot to say about CoWork, but that’s for a separate blog post. This one covers using connectors and MCPs to do more than organising files.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you don't want to waste time, just use &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;rube.app&lt;/a&gt; inside Claude CoWork. &lt;br&gt;
You'll get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant access to 900+ SaaS apps (Gmail, GitHub, Bitbucket, etc.)&lt;/li&gt;
&lt;li&gt;Zero OAuth and key-management hassle&lt;/li&gt;
&lt;li&gt;Dynamic tool loading, hence reduced token usage and better execution&lt;/li&gt;
&lt;li&gt;Reusable workflows you can access as tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc85w46fwu1cupjmgdjm2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc85w46fwu1cupjmgdjm2.png" alt="Claude Cowork with 900+ MCPs"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://rube.app" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Try Rube Now for FREE&lt;/a&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  Working with MCP Connectors
&lt;/h2&gt;

&lt;p&gt;Claude AI Connectors are direct integrations that let Claude access your actual work tools and data. Launched in July 2025, these connectors transform Claude from an AI that knows a lot about the world into an AI that knows a lot about &lt;em&gt;your&lt;/em&gt; world.&lt;/p&gt;

&lt;p&gt;Claude comes with pre-built integrations, including Gmail, Google Drive, GitHub, and Google Calendar. Apart from these, there are tons of local and remote MCP servers from HubSpot, Snowflake, Figma, and Context7. &lt;/p&gt;

&lt;h3&gt;
  
  
  Using default Integrations
&lt;/h3&gt;

&lt;p&gt;For default integrations, all you need to do is just connect your accounts and start working with them. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Settings &amp;gt; Connectors&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Find the integration you want to enable&lt;/li&gt;
&lt;li&gt;Click the "Connect" button&lt;/li&gt;
&lt;li&gt;Follow the authentication flow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pro, Max, Team, and Enterprise users can add these connectors to Claude or Claude Desktop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fdew1sngbcd7ygkrdhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9fdew1sngbcd7ygkrdhd.png" alt="Image 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Anthropic Marketplace Connectors
&lt;/h3&gt;

&lt;p&gt;Anthropic has an MCP marketplace where you can find Anthropic-reviewed tools, both local and remote-hosted connectors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Desktop/Local MCPs:&lt;/strong&gt; Click Desktop → Search your MCP → Click Install&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fois14qjojaysejlrlpmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fois14qjojaysejlrlpmm.png" alt="Image 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For remote MCPs:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Navigate to Browse Connectors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the Web tab, search your MCPs &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdzjshteys5mbmh5ri08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdzjshteys5mbmh5ri08.png" alt="Image 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Provide your server URL if needed, and you’re done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom MCP Server&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most interesting part. You can use whatever MCP servers you prefer.&lt;/p&gt;

&lt;p&gt;Click on Add a Custom Connector → Provide MCP name and Server URL → (Optional) OAuth credentials&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kwp7pmw1xokw2vzzjbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kwp7pmw1xokw2vzzjbo.png" alt="Image 4"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  But… you shouldn’t be using MCP servers
&lt;/h2&gt;

&lt;p&gt;MCP servers are definitely a force multiplier, making it easy for LLMs to access data. However, they have practical limitations.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. MCPs are token-hungry
&lt;/h3&gt;

&lt;p&gt;Each MCP tool has a schema definition: what it does, the parameters, and sometimes examples. The more detailed the tool definitions, the more reliable the execution; however, LLMs have a limited context window (200K tokens for Claude). And it’s well known that LLMs are more effective when their context is not bloated. The more MCPs there are, the less space there is for actual execution.&lt;/p&gt;

&lt;p&gt;For example, the GitHub and Linear official MCPs have 40 and 27 tools, respectively, and they consume 17.1K tokens (8.5%). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7u1xwycat03slp4utyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7u1xwycat03slp4utyj.png" alt="Image 5"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tool definitions are always loaded, even when unused
&lt;/h3&gt;

&lt;p&gt;Most MCP clients eagerly load all available tools into the model context. That means tools the model will never call still consume tokens on every request.&lt;/p&gt;

&lt;p&gt;If your server exposes 20 endpoints but a given task only needs 2, the model still incurs the cost of all 20. Over time, this pushes teams to artificially split MCP servers, not for architectural clarity, but to work around context limits.&lt;/p&gt;

&lt;p&gt;This also discourages experimentation. Engineers hesitate to add new tools because every addition degrades all existing interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Large tool outputs quietly destroy context
&lt;/h3&gt;

&lt;p&gt;The biggest failures are less about schemas; they are caused by results.&lt;/p&gt;

&lt;p&gt;Logs, database rows, file lists, search results, stack traces, and JSON blobs all flow straight back into the model. Even a single careless response can erase half the conversation history.&lt;/p&gt;

&lt;p&gt;This is far from ideal: once the context fills with tool output, the model starts losing track of earlier goals.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Tool selection degrades as tool count grows
&lt;/h3&gt;

&lt;p&gt;As the number of MCP tools increases, tool selection accuracy drops.&lt;/p&gt;

&lt;p&gt;Models begin to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Call near matches instead of the correct tool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overuse generic tools&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoid tools altogether and hallucinate answers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This happens even if all tools are well described. The attention budget simply is not infinite. Past a certain point, the model stops fully reading tool definitions.&lt;/p&gt;

&lt;p&gt;You can observe this directly by adding more tools and watching call precision decline.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to fix this?
&lt;/h2&gt;

&lt;p&gt;By implementing a few architectural improvements:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. On-demand tool loading
&lt;/h3&gt;

&lt;p&gt;Instead of loading every tool definition into the context upfront, only load the tools you actually need for the current task.&lt;/p&gt;

&lt;p&gt;This is the simplest way to cut token usage, because tool schemas are the “always-on” cost. If you can turn that into a “pay only when used” cost, you immediately get more room for reasoning and better reliability.&lt;/p&gt;

&lt;p&gt;We’ve implemented this in Rube, a universal MCP server that dynamically loads tools based on the task context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A Planner tool that produces a detailed plan for a task, and a Search tool that finds and retrieves the required tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the model needs something, it asks for the specific tool definition.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Only then do you inject that tool’s schema into the context.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This also fixes the experimentation problem. You can add more tools without degrading every session, since most sessions won't load them.&lt;/p&gt;
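
&lt;p&gt;A minimal sketch of the idea, with illustrative tool names and a string-length stand-in for token cost:&lt;/p&gt;

```javascript
// On-demand tool loading: the model sees only names and one-line
// purposes up front, and pays the full schema cost per tool used.
function createRegistry(catalogue) {
  const loaded = new Map(); // schemas injected into context so far
  return {
    list() {
      return catalogue.map(function (t) {
        return { name: t.name, purpose: t.purpose };
      });
    },
    load(name) {
      if (!loaded.has(name)) {
        const tool = catalogue.find(function (t) { return t.name === name; });
        if (tool) loaded.set(name, tool.schema);
      }
      return loaded.get(name);
    },
    contextCost() {
      // Rough proxy for tokens spent on schemas (characters, not tokens).
      let total = 0;
      for (const schema of loaded.values()) total += JSON.stringify(schema).length;
      return total;
    },
  };
}
```

&lt;p&gt;Unloaded tools cost nothing, so adding a hundredth tool to the catalogue doesn't degrade sessions that never touch it.&lt;/p&gt;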

&lt;h3&gt;
  
  
  2. Indexing tools for better discoverability
&lt;/h3&gt;

&lt;p&gt;Tool selection gets worse as the tool count grows, even if every tool is well described.&lt;/p&gt;

&lt;p&gt;So don’t rely on the model to “scan” a long list of tools. Give it a way to search tools like an index.&lt;/p&gt;

&lt;p&gt;The pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Maintain a small searchable catalogue of tools, effectively a vector database with hybrid search (full-text match + vector embeddings of tool definitions)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each entry has: tool name, one-line purpose, key parameters, and a few example queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Let the model search the catalogue with natural language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Return the top 3-5 matches, then load only those schemas.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also makes tool naming less painful. Even if a tool name is slightly off, the index can still match on description.&lt;/p&gt;
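
&lt;p&gt;A toy version of that catalogue search, with plain keyword overlap standing in for the hybrid full-text-plus-embedding scoring:&lt;/p&gt;

```javascript
// Rank catalogue entries by how many query words appear in the
// tool's name or one-line purpose, then return the top k names.
function searchTools(catalogue, query, k) {
  const words = query.toLowerCase().split(/\s+/);
  const scored = catalogue.map(function (t) {
    const hay = (t.name + " " + t.purpose).toLowerCase();
    let score = 0;
    for (const w of words) {
      if (hay.includes(w)) score += 1;
    }
    return { tool: t, score };
  });
  scored.sort(function (a, b) { return b.score - a.score; });
  return scored
    .slice(0, k)
    .filter(function (s) { return s.score > 0; })
    .map(function (s) { return s.tool.name; });
}
```

&lt;p&gt;Because matching runs over descriptions too, a slightly-off tool name still resolves to the right entry.&lt;/p&gt;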

&lt;h3&gt;
  
  
  3. Handling Large Outputs outside the LLM’s context
&lt;/h3&gt;

&lt;p&gt;This is the biggest lever.&lt;/p&gt;

&lt;p&gt;Most MCP failures occur when tools return a large payload, and you paste it straight back into the model. Once you do that, the session starts forgetting earlier goals and acting strangely.&lt;/p&gt;

&lt;p&gt;The fix is to stop treating the model like your output buffer.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Store large outputs outside the prompt (local file, object store, database, even a temp cache).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Return a small summary plus a handle (file path, ID, cursor, pointer).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Models are extremely good at file operations, and storing large blobs in file storage and letting the model retrieve only what’s needed can go a long way.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model should never be forced to read 200 KB of JSON just because the tool had it available.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Programmatic Tool Calling or CodeAct
&lt;/h3&gt;

&lt;p&gt;LLMs are extremely good at writing code. So, instead of giving LLMs direct MCP tools, it's better to give them a workbench where they can write glue code for MCP tool chaining and execute it to get outputs.&lt;/p&gt;

&lt;p&gt;Instead of LLMs calling a tool, waiting, reading the result, then deciding the next tool call (and repeating that cycle over and over), LLMs &lt;strong&gt;write a small chunk of code inside a code execution container that calls your tools as functions&lt;/strong&gt;. That code can loop, branch, filter, aggregate, and stop early without requiring a new model round-trip for every step.&lt;/p&gt;

&lt;p&gt;The reason this matters for MCPs is context.&lt;/p&gt;

&lt;p&gt;With traditional tool calling, every intermediate result is included in the chat and consumes token space. With programmatic tool calling, &lt;strong&gt;the intermediate tool results are processed inside the code execution environment and do not enter Claude’s context&lt;/strong&gt;. Claude only sees the final output of the code, which is usually a much smaller summary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling" rel="noopener noreferrer"&gt;Anthropic’s guidance&lt;/a&gt; is that it pays off most when you have any of these patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Large datasets where you only need aggregates or summaries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-step workflows with 3 or more dependent tool calls&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Filtering, sorting, or transforming tool results before Claude sees them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parallel operations across many items (for example, checking 50 things)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tasks where intermediate data should not influence reasoning&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is some overhead because you are adding code execution to the loop, so it’s less useful for a single quick lookup.&lt;/p&gt;
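
&lt;p&gt;The "parallel operations across many items" case can be sketched like this; &lt;code&gt;fetchStatus&lt;/code&gt; is a hypothetical stand-in for any MCP tool exposed to the sandbox as a function:&lt;/p&gt;

```javascript
// Glue code the model would write inside the execution container:
// loop over items, keep intermediate results local, and return
// only the aggregate, so 50 raw responses never enter the context.
function checkAll(fetchStatus, ids) {
  const failing = [];
  for (const id of ids) {
    const s = fetchStatus(id); // intermediate result stays in the sandbox
    if (s.ok !== true) failing.push(id);
  }
  return { checked: ids.length, failing }; // the only thing the model sees
}
```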




&lt;h2&gt;
  
  
  We’ve already solved it
&lt;/h2&gt;

&lt;p&gt;Before this became mainstream knowledge (thanks to Anthropic’s &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;), we had already implemented the pattern at scale with &lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It’s an MCP server with meta tools that implements the above design patterns and more. This is a wrapper over our core tool infrastructure. You can access all our &lt;a href="https://composio.dev/toolkits" rel="noopener noreferrer"&gt;877 SaaS toolkits&lt;/a&gt; without the headaches of implementing authentication.&lt;/p&gt;

&lt;p&gt;Here’s what we’ve got in Rube MCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery &amp;amp; Connection Tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_SEARCH_TOOLS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discovers relevant tools and generates execution plans for tasks. Always call this first when starting a workflow. Returns tools, schemas, connection status, and recommended steps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_GET_TOOL_SCHEMAS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieves complete input parameter schemas for tools. Use when SEARCH_TOOLS returns &lt;code&gt;schemaRef&lt;/code&gt; instead of full schema.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_MANAGE_CONNECTIONS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Creates or manages connections to user's apps. Returns auth links for OAuth/API key setup. Never execute tools without an active connection.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Execution Tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_MULTI_EXECUTE_TOOL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast parallel executor for up to 50 tools across apps. Primary way to run discovered tools. Includes memory storage for persistent facts across executions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_REMOTE_WORKBENCH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Executes Python code in a remote Jupyter sandbox. Use for processing large data files, bulk operations, or scripting complex tool chains. Has 4-minute timeout.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_REMOTE_BASH_TOOL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Executes bash commands in a remote sandbox. Useful for file operations and processing JSON with tools like &lt;code&gt;jq&lt;/code&gt;, &lt;code&gt;awk&lt;/code&gt;, &lt;code&gt;sed&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Recipe Tools (Reusable Workflows)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_CREATE_UPDATE_RECIPE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Converts completed workflows into reusable notebooks/recipes with defined inputs, outputs, and executable code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_EXECUTE_RECIPE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs an existing recipe with provided input parameters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_FIND_RECIPE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Searches for recipes using natural language (e.g., "GitHub PRs to Slack"). Returns matching recipes with IDs for execution.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_GET_RECIPE_DETAILS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieves full details of a recipe by ID, including code, schema, and defaults.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RUBE_MANAGE_RECIPE_SCHEDULE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Creates, updates, pauses, or deletes recurring schedules for recipes using cron expressions.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Typical Workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RUBE_SEARCH_TOOLS&lt;/strong&gt; → Find tools for your task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RUBE_MANAGE_CONNECTIONS&lt;/strong&gt; → Ensure apps are connected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RUBE_MULTI_EXECUTE_TOOL&lt;/strong&gt; → Execute the tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RUBE_REMOTE_WORKBENCH&lt;/strong&gt; → Process large results if needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RUBE_CREATE_UPDATE_RECIPE&lt;/strong&gt; → Save as reusable recipe (optional)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  How to use Rube with Claude CoWork
&lt;/h2&gt;

&lt;p&gt;The process is essentially the same as adding any remote MCP server.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Head to &lt;a href="http://rube.app/" rel="noopener noreferrer"&gt;Rube.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click on Use Rube&lt;/li&gt;
&lt;li&gt;Copy the MCP URL &lt;code&gt;https://rube.app/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ceifdgj0kyd8hm0hgdi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ceifdgj0kyd8hm0hgdi.png" alt="Image 6"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the Claude app and go to Connectors&lt;/li&gt;
&lt;li&gt;Paste the MCP URL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbia0visuxua2dyvd9gu6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbia0visuxua2dyvd9gu6.png" alt="Image 7"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;And… you’re done.&lt;/li&gt;
&lt;li&gt;Ask for whatever you want. You’ll be prompted to authenticate with the apps you need; after that, leave the rest up to Claude.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Some cool examples that I use every day
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Analyse blog post performance from Google Search Console and create Notion files
&lt;/h3&gt;

&lt;h3&gt;
  
  
  2. Converting the Google Sheet to Notion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://youtu.be/PsPjcFp4-iY" rel="noopener noreferrer"&gt;https://youtu.be/PsPjcFp4-iY&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  End Note
&lt;/h2&gt;

&lt;p&gt;Claude CoWork is genuinely useful on its own, but it becomes far more capable once it can reach the apps you use every day. Connecting them through Rube is the simplest way to get there.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>From Auth to Action: The Complete Guide to Secure &amp; Scalable AI Agent Infrastructure (2026)</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Mon, 10 Nov 2025 10:50:57 +0000</pubDate>
      <link>https://forem.com/composiodev/from-auth-to-action-the-complete-guide-to-secure-scalable-ai-agent-infrastructure-2026-2ieb</link>
      <guid>https://forem.com/composiodev/from-auth-to-action-the-complete-guide-to-secure-scalable-ai-agent-infrastructure-2026-2ieb</guid>
      <description>&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auth is Not Enough:&lt;/strong&gt; Getting an OAuth token (Pillar 1) is just the first step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Needs Guardrails:&lt;/strong&gt; You must build Granular Control (Pillar 2) with patterns like Brokered Credentials to prevent security risks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability Requires an Engine:&lt;/strong&gt; A reliable action layer (Pillar 3) with a Unified API and managed retries is essential to move from prototype to production.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Understanding the "Authentication Wall" for AI Agents
&lt;/h2&gt;

&lt;p&gt;You've built a powerful AI agent. Using a framework like LangChain or CrewAI, you've designed a sophisticated workflow that can reason, plan, and execute tasks. There's just one problem: Your agent is trapped in a sandbox, unable to interact with the real world. To be useful, it needs access to user-specific tools like Google Calendar, Salesforce, or Jira. This is where you hit the "Authentication Wall".&lt;/p&gt;

&lt;p&gt;Suddenly, you're wrestling with the complexities of AI agent authentication. You're managing multi-step OAuth 2.0 flows, securely storing refresh tokens, and handling credential management for dozens of different APIs. It's a significant engineering challenge, and it's a common reason why promising agent prototypes never make it to production.&lt;/p&gt;

&lt;p&gt;But solving authentication isn't the real goal. It's just the gateway to a much larger set of problems. Getting an OAuth token is the first step. The real challenge is building a secure, production-ready, and governable system for an agent to act on a user's behalf. This is a problem of secure AI agent workflow management, not just auth.&lt;/p&gt;

&lt;p&gt;A production-ready AI agent infrastructure requires three essential pillars: 1. Secure Authentication, 2. Granular Control, and 3. Reliable Action. This guide walks through the architecture of all three, helping you move beyond the Authentication Wall and build agents that are truly ready for production.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 1: Secure Authentication (The Gateway to Real-World Action)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Solving the Token Problem: The Role of Managed OAuth, PKCE, and Refresh Tokens
&lt;/h3&gt;

&lt;p&gt;Before an agent can do anything, it needs a key. Securely acquiring that key is the foundational layer of your infrastructure. This is the problem that solutions for managed authentication for AI agents aim to solve. They abstract away the tedious and error-prone process of connecting to each API individually.&lt;/p&gt;

&lt;p&gt;This foundational pillar must include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed OAuth:&lt;/strong&gt; A robust system must handle the entire multi-step OAuth dance for you. This includes generating the correct authorization URL, handling the callback, exchanging the authorization code for a token, and securely storing the credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modern Standards:&lt;/strong&gt; The security landscape evolves. The current standard is &lt;a href="https://oauth.net/2.1/" rel="noopener noreferrer"&gt;OAuth 2.1 with mandatory Proof Key for Code Exchange (PKCE)&lt;/a&gt;. PKCE is critical for headless agents that cannot securely store a client secret, as it &lt;a href="https://www.scalekit.com/blog/pkce-developers-guide-secure-oauth-flows" rel="noopener noreferrer"&gt;prevents authorization code interception attacks&lt;/a&gt;. Any modern OAuth for AI agents solution must support this.&lt;/p&gt;
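&lt;p&gt;As a concrete illustration, here is a minimal Python sketch of generating a PKCE pair per RFC 7636: a random &lt;code&gt;code_verifier&lt;/code&gt; and its S256 &lt;code&gt;code_challenge&lt;/code&gt;. It uses only the standard library.&lt;/p&gt;

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> a 43-char URL-safe verifier, within the spec's
    # allowed 43-128 character range.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

&lt;p&gt;The agent sends the challenge (with &lt;code&gt;code_challenge_method=S256&lt;/code&gt;) in the authorization request, then proves possession by sending the verifier in the token exchange, so an intercepted authorization code is useless on its own.&lt;/p&gt;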

&lt;p&gt;&lt;strong&gt;Persistent Sessions:&lt;/strong&gt; Users expect agents to work in the background without constant re-authentication. This requires a system that automatically refreshes expired access tokens. The security best practice here is &lt;a href="https://www.descope.com/blog/post/refresh-token-rotation" rel="noopener noreferrer"&gt;refresh token rotation&lt;/a&gt;, where a new refresh token is issued with every access token refresh, and the old one is immediately invalidated. This significantly reduces the risk of a compromised refresh token providing long-term access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secure Credential Storage:&lt;/strong&gt; Storing tokens, API keys, and other secrets in environment variables or application code is a major security risk. These credentials must be &lt;a href="https://developer.hashicorp.com/validated-patterns/vault/ai-agent-identity-with-hashicorp-vault" rel="noopener noreferrer"&gt;stored in an encrypted vault&lt;/a&gt;, completely isolated from your agent's application logic.&lt;/p&gt;

&lt;p&gt;Platforms that offer these features provide a necessary service. They solve the immediate pain of getting a token. But this is just the beginning.&lt;/p&gt;

&lt;p&gt;So your agent is authenticated. You have the key. The problem is solved, right? Wrong. Now you have a new, bigger problem: an autonomous agent with the full power of a user's account.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 2: Granular Control (Establishing Guardrails for Autonomous Agents)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Your Agent Has the Keys. Who's Stopping It From Deleting Your Entire Google Drive?
&lt;/h3&gt;

&lt;p&gt;Once you have an OAuth token, you've given your agent the keys to a user's digital kingdom. A standard token grants the agent all of the user's permissions by default. This is a massive security risk, especially for autonomous agents. This is where the second pillar, Granular Control, becomes essential for any enterprise AI agent authentication platform. You need guardrails to ensure an agent can only do what it's supposed to do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Principle of Least Privilege:&lt;/strong&gt; An agent that only needs to read calendar events shouldn't have the power to delete your entire Google Drive. Your infrastructure must enforce the principle of least privilege by de-scoping the agent's permissions. Modern standards like &lt;a href="https://datatracker.ietf.org/doc/html/rfc9396" rel="noopener noreferrer"&gt;Rich Authorization Requests (RAR)&lt;/a&gt; allow an agent to request just-in-time, specific permissions for a single action, rather than asking for broad, standing access. This is a core tenet of AI agent security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preventing Credential Leakage:&lt;/strong&gt; One of the top risks for LLM applications, as identified by OWASP, is &lt;a href="https://blog.1password.com/security-principles-guiding-1passwords-approach-to-ai/" rel="noopener noreferrer"&gt;credential leakage through the prompt context&lt;/a&gt;. If you pass an API key or bearer token directly to the LLM, a clever prompt injection attack could trick the agent into revealing it. The solution is a &lt;a href="https://1password.com/solutions/agentic-ai" rel="noopener noreferrer"&gt;Brokered Credentials&lt;/a&gt; pattern. In this architecture, a secure middle layer makes the API call on the agent's behalf. The LLM decides what to do, but the broker handles the how. The LLM never sees the token, completely neutralizing this risk. This is a critical feature for platforms that securely connect AI agents to APIs.&lt;/p&gt;

&lt;p&gt;This sequence diagram illustrates the brokered credentials flow, ensuring the LLM never handles secrets:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxom1upuh7dr6a68f45n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxom1upuh7dr6a68f45n.png" alt="Composio-Brokered Jira Ticket Creation Flow" width="800" height="693"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Granular Access Control:&lt;/strong&gt; How do you enforce these fine-grained permissions at scale? The modern approach uses &lt;a href="https://www.cerbos.dev/blog/mcp-security-ai-agent-authorization-a-ciso-and-architects-guide" rel="noopener noreferrer"&gt;Policy-as-Code&lt;/a&gt; engines like Open Policy Agent (OPA) or Cedar. These systems externalize authorization logic, allowing you to define rules like "this agent can only transfer up to $100" or "this agent can only access records created this week". The tool-calling layer queries this policy engine before every action, ensuring every operation is explicitly permitted.&lt;/p&gt;
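&lt;p&gt;The shape of such a check looks like this. In production you would query an external engine like OPA or Cedar; the in-process rules below are a stand-in to show where the check sits relative to tool execution.&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class ActionRequest:
    agent_id: str
    tool: str
    params: dict = field(default_factory=dict)

# Hypothetical rules standing in for an external policy engine (OPA, Cedar).
# Each rule returns True if it permits the request.
POLICIES = [
    lambda r: not (r.tool == "payments.transfer" and r.params.get("amount", 0) > 100),
    lambda r: r.tool != "drive.delete_all",
]

def is_permitted(request: ActionRequest) -> bool:
    """Allow an action only if every policy rule permits it.
    The tool-calling layer runs this before every execution."""
    return all(rule(request) for rule in POLICIES)
```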

&lt;p&gt;&lt;strong&gt;Delegated Authority:&lt;/strong&gt; For a clear audit trail, you need to know not just what happened, but who authorized it. The &lt;a href="https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-on-behalf-of-flow" rel="noopener noreferrer"&gt;On-Behalf-Of (OBO) Token Exchange&lt;/a&gt; is the gold standard for this. The agent presents the user's token and its own credentials to an authorization server, which issues a new token containing claims for both the user and the agent. This creates an auditable chain of command, proving the agent was acting with delegated authority from the user.&lt;/p&gt;
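&lt;p&gt;The wire format of that exchange follows RFC 8693. Here is a sketch of the request body an agent would POST to the authorization server's token endpoint; the parameter names come from the RFC, while the token values and audience are placeholders.&lt;/p&gt;

```python
from urllib.parse import urlencode

def build_obo_request(user_token: str, agent_client_id: str,
                      agent_client_secret: str, audience: str) -> str:
    """Build an RFC 8693 token-exchange request body.

    The authorization server responds with a new token carrying claims for
    both the user (subject) and the agent (actor), which is what creates the
    auditable delegation chain.
    """
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "client_id": agent_client_id,
        "client_secret": agent_client_secret,
        "audience": audience,
    })
```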

&lt;p&gt;Simple auth solutions leave you to build this entire governance layer yourself. A true AI agent integration platform provides these guardrails out of the box, preventing catastrophic mistakes and giving you the control needed for enterprise-grade applications.&lt;/p&gt;

&lt;p&gt;Now your agent is authenticated and secure. It has a key, and it knows which doors it's allowed to open. You're ready for production. Almost. What happens when the lock changes or there are thousands of different doors?&lt;/p&gt;




&lt;h2&gt;
  
  
  Pillar 3: Reliable Action (The Engine for Scalable Integrations)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  An Agent That Can't Use Its Keys is Just an Expensive Chatbot
&lt;/h3&gt;

&lt;p&gt;Authentication and control are useless if the agent can't perform its job reliably and scalably across a wide range of tools. This is the final and most critical pillar of production-ready infrastructure. It's the engine that turns an agent's intent into reliable action in the real world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: The "N+1" API Problem.&lt;/strong&gt; Every new tool you want your agent to use means learning a new API, a new data schema, and a new set of failure modes. Integrating with Jira is different from Asana, which is different from Trello. This maintenance burden grows with every new tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution: The Unified API.&lt;/strong&gt; A powerful integration platform abstracts this complexity behind a single, consistent interface. Your agent can learn to perform a generic action like &lt;code&gt;tasks.create&lt;/code&gt;, and the platform handles the translation to the specific API calls for Jira, Asana, or Trello. This dramatically &lt;a href="https://www.merge.dev/blog/best-ai-agent-auth-tool" rel="noopener noreferrer"&gt;simplifies agent development&lt;/a&gt; and maintenance.&lt;/p&gt;
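&lt;p&gt;The translation layer can be pictured as a table of adapters behind one generic call. This is a sketch to show the idea; the payload field names are illustrative, not the real Jira or Asana API schemas.&lt;/p&gt;

```python
# Hypothetical per-provider payload builders behind one generic tasks.create.
ADAPTERS = {
    "jira":  lambda task: {"fields": {"summary": task["title"],
                                      "project": {"key": task["project"]}}},
    "asana": lambda task: {"data": {"name": task["title"],
                                    "projects": [task["project"]]}},
}

def tasks_create(provider: str, task: dict) -> dict:
    """Translate one generic task into the provider-specific payload.
    The agent only ever learns the generic shape; the platform owns the
    per-API differences."""
    if provider not in ADAPTERS:
        raise ValueError(f"No adapter registered for provider {provider!r}")
    return ADAPTERS[provider](task)
```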

&lt;p&gt;&lt;strong&gt;Problem 2: The "What Can You Do?" Problem.&lt;/strong&gt; How does an agent discover the tools available to it and their specific functions without you hardcoding them? An agent needs to adapt as new tools are added or existing ones change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution: Standardized Tool Discovery.&lt;/strong&gt; The &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; is an emerging standard that solves this. It allows an agent to dynamically query the integration platform to discover the hundreds of tools it can use, what actions each tool supports, and what parameters are required. This enables agents to be more autonomous and adaptable.&lt;/p&gt;
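&lt;p&gt;At the protocol level, discovery is a JSON-RPC call. The sketch below shows the shape of a &lt;code&gt;tools/list&lt;/code&gt; request and response parsing; a real MCP client would use an official SDK and complete the initialization handshake first.&lt;/p&gt;

```python
import json

def mcp_tools_list_request(request_id: int = 1) -> str:
    """Build an MCP `tools/list` JSON-RPC request body."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})

def summarize_tools(response: dict) -> list[str]:
    """Pull tool names out of a `tools/list` response so the agent can
    learn its capabilities at runtime instead of having them hardcoded."""
    return [tool["name"] for tool in response.get("result", {}).get("tools", [])]
```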

&lt;p&gt;&lt;strong&gt;Problem 3: The "It Broke" Problem.&lt;/strong&gt; Real-world APIs are unreliable. They go down, they return unexpected errors, they have rate limits, and tokens can expire unexpectedly. A naive implementation will fail constantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution: A Managed Integration Layer.&lt;/strong&gt; A production-grade platform provides enterprise-grade infrastructure to handle this messy reality. This includes built-in retries with exponential backoff for transient errors, intelligent rate limit handling, comprehensive logging for debugging, and robust patterns like the &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/saga-orchestration.html" rel="noopener noreferrer"&gt;Saga pattern&lt;/a&gt; for handling partial failures in multi-tool workflows. If one step in a five-step process fails, the system can gracefully roll back the completed steps to maintain a consistent state.&lt;/p&gt;
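&lt;p&gt;The retry half of this is simple enough to sketch. A minimal backoff wrapper, assuming transient failures surface as exceptions, looks like:&lt;/p&gt;

```python
import random
import time

def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 0.5,
                      retriable=(TimeoutError, ConnectionError)):
    """Retry a flaky call with exponential backoff plus jitter.

    Transient failures (timeouts, dropped connections) are retried with
    delays of roughly base_delay * 2**attempt; anything else surfaces
    immediately, since retrying a validation error never helps.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

&lt;p&gt;Rollback for multi-step workflows (the Saga pattern) is the harder part: each step needs a registered compensating action, which is exactly the kind of machinery a managed platform amortizes for you.&lt;/p&gt;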

&lt;h3&gt;
  
  
  How to Achieve Observability for AI Agent Actions (Logging &amp;amp; Monitoring)
&lt;/h3&gt;

&lt;p&gt;A production-ready system isn't a black box. For DevOps and SREs, observability is non-negotiable. You need deep visibility into your agent's actions to debug failures, monitor performance, and control costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured Logging:&lt;/strong&gt; Every tool call must be logged in a &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html" rel="noopener noreferrer"&gt;structured format (like JSON)&lt;/a&gt;. These logs should include a &lt;code&gt;trace_id&lt;/code&gt; to correlate actions across services, along with critical context like &lt;code&gt;agent_id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;tool_name&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, and &lt;code&gt;duration&lt;/code&gt;. This is essential for debugging failed workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics and Monitoring:&lt;/strong&gt; Your infrastructure should expose key metrics to a &lt;a href="https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/best-practices/" rel="noopener noreferrer"&gt;monitoring system like Prometheus or Grafana&lt;/a&gt;. Track API error rates (4xx, 5xx), p95/p99 latencies for tool calls, and token refresh success rates. Set up alerts for anomalies, such as a sudden spike in 401 errors, which could indicate a widespread credential issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost and Usage Tracking:&lt;/strong&gt; Agents can make thousands of API calls. A managed platform should provide dashboards to track tool usage and associated costs, preventing runaway agents from causing unexpected bills from downstream API providers.&lt;/p&gt;

&lt;p&gt;This is where the value of a comprehensive platform becomes undeniable. It handles the messy, unreliable reality of working with hundreds of different APIs at scale, allowing you to focus on building intelligent agent logic, not brittle integration code.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Integrate the Three Pillars with LangChain or CrewAI
&lt;/h2&gt;

&lt;p&gt;The architectural concepts of Authentication, Control, and Action come together when you provide tools to your agent. A comprehensive platform abstracts these pillars into a simple set of tools that can be directly passed to frameworks like LangChain or CrewAI.&lt;/p&gt;

&lt;p&gt;The developer experience is streamlined to instantiating a client and retrieving the available tools for a given user. The platform handles the underlying complexity of token management, security, and reliability.&lt;/p&gt;

&lt;p&gt;This complete, runnable example demonstrates the full flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# --- Step 1: Installation ---
# Make sure you have the required packages installed.
# pip install python-dotenv langchain langchain-openai langchain-core langgraph composio-langchain
&lt;/span&gt;
&lt;span class="c1"&gt;# --- Step 2: Environment Setup ---
# Set your API keys in .env file
# export OPENAI_API_KEY="sk-..."
# export COMPOSIO_API_KEY="comp_..."
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;composio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Composio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;composio_langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LangchainProvider&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# In a real application, this would be the unique ID of your authenticated user.
# It tells Composio which user's connections to use.
&lt;/span&gt;&lt;span class="n"&gt;USER_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-user-id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with a dynamic user ID
&lt;/span&gt;
&lt;span class="c1"&gt;# --- Step 3: Initialize the LLM and Composio Client ---
# Instantiate the LLM you want the agent to use.
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Instantiate the Composio client with LangchainProvider.
# It will automatically use the COMPOSIO_API_KEY from your environment.
&lt;/span&gt;&lt;span class="n"&gt;composio_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Composio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;LangchainProvider&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# --- Step 4: Fetch User-Specific Tools ---
# Fetch all tools for the "jira" toolkit that are available for the specified user.
# The `user_id` parameter is crucial for security and multi-tenancy.
# Composio's brokered credential pattern ensures the LLM never sees the user's token.
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;composio_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;USER_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toolkits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jira&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error fetching tools: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c1"&gt;# --- Step 5: Create and Run the Agent ---
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Create the agent using the new LangChain 1.0 pattern.
&lt;/span&gt;    &lt;span class="c1"&gt;# The create_agent function returns a compiled graph that can be invoked directly.
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that uses tools to perform tasks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Invoke the agent to perform a task.
&lt;/span&gt;    &lt;span class="c1"&gt;# The agent will reason, select the jira.create_issue tool, and execute it.
&lt;/span&gt;    &lt;span class="c1"&gt;# Note: The new pattern uses a messages format instead of a simple input dict.
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a Jira ticket in the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PROJ&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; project to fix the auth bug.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent execution result:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred during agent execution: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No tools were fetched. Agent cannot execute the task.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Decision Framework: How to Choose Your Agent Architecture (Build vs. Buy vs. Integrate)
&lt;/h2&gt;

&lt;p&gt;When building your agent's infrastructure, you have three primary paths. Each comes with significant trade-offs in cost, speed, and security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DIY (Do-It-Yourself):&lt;/strong&gt; You build the entire stack in-house. This gives you maximum control but requires a massive investment in engineering, security, and ongoing maintenance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth Components (e.g., &lt;a href="https://nango.dev/" rel="noopener noreferrer"&gt;Nango&lt;/a&gt;, &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt;):&lt;/strong&gt; You use a managed service to handle the initial OAuth headache (Pillar 1). This is a great starting point but leaves you to build the critical governance (Pillar 2) and action (Pillar 3) layers yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth-to-Action Platform (e.g., &lt;a href="https://composio.dev/agentauth" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;):&lt;/strong&gt; You use a comprehensive platform that provides an end-to-end solution covering all three pillars. This is the fastest and most secure path for most teams.&lt;/p&gt;

&lt;p&gt;The Total Cost of Ownership (TCO) for a DIY solution is often &lt;a href="https://workos.com/blog/workos-vs-auth0-vs-stytch" rel="noopener noreferrer"&gt;deceptively high&lt;/a&gt;. While there's no subscription fee, the hidden costs in engineering salaries, on-call burdens, and continuous security reviews can easily run into hundreds of thousands of dollars.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison Table: Build vs. Buy vs. Integrate
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;DIY (In-House)&lt;/th&gt;
&lt;th&gt;Auth Components (e.g., Nango, Arcade)&lt;/th&gt;
&lt;th&gt;Auth-to-Action (e.g., Composio)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full build required&lt;/td&gt;
&lt;td&gt;✅ Managed OAuth &amp;amp; Refreshes&lt;/td&gt;
&lt;td&gt;✅ Managed OAuth &amp;amp; Refreshes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Granular Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual build required&lt;/td&gt;
&lt;td&gt;❌ (Requires custom layer)&lt;/td&gt;
&lt;td&gt;✅ Built-in Governance &amp;amp; Scoping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Credential Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual build required&lt;/td&gt;
&lt;td&gt;❌ (LLM can still see token)&lt;/td&gt;
&lt;td&gt;✅ Brokered Credentials (No token in context)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unified API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;❌ (Per-API integration)&lt;/td&gt;
&lt;td&gt;✅ Single interface for 500+ tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool Discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual build required&lt;/td&gt;
&lt;td&gt;❌ (Requires custom layer)&lt;/td&gt;
&lt;td&gt;✅ MCP for dynamic discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual build required&lt;/td&gt;
&lt;td&gt;❌ (Requires custom layer)&lt;/td&gt;
&lt;td&gt;✅ Managed Retries, Rate Limiting, Logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to Market&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6-12 months&lt;/td&gt;
&lt;td&gt;1-2 months&lt;/td&gt;
&lt;td&gt;1-2 weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TCO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low (Predictable)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion: Don't Just Buy a Lock. Build a Secure House.
&lt;/h2&gt;

&lt;p&gt;The conversation around AI agent authentication is too narrow. Focusing only on getting a token is like buying a high-tech lock for your front door while leaving all the windows open and forgetting to build a foundation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth-Only Solutions&lt;/strong&gt; give you a key to one door. It's a useful component, but it's not a complete solution for a production system.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;"Auth-to-Action" Platform&lt;/strong&gt; like Composio gives you a master-key system for the entire building. It provides the keys (Authentication), a security guard to check permissions at every door (Control), and a unified concierge that can get any job done reliably (Action).&lt;/p&gt;

&lt;p&gt;Building truly useful, secure, and scalable AI agents requires thinking about the entire infrastructure, from the moment a user grants consent to the final, successful action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop building patchwork infrastructure. Start building production-ready agents.&lt;/strong&gt; &lt;a href="https://platform.composio.dev/auth" rel="noopener noreferrer"&gt;Explore Composio's platform&lt;/a&gt; or &lt;a href="https://docs.composio.dev/docs/quickstart" rel="noopener noreferrer"&gt;read our 5-minute quickstart&lt;/a&gt; to see the three pillars in action.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What solutions offer authentication management for AI agents connecting to multiple applications?
&lt;/h3&gt;

&lt;p&gt;You have a few paths. You can build it all yourself using raw OAuth. You can use auth-only components like Nango or Arcade, which are great at handling the initial token. Or you can use a full auth-to-action platform; Composio is an example of this. It handles the auth, but also the security and reliability needed for production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which AI agent integration platforms offer enterprise-level control and governance?
&lt;/h3&gt;

&lt;p&gt;Enterprise control goes far beyond just authentication. It means having granular permissions, clear audit logs, and policy enforcement. Most auth-only tools do not provide this; you need a platform built for governance. Composio, for example, is designed for this. It lets you define and enforce rules for what an agent can and cannot do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which AI agent authentication platforms are recommended for small teams?
&lt;/h3&gt;

&lt;p&gt;Small teams should look for the fastest path to production. Auth-only tools like Arcade or Nango are great starting points for the token. However, your team still has to build the security and action layers. A complete platform like Composio can be much faster. It provides all the production-ready components out of the box. This often means a lower total cost of ownership because your team writes less code.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the most cost-effective managed OAuth solutions for AI agents?
&lt;/h3&gt;

&lt;p&gt;Cost effectiveness depends on your total cost, not just the subscription price. Open source options can seem free but require your team's time for hosting and maintenance. Managed auth services are low cost to start, but you must add the engineering cost of building your own governance and integration layers. Full platforms like Composio can be more cost-effective overall because they save significant engineering time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What platforms can prevent credential leakage when integrating AI agents with external tools and apps?
&lt;/h3&gt;

&lt;p&gt;This is a major security risk. The best way to prevent leakage is with a pattern called Brokered Credentials. In this pattern, the LLM never actually sees the API key or token. Instead, a secure service like Composio makes the API call on the agent's behalf. This completely removes the risk of a token leaking through a prompt.&lt;/p&gt;
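&lt;p&gt;A minimal sketch of what brokering looks like in JavaScript (the names &lt;code&gt;tokenStore&lt;/code&gt; and &lt;code&gt;callApi&lt;/code&gt; are illustrative stand-ins, not a real SDK): the model only ever handles an opaque connection ID, and the broker resolves the real token server-side.&lt;/p&gt;

```javascript
// Brokered-credentials sketch: the LLM passes a connection id, never a token.
// tokenStore and callApi are hypothetical stand-ins for a secrets store and
// an HTTP client.
function makeBroker(tokenStore, callApi) {
  return async function executeToolCall({ connectionId, endpoint, payload }) {
    const token = tokenStore.get(connectionId); // resolved server-side only
    if (!token) throw new Error(`No connection for ${connectionId}`);
    const result = await callApi(endpoint, token, payload);
    return { ok: true, result }; // only the outcome flows back to the model
  };
}
```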

&lt;h3&gt;
  
  
  What platforms exist for granting AI agents access to use tools on behalf of users?
&lt;/h3&gt;

&lt;p&gt;This is a key challenge called delegated authority, and any platform you choose needs to handle it. It involves managing complex OAuth flows, refresh tokens, and ideally modern standards like PKCE. Platforms like Composio manage this entire lifecycle. They provide the secure infrastructure so your agent can act on a user's behalf without you building the auth system from scratch.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; agent auth, authentication for ai agents&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>OpenAI launched Atlas and I killed it with a Chrome extension</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Wed, 05 Nov 2025 14:42:09 +0000</pubDate>
      <link>https://forem.com/composiodev/openai-launched-atlas-and-i-killed-it-with-a-chrome-extension-1cfb</link>
      <guid>https://forem.com/composiodev/openai-launched-atlas-and-i-killed-it-with-a-chrome-extension-1cfb</guid>
      <description>&lt;p&gt;OpenAI recently launched ChatGPT Atlas, a fork of Chromium with agentic capabilities. The UI is clean, rebuilt with SwiftUI, AppKit, and Metal, but take that away and it’s the same capabilities you can already access on ChatGPT’s website.&lt;/p&gt;

&lt;p&gt;Is it really that hard to get agentic capabilities in your browser? Do you really need another browser for it? Turns out, no: a Chrome extension can do that and more. I spent my weekend building one, and here’s how you can build it too.&lt;/p&gt;

&lt;p&gt;Here’s the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Ea6SGiunsp8"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Here’s the code for the project: &lt;a href="https://github.com/ComposioHQ/open-chatgpt-atlas" rel="noopener noreferrer"&gt;https://github.com/ComposioHQ/open-chatgpt-atlas&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Chrome Extensions Are Actually Perfect For This
&lt;/h2&gt;

&lt;p&gt;Before diving into the build, let me explain why a Chrome extension is the right approach. The first question I had to answer was: Can a Chrome extension do what an AI browser can do?&lt;/p&gt;

&lt;p&gt;The answer is yes, and here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Extensions have access to everything that matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They can take screenshots of the current tab&lt;/li&gt;
&lt;li&gt;They can inject JavaScript into any page&lt;/li&gt;
&lt;li&gt;They can listen to page navigation events&lt;/li&gt;
&lt;li&gt;They can create UI (sidepanel, popups, context menus)&lt;/li&gt;
&lt;li&gt;They run with elevated permissions that the webpage doesn't have&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. They're easier to distribute:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No install process, just add to Chrome&lt;/li&gt;
&lt;li&gt;Updates happen automatically&lt;/li&gt;
&lt;li&gt;Works on any OS that runs Chrome&lt;/li&gt;
&lt;li&gt;Users don't have to abandon their existing browser setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. They're cheaper to build:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No need to maintain a Chromium fork&lt;/li&gt;
&lt;li&gt;No need to handle browser-level features (tabs, bookmarks, updates)&lt;/li&gt;
&lt;li&gt;Focus purely on the agent capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The UI is simple. All you need is a sidebar in the browser where the AI agent can take actions, and for anything the agent can't or won't do through browser automation, you can use MCP (Model Context Protocol) to route to external tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The first step was deciding on the LLM. There were three providers with top-tier models: OpenAI, Anthropic, and Google.&lt;/p&gt;

&lt;p&gt;OpenAI and Anthropic both charge for their APIs, with no free tier. This means a lot of people won't be able to access or build on top of them without immediately hitting a paywall. I wanted this to be something other developers could fork and experiment with without worrying about bills.&lt;/p&gt;

&lt;p&gt;Google, on the other hand, offers a generous free tier for Gemini models that most people can access and build on top of. The free tier gives you 150 requests per minute for Gemini 2.5 Pro, which is far more than you'll need unless you're running this commercially. Gemini 2.5 Computer Use is also cheaper and faster than Claude’s Computer Use with Sonnet 4.5.&lt;/p&gt;

&lt;p&gt;Setting up a Chrome extension is actually pretty straightforward. The core file is the &lt;code&gt;manifest.json&lt;/code&gt;. It defines what the extension is and what permissions it needs. What we need is a Chrome extension that sits in the sidebar and can take actions on the open browser. This means we need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;manifest.json&lt;/code&gt; that declares permissions and entry points&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;background.ts&lt;/code&gt; file that runs as a service worker, listening for messages from the sidepanel&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;content.ts&lt;/code&gt; that gets injected into webpages and can extract page content and execute actions&lt;/li&gt;
&lt;li&gt;UI files: &lt;code&gt;sidepanel.tsx&lt;/code&gt; (React), &lt;code&gt;settings.tsx&lt;/code&gt;, and their corresponding HTML/CSS&lt;/li&gt;
&lt;/ul&gt;
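
&lt;p&gt;To make those entry points concrete, here is a minimal &lt;code&gt;manifest.json&lt;/code&gt; sketch wiring them together (the &lt;code&gt;.js&lt;/code&gt; filenames assume Vite has already bundled the &lt;code&gt;.ts&lt;/code&gt; sources, and the permissions are trimmed to the essentials):&lt;/p&gt;

```json
{
  "manifest_version": 3,
  "name": "Open Atlas",
  "version": "0.1.0",
  "side_panel": { "default_path": "sidepanel.html" },
  "background": { "service_worker": "background.js", "type": "module" },
  "content_scripts": [
    { "matches": ["<all_urls>"], "js": ["content.js"] }
  ],
  "permissions": ["sidePanel", "storage", "tabs", "scripting"]
}
```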

&lt;p&gt;Taking the above requirements into consideration, here's how the file directory looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;atlas-extension/
├── Core Extension Files
│   ├── manifest.json             // Chrome extension config
│   ├── background.ts             // Message router &amp;amp; coordinator
│   ├── content.ts                // Injected into pages, executes actions
│   ├── sidepanel.tsx             // Main chat interface (React)
│   ├── types.ts                  // TypeScript interfaces
│   ├── tools.ts                  // Composio tool definitions
│   ├── settings.tsx              // API key configuration
│   ├── settings.html
│   └── sidepanel.html
│
├── Config Files
│   ├── package.json
│   ├── vite.config.ts            // Build tool (bundles TS → JS)
│   ├── tsconfig.json
│   └── tsconfig.node.json
│
├── Styling
│   ├── sidepanel.css
│   └── settings.css
│
└── Assets
    └── icons/
        └── icon.png

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Permissions That Make It Work
&lt;/h2&gt;

&lt;p&gt;These are the permissions we need from Chrome that make a browsing agent work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="s2"&gt;"permissions"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"sidePanel"&lt;/span&gt;,      // Create sidebar UI
  &lt;span class="s2"&gt;"storage"&lt;/span&gt;,        // Save API keys, settings to chrome.storage.local
  &lt;span class="s2"&gt;"tabs"&lt;/span&gt;,           // ⭐ CRUser types &lt;span class="nb"&gt;command &lt;/span&gt;&lt;span class="k"&gt;in &lt;/span&gt;Sidepanel
  &lt;span class="s2"&gt;"history"&lt;/span&gt;,        // Read browser &lt;span class="nb"&gt;history&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;context&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="s2"&gt;"bookmarks"&lt;/span&gt;,      // Read bookmarks &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;context&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="s2"&gt;"webNavigation"&lt;/span&gt;,  // Track when pages load/unload
  &lt;span class="s2"&gt;"scripting"&lt;/span&gt;,      // Inject content scripts dynamically
  &lt;span class="s2"&gt;"contextMenus"&lt;/span&gt;    // Add right-click menu items
&lt;span class="o"&gt;]&lt;/span&gt;,

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most important one is &lt;code&gt;tabs&lt;/code&gt;. This is what lets you capture screenshots of the current page, which is essential for computer use. Without screenshots, the AI is blind—it has no idea what the page actually looks like, so it can't make intelligent decisions about where to click or what to type.&lt;/p&gt;
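
&lt;p&gt;The screenshot step itself is a one-liner with the promise-based MV3 API; a sketch (run inside the service worker, with the active tab's window implied):&lt;/p&gt;

```javascript
// Capture the visible tab as a PNG data: URL; this is what gets sent to
// Gemini as the agent's "eyes". Assumes the MV3 promise-based chrome API.
async function captureScreenshot() {
  const dataUrl = await chrome.tabs.captureVisibleTab({ format: 'png' });
  return dataUrl; // "data:image/png;base64,..."
}
```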

&lt;p&gt;The &lt;code&gt;scripting&lt;/code&gt; permission is also critical because it allows you to inject &lt;code&gt;content.ts&lt;/code&gt; into any webpage dynamically. This is how you execute actions on the page: clicking buttons, filling forms, scrolling, and so on.&lt;/p&gt;




&lt;h2&gt;
  
  
  System Architecture: How Messages Flow
&lt;/h2&gt;

&lt;p&gt;Here's how the pieces talk to each other:&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;background.ts&lt;/code&gt; is the central nervous system. It’s always running, and it coordinates everything. When you send a message from the side panel, this worker routes it to the correct flow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3wzn48h46flfwidkb00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3wzn48h46flfwidkb00.png" alt="Image 1"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Computer Use: The Browser Automation Loop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; The agent captures a screenshot of the current page state using &lt;code&gt;chrome.tabs.captureVisibleTab()&lt;/code&gt;. This screenshot is the agent's "eyes"—it sees what you see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; The screenshot gets sent to Gemini along with your natural language intent ("click the login button") and the page's DOM structure (for additional context).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Gemini analyzes the screenshot, identifies the login button visually, and returns a function call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;action&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;coordinates&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;y&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;reasoning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Found login button at top-right of page&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; &lt;code&gt;background.ts&lt;/code&gt; receives this action and forwards it to &lt;code&gt;content.ts&lt;/code&gt; running on the current webpage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt; &lt;code&gt;content.ts&lt;/code&gt; executes the click at those coordinates, shows a blue visual indicator to show what happened, and reports success or failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6:&lt;/strong&gt; The loop repeats with a fresh screenshot of the new page state. If the click opened a modal, the next iteration sees the modal and can interact with it. If a page is loading, it waits and adapts.&lt;/p&gt;

&lt;p&gt;This repeats up to 30 times per task. Each iteration adapts based on what it sees. It's not running a predetermined script—it's genuinely reacting to the current state of the page.&lt;/p&gt;
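
&lt;p&gt;Stripped of the Chrome and Gemini plumbing, the loop’s control flow is simple. Here’s a sketch with the three side-effecting calls injected as dependencies (&lt;code&gt;captureScreenshot&lt;/code&gt;, &lt;code&gt;askGemini&lt;/code&gt;, and &lt;code&gt;executeAction&lt;/code&gt; are illustrative names, not the actual functions):&lt;/p&gt;

```javascript
// Screenshot → analyze → act loop with a hard iteration cap. Dependencies
// are injected so the flow can be followed (and tested) without a browser.
async function runAgentLoop(task, deps, maxSteps = 30) {
  const trace = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await deps.captureScreenshot();       // the agent's "eyes"
    const decision = await deps.askGemini(task, screenshot); // function call or done
    if (decision.action === 'done') return { done: true, trace };
    const result = await deps.executeAction(decision);       // content.ts side
    trace.push({ decision, result });                        // feeds the next turn
  }
  return { done: false, trace }; // hit the cap without finishing
}
```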

&lt;h3&gt;
  
  
  How content.ts Executes Actions
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;background.ts&lt;/code&gt; receives an &lt;code&gt;EXECUTE_ACTION&lt;/code&gt; message from Gemini (e.g., &lt;code&gt;{type: 'EXECUTE_ACTION', action: 'click', coordinates: {x: 100, y: 200}}&lt;/code&gt;), it relays this to &lt;code&gt;content.ts&lt;/code&gt; running on the current webpage.&lt;/p&gt;

&lt;p&gt;The content script's &lt;code&gt;executePageAction()&lt;/code&gt; function handles 12 different browser actions. Here are the important ones:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Click:&lt;/strong&gt; Uses &lt;code&gt;document.elementFromPoint(x, y)&lt;/code&gt; to find the element at those coordinates, then fires a click event. If a CSS selector is provided instead, it queries and clicks that element directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;elementFromPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Fill:&lt;/strong&gt; Finds the input/textarea element, focuses it (which triggers any React state updates), then uses &lt;code&gt;keyboard_type()&lt;/code&gt; to type the text character-by-character. This is important for React apps that listen for input events instead of just checking &lt;code&gt;.value&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fill&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;elementFromPoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INPUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TEXTAREA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;focus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;keyboard_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why character-by-character? Because if you just set &lt;code&gt;.value = "text"&lt;/code&gt;, React doesn't know the value changed. You have to dispatch keyboard events for each character so React's synthetic event system picks it up. This was one of those annoying things that took way too long to debug.&lt;/p&gt;
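
&lt;p&gt;A sketch of that character-by-character approach (simplified: the real &lt;code&gt;content.ts&lt;/code&gt; also dispatches &lt;code&gt;keydown&lt;/code&gt;/&lt;code&gt;keyup&lt;/code&gt; events per character):&lt;/p&gt;

```javascript
// Append each character and fire an 'input' event so React's synthetic
// event system sees every change, instead of silently setting .value once.
async function typeLikeUser(el, text) {
  el.focus?.(); // trigger any focus-driven state updates
  for (const ch of text) {
    el.value = (el.value ?? '') + ch;
    el.dispatchEvent(new Event('input', { bubbles: true }));
  }
}
```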

&lt;p&gt;&lt;strong&gt;3. Scroll:&lt;/strong&gt; Scrolls the page (or a specific element) up/down/to-top/to-bottom by manipulating &lt;code&gt;scrollTop&lt;/code&gt; and &lt;code&gt;scrollLeft&lt;/code&gt; or using &lt;code&gt;.scrollIntoView()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Keyboard Type:&lt;/strong&gt; Types text one character at a time using &lt;code&gt;dispatchEvent(new KeyboardEvent('keydown'))&lt;/code&gt; and &lt;code&gt;dispatchEvent(new KeyboardEvent('keyup'))&lt;/code&gt;, mimicking real typing. This is actually faster than setting &lt;code&gt;.value&lt;/code&gt; because it doesn't cause React to re-render the entire component tree on every character.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Press Key:&lt;/strong&gt; Presses individual keys (Enter, Tab, Escape, etc.) by dispatching keyboard events. Useful for submitting forms or navigating through interfaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Key Combination:&lt;/strong&gt; Presses multiple keys simultaneously (Ctrl+A, Cmd+C, etc.) for complex keyboard shortcuts. This is how you can make the agent copy/paste or select all text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Drag &amp;amp; Drop:&lt;/strong&gt; Simulates drag-and-drop by dispatching &lt;code&gt;mousedown&lt;/code&gt;, &lt;code&gt;mousemove&lt;/code&gt;, and &lt;code&gt;mouseup&lt;/code&gt; events from source to destination coordinates. Useful for dragging sliders or reordering lists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Hover:&lt;/strong&gt; Moves the mouse cursor to coordinates and fires &lt;code&gt;mouseover&lt;/code&gt; and &lt;code&gt;mousemove&lt;/code&gt; events. This is useful for triggering dropdowns or tooltips that only appear on hover.&lt;/p&gt;

&lt;p&gt;Each action returns a result object (e.g., &lt;code&gt;{success: true, element: 'BUTTON'}&lt;/code&gt;) back through &lt;code&gt;background.ts&lt;/code&gt; to the sidepanel, so Gemini can see what happened and decide on the next action. The content script also creates a visual indicator, a blue outline and pulsing circle at the action location, that disappears after 600ms. This gives you real-time feedback on what the agent is doing, which is surprisingly important for building trust. Without the visual feedback, the agent feels like a black box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flow summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Sidepanel calls streamWithGeminiComputerUse()
  → background.ts captures screenshot
  → Gemini API receives screenshot + DOM
  → Gemini returns function calls
  → background.ts forwards to content.ts
  → content.ts executes actions
  → repeat up to 30 times
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Tool Router: External API Integration
&lt;/h2&gt;

&lt;p&gt;Computer use is excellent for browser automation, but what if you need to send a Slack message? Or create a GitHub issue? Or search your Gmail?&lt;/p&gt;

&lt;p&gt;That's where the Tool Router comes in. Instead of looping with screenshots and browser actions, you hand off the work directly to specialised external services via Composio's 500+ integrated tools.&lt;/p&gt;

&lt;p&gt;The key difference: Computer use is iterative and visual (screenshot → analyze → act → repeat), while the Tool Router is a single API call to an external service. When you need to "send a Slack message," the Tool Router targets the Slack API, sends a single request, and the job is completed on their servers.&lt;/p&gt;

&lt;p&gt;The Tool Router handles three critical features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Discovery:&lt;/strong&gt; Searches across all available tools to find tools that match your task. Returns relevant toolkits with their descriptions, schemas, and connection status. For example, if you say "send an email," it searches and finds &lt;code&gt;GMAIL_SEND_MESSAGE&lt;/code&gt;, &lt;code&gt;OUTLOOK_SEND_EMAIL&lt;/code&gt;, etc., and returns them with their parameters so Gemini knows what to call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Authentication:&lt;/strong&gt; Checks if you have an active connection to the required toolkit. If not, it creates an auth config and returns a connection URL using Composio's Auth Link. You complete authentication via this link, and your credentials are stored securely. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Execution:&lt;/strong&gt; Loads authenticated tools into context and executes them, supporting parallel execution across multiple tools for efficiency. For example, if you say "find all emails from Bob and create a summary doc," it can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search Gmail in parallel&lt;/li&gt;
&lt;li&gt;Process the results&lt;/li&gt;
&lt;li&gt;Call the Google Docs API to create the summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All in one flow.&lt;/p&gt;

&lt;p&gt;The beauty of this dual approach (Computer Use + Tool Router) is that you can mix them. You can use computer use to navigate to a page and extract information, then use the Tool Router to send that information via Slack. The agent picks which approach to use based on the task.&lt;/p&gt;
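
&lt;p&gt;In the extension the model makes this call itself, but the split can be sketched as a simple heuristic (the keyword list below is purely illustrative):&lt;/p&gt;

```javascript
// Hypothetical routing heuristic: external-service tasks go to the Tool
// Router (one API call), everything else stays in the computer-use loop.
const EXTERNAL_HINTS = ['slack', 'gmail', 'email', 'github', 'calendar'];

function pickMode(task) {
  const t = task.toLowerCase();
  return EXTERNAL_HINTS.some((hint) => t.includes(hint))
    ? 'tool-router'   // e.g. "send a Slack message"
    : 'computer-use'; // e.g. "click the login button"
}
```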

&lt;h2&gt;
  
  
  The Sidepanel: Where You Actually Use It
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;sidepanel.tsx&lt;/code&gt; is where you interact with the agent. It's a React component that renders in Chrome's sidebar (that panel that slides out from the right side of the browser).&lt;/p&gt;

&lt;p&gt;Here's what it does:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Chat interface:&lt;/strong&gt; You type natural language commands ("click the login button", "fill out this form with my details", "send a summary of this page to Slack").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Live conversation history:&lt;/strong&gt; Displays the back-and-forth between you and the agent, including what actions it took and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Mode switcher:&lt;/strong&gt; Toggle between two systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Computer Use (Gemini):&lt;/strong&gt; For direct browser automation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool Router (Composio):&lt;/strong&gt; For external API calls to Gmail, Slack, GitHub, etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Visual feedback:&lt;/strong&gt; Shows when actions are executing, displays screenshots the agent is analyzing (if you want), and reports errors clearly.&lt;/p&gt;

&lt;p&gt;The interface is intentionally minimal. You don't need a complex UI when the agent is doing all the work: just a text input and a conversation history.&lt;/p&gt;




&lt;h2&gt;
  
  
  AI Coding Tool Costs
&lt;/h2&gt;

&lt;p&gt;I started with Claude Sonnet 4.5 in Cursor. I set a $50 budget and figured that would last me at least a week. It was gone in three days.&lt;/p&gt;

&lt;p&gt;The problem with Sonnet isn't that it's bad at coding—it's excellent at coding. The problem is that it's a token-guzzling machine. Here's where the tokens went:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Redundant documentation files:&lt;/strong&gt; Sonnet loves creating &lt;code&gt;TECHNICAL_IMPLEMENTATION.md&lt;/code&gt;, &lt;code&gt;ARCHITECTURE.md&lt;/code&gt;, &lt;code&gt;CHANGELOG.md&lt;/code&gt;, and other markdown files that have no real utility except maybe giving Claude context on the changes it made. Highly inefficient when you're trying to complete a project quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Verbose explanations:&lt;/strong&gt; Every code change comes with a three-paragraph explanation of &lt;em&gt;why&lt;/em&gt; it made the change. Great for understanding, terrible for token efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Full-file rewrites:&lt;/strong&gt; Instead of making targeted edits, Sonnet often rewrites entire files. If you have a 500-line file and need to change one function, Sonnet will regenerate all 500 lines. That's 500 output tokens instead of 20.&lt;/p&gt;

&lt;p&gt;Here's what my Cursor usage looked like with Sonnet 4.5:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j5txyu9lyf1vb4bpfz3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j5txyu9lyf1vb4bpfz3.png" alt="Cursor dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the first three days, I'd burned through most of my $50 budget. I was stretching it by switching to Composer mode (which is slower but more thoughtful), but even that wasn't sustainable.&lt;/p&gt;

&lt;p&gt;Then Anthropic launched Haiku 4.5, which performs at the same level as Sonnet 4. I was sceptical—usually "performs at the same level" means "performs at 80% of the level for niche tasks"—but I was desperate.&lt;/p&gt;

&lt;p&gt;I switched to Haiku 4.5 midway through the project. I completed the remaining work for $30 total.&lt;/p&gt;

&lt;p&gt;Here's the difference:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qeakrehofc7zvb7lke4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qeakrehofc7zvb7lke4.png" alt="Image 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key observations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Haiku 4.5:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More focused changes, fewer tokens per edit&lt;/li&gt;
&lt;li&gt;Rarely creates unnecessary documentation files&lt;/li&gt;
&lt;li&gt;Makes targeted edits instead of full-file rewrites&lt;/li&gt;
&lt;li&gt;Suggestion acceptance rate is actually higher (because suggestions are more precise)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sonnet 4.5:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better at high-level architecture decisions&lt;/li&gt;
&lt;li&gt;More verbose explanations (good for learning, bad for budget)&lt;/li&gt;
&lt;li&gt;More likely to rewrite everything&lt;/li&gt;
&lt;li&gt;Suggestion acceptance rate lower (because it suggests more changes per edit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The verdict:&lt;/strong&gt; For extension development—or really any project where you know roughly what you need to build—Haiku 4.5 is 95% as good for 30% of the cost.&lt;/p&gt;

&lt;p&gt;The 5% where Sonnet is better? Initial architecture decisions, figuring out how to structure something you've never built before, debugging peculiar issues. But for "implement this feature" or "fix this bug," Haiku is more than good enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Claude Got Wrong (So You Don't Waste Hours Like I Did)
&lt;/h2&gt;

&lt;p&gt;Let me save you some pain by documenting the places where Claude Code absolutely struggled. These aren't bugs in Claude—they're gaps in its understanding of Chrome extension architecture and Gemini's API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 1: Text Input Wouldn't Work
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms:&lt;/strong&gt; The agent could click buttons, scroll pages, and navigate between screens. But it couldn't type text into input fields. Every time it tried, nothing happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (first 10 attempts):&lt;/strong&gt; "The coordinates must be wrong. Let me try calculating them differently."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (next 10 attempts):&lt;/strong&gt; "Maybe the input isn't focused. Let me add a focus event first."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (next 10 attempts):&lt;/strong&gt; "The timing might be off. Let me add delays between keystrokes."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actual problem:&lt;/strong&gt; Gemini requires human-in-the-loop permission for tasks it considers sensitive, like typing text. By default, it blocks text input actions entirely unless you explicitly tell it not to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In your Gemini API config&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;https://generativelanguage.googleapis.com/v1beta/&amp;gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="na"&gt;safety_settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;HARM_CATEGORY_DANGEROUS_CONTENT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BLOCK_NONE&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// ← This is what you need&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;safety_settings&lt;/code&gt; parameter is how you manually dial back the guardrails Gemini has active by default; it controls how conservative the model is about "dangerous" actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time wasted:&lt;/strong&gt; 2+ hours across multiple sessions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The kicker:&lt;/strong&gt; This is documented in Google's Gemini Computer Use guide, but Claude never thought to check there. It was convinced it was a coordinate or timing issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; When working with computer use models, always check their specific documentation for permission and safety settings BEFORE debugging for hours. The model might be refusing to do something, not failing to do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 2: Screenshot Capture Hell
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms:&lt;/strong&gt; The computer use loop would start, send the first screenshot, Gemini would respond with an action, and then the extension would crash when trying to capture the second screenshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (attempt 1-5):&lt;/strong&gt; "The screenshot might be too large. Let me try compressing it more."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (attempt 6-10):&lt;/strong&gt; "Maybe the format is wrong. Let me try converting PNG → JPG."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (attempt 11-15):&lt;/strong&gt; "Let me try JPG → PNG instead."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis (attempt 16-30):&lt;/strong&gt; Variations of the above, trying different quality settings, different compression libraries, different encoding methods.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actual problem:&lt;/strong&gt; Chrome extensions can't capture screenshots from the sidebar context. The sidebar runs in its own isolated context and doesn't have access to the main window's visual content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The wrong approach (what Claude kept trying):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This doesn't work from sidepanel context&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;screenshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chrome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tabs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureVisibleTab&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Error: No tab found&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The right approach:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// You need to query for the main window's active tab first&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tabs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chrome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tabs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;active&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;currentWindow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tabs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;screenshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chrome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tabs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureVisibleTab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tabs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;windowId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;png&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is subtle but critical. The sidebar doesn't have a concept of "current window" in the same way a content script does. You have to explicitly query for the active tab and specify its window ID.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time wasted:&lt;/strong&gt; 1+ hour, 30+ different approaches&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moment of realization:&lt;/strong&gt; I finally found this buried in a GitHub issue from 2019 where someone else had the exact same problem. It's not in Chrome's official documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Claude doesn't understand Chrome extension context boundaries. When it fails to capture something that should work, check if you're in the right context (background vs content vs sidepanel vs popup).&lt;/p&gt;

&lt;p&gt;This one fix ended a long, frustrating debugging session in which I had been trying every possible variation of the screenshot API without understanding the fundamental issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 3: Permission Manifest Confusion
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms:&lt;/strong&gt; Some Chrome APIs would work in development but fail in production after packaging the extension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's diagnosis:&lt;/strong&gt; "The manifest permissions must be incomplete. Let me add more permissions."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Claude did:&lt;/strong&gt; Added every permission that sounded remotely related: &lt;code&gt;activeTab&lt;/code&gt;, &lt;code&gt;tabs&lt;/code&gt;, &lt;code&gt;&amp;lt;all_urls&amp;gt;&lt;/code&gt;, &lt;code&gt;webRequest&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actual problem:&lt;/strong&gt; Chrome has different permission requirements for MV3 (Manifest V3) extensions vs MV2. Claude kept suggesting MV2 patterns because most Stack Overflow answers are from the MV2 era.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Understanding the difference between MV3's service workers and MV2's background pages, and adjusting the manifest accordingly.&lt;/p&gt;
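&lt;p&gt;For reference, a minimal MV3 manifest for this kind of sidepanel extension looks roughly like the sketch below. The exact permission list is an assumption and depends on what your build actually uses, but the key MV3 difference is &lt;code&gt;background.service_worker&lt;/code&gt; where MV2 had a background page:&lt;/p&gt;

```json
{
  "manifest_version": 3,
  "name": "Browser Agent",
  "version": "1.0.0",
  "permissions": ["activeTab", "tabs", "sidePanel", "storage"],
  "host_permissions": ["&lt;all_urls&gt;"],
  "background": { "service_worker": "background.js" },
  "side_panel": { "default_path": "sidepanel.html" }
}
```

&lt;p&gt;Note that host access moved out of &lt;code&gt;permissions&lt;/code&gt; into the separate &lt;code&gt;host_permissions&lt;/code&gt; key in MV3, which is exactly the kind of detail MV2-era Stack Overflow answers get wrong.&lt;/p&gt;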

&lt;p&gt;&lt;strong&gt;Time wasted:&lt;/strong&gt; 30 minutes&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Always check which manifest version you're using and make sure Claude's suggestions match that version. The APIs are similar but the permission model is different.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;I've open-sourced the code at &lt;a href="http://github.com/composiohq/open-chatgpt-atlas" rel="noopener noreferrer"&gt;github.com/composiohq/open-chatgpt-atlas&lt;/a&gt;. Here's how to get started:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup (5 minutes):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Clone the repo: &lt;code&gt;git clone ...&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install dependencies: &lt;code&gt;npm install&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build the extension: &lt;code&gt;npm run build&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load in Chrome: Go to &lt;code&gt;chrome://extensions&lt;/code&gt;, enable Developer Mode, click "Load unpacked", select the &lt;code&gt;dist&lt;/code&gt; folder&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Get a Gemini API key: &lt;a href="https://ai.google.dev/" rel="noopener noreferrer"&gt;https://ai.google.dev/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open the extension, go to Settings, paste your API key&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;First task:&lt;/strong&gt; Open any webpage, click the extension icon, and try: "Click the search button and type 'AI browser agents'"&lt;/p&gt;

&lt;p&gt;Watch the blue flashes as it executes each action. If it fails, check the console for errors (right-click the extension → Inspect).&lt;/p&gt;

&lt;p&gt;Thanks for the read. Here is the &lt;a href="https://github.com/ComposioHQ/open-chatgpt-atlas" rel="noopener noreferrer"&gt;repository&lt;/a&gt;; feel free to star it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a voice AI agent to clean my emails, meetings, and Slack DMs (Composio, Vapi, OpenAI TTS) 🪄</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Tue, 23 Sep 2025 11:07:05 +0000</pubDate>
      <link>https://forem.com/composiodev/i-built-a-voice-ai-agent-to-clean-my-emails-meetings-and-slack-dms-composio-vapi-openai-tts-472b</link>
      <guid>https://forem.com/composiodev/i-built-a-voice-ai-agent-to-clean-my-emails-meetings-and-slack-dms-composio-vapi-openai-tts-472b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;I am the Voice from the Outer World! I will lead you to PARADISE&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Paul Atreides uses the Voice as a tool for control and assertion. Imagine commandeering an AI agent with this voice. We built an AI agent using &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;, &lt;a href="https://vapi.ai" rel="noopener noreferrer"&gt;Vapi&lt;/a&gt;, and &lt;a href="https://platform.openai.com/docs/guides/text-to-speech" rel="noopener noreferrer"&gt;OpenAI TTS&lt;/a&gt; integrated with Gmail, Slack, and Google Calendar. It can summarise emails, schedule meetings, and search for Slack messages, making your entire morning routine stress-free.&lt;/p&gt;

&lt;p&gt;The entire thing was built using Claude Code inside the Cursor IDE. &lt;/p&gt;




&lt;h2&gt;
  
  
  The problem in Arrakis
&lt;/h2&gt;

&lt;p&gt;Checking Slack and Gmail is a morning ritual I religiously follow, but comprehending each message while still half-asleep feels like swimming through molasses. Voice agents excel at this exact problem - you can ask them to summarise the critical stuff, explain confusing threads, or drill into specific details while you make coffee. The impressions in the screenshot below also indicate that I’m not alone in facing this problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe46hwjb5lzxi3n3gcjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffe46hwjb5lzxi3n3gcjh.png" alt="Mathew Berman tweet"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Cultivating the Spice
&lt;/h2&gt;

&lt;p&gt;I started with a Next.js app and immediately hit the latency wall that kills most voice projects. Voice demands conversational flow - unlike text interfaces, where users tolerate waiting, voice agents need to respond instantly or the illusion breaks. My initial approach was embarrassingly naive: STT → LLM → Tool Call → TTS. Sequential processing meant 3-5 seconds of awkward silence after each command.&lt;/p&gt;

&lt;p&gt;Then I discovered Vapi, which handles the entire voice pipeline elegantly - parallel processing, model swapping, automatic interruption handling. It turned my clunky prototype into something that actually feels conversational.&lt;/p&gt;

&lt;p&gt;For integrations, Composio was the obvious choice - it abstracts away the OAuth complexity and gives you clean, reliable connections to Gmail, Calendar, and Slack without writing boilerplate for each API.&lt;/p&gt;

&lt;p&gt;On the development side, I'm convinced that running Claude Code inside Cursor is the optimal setup. Standalone Claude Code in terminal lacks proper diffs - you're flying blind with file changes. Cursor alone has good DX but weaker code generation. But Claude Code &lt;em&gt;inside&lt;/em&gt; Cursor? You get Claude's superior coding ability with Cursor's visual diffs, giving you a lot more control and visibility over the changes being made.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Spice Flows
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5kptmnb768dlosouawy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5kptmnb768dlosouawy.png" alt="Voice agent architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User → Vapi Widget&lt;/strong&gt;: User clicks “Talk to Assistant” to start the voice session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Widget → LLM&lt;/strong&gt;: The widget starts a call, sending the system prompt, model, voice, and the tool catalogue from vapiToolsConfig with concrete server URLs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM → Widget&lt;/strong&gt;: The LLM streams speech and final transcripts back; the widget updates speaking/listening indicators and the transcript UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM → API Route&lt;/strong&gt;: When an action is needed (e.g., send email), the LLM triggers a tool call: an HTTP POST to the matching /api/tools/... route.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API self-work (route-helpers)&lt;/strong&gt;: The route extracts the toolCallId and arguments, races execution against a 30-second timeout, and normalises errors/success into Vapi’s expected response shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API → Composio&lt;/strong&gt;: The route calls the relevant wrapper in lib/composio.ts, which invokes &lt;a href="http://composio.tools/" rel="noopener noreferrer"&gt;composio.tools&lt;/a&gt;.execute(...).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composio → Provider&lt;/strong&gt;: Composio talks to Gmail/Calendar/Slack APIs and returns a ComposioToolResult.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API → LLM&lt;/strong&gt;: The API responds with { results: [{ toolCallId, result }] }. The LLM consumes this and continues the conversation with updated context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM/Widget → User&lt;/strong&gt;: The widget reflects new messages/results in the transcript and UI state.&lt;/li&gt;
&lt;/ol&gt;
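&lt;p&gt;Steps 5 and 8 above can be sketched in a few lines. This is a minimal illustration, not the project's actual route code - names like &lt;code&gt;executeTool&lt;/code&gt; and &lt;code&gt;handleToolCall&lt;/code&gt; are my own - but the response shape &lt;code&gt;{ results: [{ toolCallId, result }] }&lt;/code&gt; matches what the article describes:&lt;/p&gt;

```javascript
const TIMEOUT_MS = 30000; // the 30-second budget mentioned in step 5

// Race a promise against a timeout, clearing the timer afterwards so a
// resolved call doesn't leave a stray 30s timer keeping the process alive.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('Tool call timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Normalise both success and failure into the same response shape,
// so the LLM always gets something it can continue the conversation with.
async function handleToolCall(toolCallId, args, executeTool, ms = TIMEOUT_MS) {
  try {
    const result = await withTimeout(executeTool(args), ms);
    return { results: [{ toolCallId, result }] };
  } catch (err) {
    return { results: [{ toolCallId, result: `Error: ${err.message}` }] };
  }
}
```

&lt;p&gt;Returning errors in-band, rather than throwing, is the important design choice here: a failed tool call becomes context the model can react to instead of a dead air moment for the user.&lt;/p&gt;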




&lt;h2&gt;
  
  
  Following Shai Hulud
&lt;/h2&gt;

&lt;p&gt;Claude Code's recent performance has been frustrating, and I'm not alone in noticing the decline in quality. Despite feeding it comprehensive documentation from Composio and Vapi, it consistently reverted to outdated API patterns. I'd explicitly show it how to implement routes using Vapi's specific request/response schemas, and it would acknowledge understanding, then immediately generate code using deprecated methods. Its debugging process became almost comical - fix one error, create three new ones, then insist the original fix was perfect while ignoring the fresh breakage. The silver lining? It nailed the core architecture, cleanly separating Composio actions into individual route files with a centralised wrapper.&lt;/p&gt;

&lt;p&gt;The UI challenge revealed another Claude Code quirk: without explicit visual direction, it defaults to the same tired template every time - hero section, three feature cards, call it a day. Voice interfaces are surprisingly hard to find inspiration for; most hide behind wake words or bury the actual interaction. Thankfully, Vapi's documentation included a pre-built voice widget that I could feed directly to Claude Code as a starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/J3UG6VrlXFU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The agent currently handles nine core actions across three platforms: Gmail (fetch, send, and draft), Slack (create channels, list conversations, and send messages), and Google Calendar (create events and find conflicts). Each action executes with sub-500ms latency - fast enough that conversation never breaks flow.&lt;/p&gt;

&lt;p&gt;The real power is &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;Composio's extensibility&lt;/a&gt;. Adding new tools requires just a few lines of configuration rather than wrestling with OAuth flows and API quirks. Want Notion for meeting notes? Linear for task creation? Each addition makes the assistant exponentially more useful. The vision is simple: reduce the mechanical parts of knowledge work to voice commands.&lt;/p&gt;

&lt;p&gt;Vapi’s observability on the dashboard is extremely helpful when trying to debug behaviours with voice agents because, unlike text, you can’t directly get into the trenches. Metrics and call logs provide a clear understanding of the agent’s behaviour.&lt;/p&gt;

&lt;p&gt;Next on the roadmap: MCP (Model Context Protocol) support for smarter tool coordination, improved response handling to make conversations feel more natural rather than command-response, and a UI that actually shows what's happening under the hood. The current interface works, but it should feel like magic - visual feedback for active tools, confidence scores for actions, and a preview of what's about to happen before confirmation. The foundation is solid; now it's time to make it shine.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>We raised $29M to make your agents stronger, smarter, and better</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Tue, 22 Jul 2025 17:10:51 +0000</pubDate>
      <link>https://forem.com/composiodev/we-raised-29m-to-make-your-agents-stronger-smarter-and-better-3fa2</link>
      <guid>https://forem.com/composiodev/we-raised-29m-to-make-your-agents-stronger-smarter-and-better-3fa2</guid>
      <description>&lt;p&gt;Support us on Twitter by liking or just quoting with whatever you feel like. We put some real work into this video, do check it out.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1947680602083496319-686" src="https://platform.twitter.com/embed/Tweet.html?id=1947680602083496319"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;h2&gt;
  
  
  Our thoughts on the future of the Agents landscape
&lt;/h2&gt;

&lt;p&gt;AI agents today don't learn from experience. You can engineer prompts endlessly, but your agents won't build intuition over time. They won't learn why API edge cases need special handling or remember your specialised way of interacting with complex systems.&lt;/p&gt;

&lt;p&gt;At Composio, we're building the infrastructure that enables AI agents to evolve, backed by $29M in funding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure for Intuition
&lt;/h2&gt;

&lt;p&gt;We’re creating a shared learning infrastructure where every interaction makes the entire ecosystem smarter. When one agent masters a tool or discovers an optimal workflow, every agent on our platform benefits instantly.&lt;/p&gt;

&lt;p&gt;Over centuries, humans have built better tools for themselves from shared experience; now we're bringing that advantage to AI at unprecedented speed and scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolving Skills, Not Static Tools
&lt;/h2&gt;

&lt;p&gt;Agents built with Composio won’t just execute tasks; they will evolve like a fleet of Waymos, learning from each other.&lt;/p&gt;

&lt;p&gt;Imagine an agent finding a founding engineer in San Francisco. Through feedback, it learns valuable heuristics: Twitter outperforms LinkedIn, prioritize daily coders, and verify vesting schedules. These insights are inherited across our platform—no starting from scratch. All Composio agents collectively learn and develop real-time intuition from these interactions.&lt;/p&gt;

&lt;p&gt;To be clear, we're nowhere near done. Building infrastructure for collective AI learning means solving problems no one's cracked yet. How do you capture tacit knowledge from millions of interactions? How do you turn edge cases into intuition? How do you make sure skills evolve with experience? Hard problems. But solvable ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join Us
&lt;/h2&gt;

&lt;p&gt;We're building a team that will craft the infrastructure shaping AI's future alongside teammates committed to creating systems that feel magical.&lt;/p&gt;

&lt;p&gt;We’re looking for engineers who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Love building distributed, self-improving systems&lt;/li&gt;
&lt;li&gt;Think infrastructure should be elegant and invisible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're excited about this, reach out: &lt;a href="mailto:hiring@composio.dev"&gt;hiring@composio.dev&lt;/a&gt; or DM me on &lt;a href="https://x.com/GanatraSoham" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;— Soham, CEO, Composio&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>I cloned this VC-funded AI super agent app in a weekend, here's how🪄✨</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Thu, 17 Jul 2025 14:31:09 +0000</pubDate>
      <link>https://forem.com/composiodev/i-cloned-this-vc-funded-ai-super-agent-app-in-a-weekend-heres-how-43np</link>
      <guid>https://forem.com/composiodev/i-cloned-this-vc-funded-ai-super-agent-app-in-a-weekend-heres-how-43np</guid>
      <description>&lt;p&gt;General-purpose AI agents like Manus and GenSpark have caught everyone’s attention. And VSs are pouring money into them. You can find many in the YC cohorts. These agents are really cool and provide access to a wide range of external tools used in our daily lives, such as spreadsheets, documents, and PowerPoint slides.&lt;/p&gt;

&lt;p&gt;I received a text to build this kind of Agent within 24 hours for a demo. Let’s vibe code this shit. Here’s how I went about it. I opened my Cursor instance and set up the repo. My weapon of choice was Claude 4 Sonnet (thinking) in agent mode.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fere6ktibk29sth27kvo4.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fere6ktibk29sth27kvo4.gif" alt="Kid vibing"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Vibe Coding Setup
&lt;/h2&gt;

&lt;p&gt;I had to choose between Claude Code and the Cursor IDE. For something more open-ended, I’d use Claude Code to let the model explore and build, but due to time constraints I needed more control, so I went with the Cursor Agent. I decided to build a web app with Next.js and use the AI SDK for the ease of working with agents and LLMs. &lt;/p&gt;

&lt;p&gt;LangGraph, by comparison, would have been significantly more complex: I’d have had to define the workflows myself, which isn’t necessary for open-ended tasks. Instead of the Gemini 2.5 Pro + GPT 4.1 approach from last time, I went all guns blazing with Claude 4 Sonnet (thinking), hoping the model could handle most of the development without me managing every aspect.&lt;/p&gt;

&lt;p&gt;For the Agent tools, &lt;a href="https://composio.dev/" rel="noopener noreferrer"&gt;Composio&lt;/a&gt; was the choice because I can handle authentication with Google Suite Apps and utilise their APIs as actions in the agent without having to read and plumb through Google’s API documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbe3irvumwwuuvw1iqpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbe3irvumwwuuvw1iqpf.png" alt="Super agent dashboard"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What to avoid?
&lt;/h2&gt;

&lt;p&gt;The worst mistake you can make while vibe coding is making open-ended requests. I made the dumb mistake of giving Claude documentation and asking it to build based on that. The code it wrote was disastrous. Worse, Claude also tends to use a lot of dummy variables. I rejected all of its changes, set up the Next.js project myself, and installed the necessary packages, mainly the AI SDK and Composio. What core abilities from GenSpark did I want to replicate? The ability to read and edit sheets, documents, and presentations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting back on course
&lt;/h2&gt;

&lt;p&gt;I didn’t expect embedding Google Sheets/Docs as iframes in a sidebar to be so easy. I anticipated a drawn-out process, but it was straightforward. Still, I can’t ship this to users without implementing authentication for each user’s Google Sheets account. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyyg9dmd1uyi3cxktxde.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyyg9dmd1uyi3cxktxde.png" alt="Super agent schema"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used Composio for easy authentication with Sheets and Docs. Once signed in, the agent can access the user’s files. After handling authentication, the challenging part was enabling the agent to create presentations. There’s no native tool that lets you do it, and I did not want to explore the Google Slides API. I referred to GenSpark and noticed it wrote HTML code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7qd6el4n0027007cark.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7qd6el4n0027007cark.png" alt="Super agent workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The super agent recognises the request for a presentation and responds with ‘[SLIDES]’. This triggers the generate-slides endpoint, where the super agent passes the topic, content, slide count, and style. In the generate-presentation endpoint, an LLM generates an array of slide objects containing: type, title, content, and bullet points. The frontend receives this array and, using my static HTML code, renders a preview version that the user can download.&lt;/p&gt;
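&lt;p&gt;The [SLIDES] hand-off can be sketched like this. The marker check and the slide object’s exact field names are my assumptions based on the description above, not the project’s actual code:&lt;/p&gt;

```javascript
// Detect the marker the super agent emits when the user asked for a deck.
function isSlidesRequest(agentReply) {
  return agentReply.includes('[SLIDES]');
}

// One slide object as the generate-presentation endpoint might return it;
// the frontend maps an array of these onto static HTML templates.
const exampleSlide = {
  type: 'bullets',
  title: 'Q3 Revenue Summary',
  content: 'Highlights pulled from the connected Google Sheet',
  bulletPoints: ['Revenue up 12%', 'Churn down 3%', 'Two new enterprise deals'],
};
```

&lt;p&gt;Using a plain-text marker like this is a pragmatic trick: it keeps the chat endpoint model-agnostic, since any LLM can be prompted to emit a fixed token instead of needing structured tool-call support.&lt;/p&gt;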

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcnv2n0qdfvsjetedmbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcnv2n0qdfvsjetedmbh.png" alt="Super agent"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s discuss the Google Sheets and Docs integrations. I wanted a sidebar to view the sheets/docs being edited in real time; it’s nice to see the changes instantly as your agent makes them. Composio to the rescue: I had the toolkits ready and just had to pass them to the generateText function from the AI SDK. I then added the code to render a resizable sidebar for any detected Drive doc URL. I also integrated a web search tool, and then it was time for the browser.&lt;/p&gt;

&lt;p&gt;In Python, there are multiple browser-agent libraries, but in JS there are very few. I planned to use a well-known browser provider, but it refused to let me sign in. I tried clearing the cookies, but I couldn’t spend more time fixing that because of the deadline, so I looked at other options and chose Puppeteer since it was easy to integrate.&lt;/p&gt;

&lt;p&gt;I provided Claude with the documents for Composio’s custom tool creation, and Claude created the Puppeteer tool, wrapped it in the custom tool format, and passed it to the Super Agent with the ability to scrape, click, and input text.&lt;/p&gt;
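&lt;p&gt;A rough sketch of what such a Puppeteer tool can look like (the object shape below is hypothetical; Composio’s actual custom-tool format differs, and Puppeteer is loaded lazily inside execute so the module parses even where the package isn’t installed):&lt;/p&gt;

```javascript
// Hypothetical browser tool wrapping Puppeteer: scrape, click, or type.
const browserTool = {
  name: "browser_action",
  description: "Scrape text, click elements, or type into inputs on a page",
  async execute(input) {
    // Lazy-load so merely defining the tool needs no browser binary.
    const puppeteer = require("puppeteer");
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    try {
      await page.goto(input.url);
      if (input.action === "click") {
        await page.click(input.selector);
      } else if (input.action === "type") {
        await page.type(input.selector, input.text);
      }
      // Default behaviour: scrape the visible text of the page.
      return await page.evaluate(function () {
        return document.body.innerText;
      });
    } finally {
      await browser.close();
    }
  },
};
```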

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6nr7f0w02bzzh7fmk22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6nr7f0w02bzzh7fmk22.png" alt="Super agent output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final demo read data from Sheets/Docs and used it to generate slides dynamically. It worked, and I met the deadline.&lt;/p&gt;

&lt;p&gt;The code is on &lt;a href="https://github.com/ComposioHQ/google-super-agent" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Fork it, break it, make it better.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ot-eOvaK61o"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  How difficult was it to vibe code?
&lt;/h2&gt;

&lt;p&gt;I have to admit that existing features often broke when I tried to add new ones, and one persistent error was Claude’s confusion between Tailwind v3 and v4, which created scenarios where I had to restore checkpoints to keep the UI from breaking. I wrote the code for all the route files myself; I don’t think AI agents are as good at backend logic as they are at the frontend. I used one or two &lt;a href="https://21st.dev/" rel="noopener noreferrer"&gt;21st.dev&lt;/a&gt; components for the UI.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
      <category>ai</category>
    </item>
    <item>
      <title>From Figma designs to pixel-perfect components using Figma MCP &amp; Claude Code 🧙🪄</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Mon, 07 Jul 2025 15:26:04 +0000</pubDate>
      <link>https://forem.com/composiodev/from-figma-designs-to-pixel-perfect-components-using-figma-mcp-claude-code-3ao</link>
      <guid>https://forem.com/composiodev/from-figma-designs-to-pixel-perfect-components-using-figma-mcp-claude-code-3ao</guid>
<description>&lt;p&gt;Figma is one of the best tools to emerge in the last decade or so. Regardless of the organisation's size, everyone uses Figma for everything, from landing pages to dashboards. And if you have been one of those poor souls tasked with turning designs into pixel-perfect app components, I understand you. Been there, done that.&lt;/p&gt;

&lt;p&gt;The good news is that all these fancy technologies (LLMs, CLI agents, and MCPs) are going to make this a whole lot easier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuktd7b1mxj8aosux76vl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuktd7b1mxj8aosux76vl.gif" alt="Composio Figma MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yes, I have been using Claude Code a lot lately; it's the best thing that has happened to humanity since Messi's FIFA 22 campaign (don't get mad, please), and pairing MCP servers with it can do wonders.&lt;/p&gt;

&lt;p&gt;In this blog post, I will share how you can configure the Figma MCP with Claude Code to build pixel-perfect front-end components.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Covered?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Configuring the Composio Figma MCP server (the best Figma MCP server, BTW; try it to believe it).&lt;/li&gt;
&lt;li&gt;Integrating the Figma MCP server with Claude Code to build frontend components. (You can use it with Cursor and Gemini CLI as well, but I like Claude Code more.)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Set up Figma MCP server and Claude Code
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 We'll use Composio to add the Figma MCP server support to our Claude Code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You don't need to create an account; head over to mcp.composio.dev and, under the Figma integration, generate the command.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1pam7lcl37sy2uqxo5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1pam7lcl37sy2uqxo5m.png" alt="Composio Figma MCP Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The command should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;npx&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;composio&lt;/span&gt;&lt;span class="sr"&gt;/mcp@latest setup "&amp;lt;https:/&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;composio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;partner&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;composio&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;figma&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;" "figma-605dcr-13" --client claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡NOTE: You can use pretty much the same command to set up Cursor as well. The only difference is changing --client claude to --client cursor, and that's it. You can then simply go ahead and start cloning any design.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Upon running this command, you should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dl6mzh4nrb3g4aql0jm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dl6mzh4nrb3g4aql0jm.png" alt="Composio MCP npx output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, by default, it saves to the &lt;code&gt;~/.config/Claude/claude_desktop_config.json&lt;/code&gt;.&lt;br&gt;
However, I prefer not to save it globally. So, in the project where you plan to run Claude Code, make sure you copy that file to a local .mcp.json file.&lt;/p&gt;

&lt;p&gt;This helps separate MCP servers per project, which is very helpful when adding multiple ones in the future for other projects.&lt;/p&gt;

&lt;p&gt;Run the following command to copy the file to the current directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.config/Claude/claude_desktop_config.json .mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By doing just that, you're almost done with the complete setup.&lt;/p&gt;

&lt;p&gt;Run Claude in the project where you've just copied the .mcp.json file, and you should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4w2lkqzuk6nm47c64rk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4w2lkqzuk6nm47c64rk.png" alt="Run Figma MCP with Claude Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hit yes, and inside Claude Code, run /mcp. You should see the MCP server status as connected, and you can view a list of all the available tools as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frmtetujaf3gpmjt9bt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frmtetujaf3gpmjt9bt.png" alt="Claude Code MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, that's all the setup you need to do on the Claude side. There's one small step left, and as you can guess, that's to authenticate with Composio.&lt;/p&gt;

&lt;p&gt;Currently, we've only added the server but have not yet authenticated Composio to connect to our Figma account. So, inside Claude Code, ask it to initiate a connection to the Figma MCP server, and it should give you a URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwaxdrt8g8z14aqd4iki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwaxdrt8g8z14aqd4iki.png" alt="Initiate Authentication with Figma MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Head over to that URL, and you should be authenticated like so:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm51c0yam7f97dtn0bk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm51c0yam7f97dtn0bk3.png" alt="Composio Authentication screen"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And just like that, you're done! You can now clone any Figma design, no matter how complex it is!&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;💁 In this demo, I'll clone a sample CRM Dashboard design from Figma.&lt;/p&gt;

&lt;p&gt;All you need is the link to the Figma file. Just chat with Claude Code, then sit back and relax. Your clone will be ready in seconds.&lt;/p&gt;

&lt;p&gt;Prompt: I need you to clone the dashboard from this Figma design: . Use HTML, CSS, and JS. Make sure you clone the exact design. Don't show your creativity, make it exact.&lt;/p&gt;

&lt;p&gt;Here's the Figma template:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9j2l68l0qbbx3gupedh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9j2l68l0qbbx3gupedh.png" alt="Figma template"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's what it generated:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf80ic45m0fmpmg963h4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf80ic45m0fmpmg963h4.png" alt="Claude Code Output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, it's almost an exact copy of the original design. You could ask it to build with Tailwind and any JS frameworks like Next.js, but for simplicity, I asked it to use plain HTML, CSS, and JS, and it did a pretty good job.&lt;/p&gt;

&lt;p&gt;Here’s the video demo:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/vq2vPY3E1Uo"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can find the entire code it generated here: Code for the Figma Dashboard.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It's remarkable how quickly things are evolving with MCPs, coding agents, and LLMs. But there are emerging challenges too, particularly around security, availability, and reliability. Trusting random server providers without a proper safety net can be fatal, and guarding against that is a big part of what Composio stands for. Check out the trust page for more.&lt;/p&gt;

&lt;p&gt;Additionally, if you ever build on top of us, please tag us on Twitter and LinkedIn for free credits.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>I vibe-coded a $20M YC app in a weekend, here's how🧙‍♂️ 🪄</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Mon, 02 Jun 2025 13:04:40 +0000</pubDate>
      <link>https://forem.com/composiodev/i-vibe-coded-a-20m-yc-app-in-a-weekend-heres-how-533o</link>
      <guid>https://forem.com/composiodev/i-vibe-coded-a-20m-yc-app-in-a-weekend-heres-how-533o</guid>
<description>&lt;p&gt;I realised that many companies offer no-code platforms that let their users automate workflows.&lt;br&gt;
The numbers were kinda shocking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1knu2pe7zanby8tsno8e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1knu2pe7zanby8tsno8e.png" alt="No-code platform statistics" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I spent a week deep-diving into Gumloop and other no-code platforms.&lt;br&gt;
They're well-designed, but here's the problem: they're not built for &lt;em&gt;agents&lt;/em&gt;. They're built for &lt;strong&gt;workflows&lt;/strong&gt;. There's a difference.&lt;/p&gt;

&lt;p&gt;Agents need customisation. They have to make decisions, route dynamically, and handle complex tool orchestration. Most platforms treat these as afterthoughts. I wanted to fix that.&lt;/p&gt;

&lt;p&gt;Although it's not production-ready and nowhere close to handling the traffic of companies like Gumloop, this project is intended to showcase the power of vibe coding and how easily you can build sophisticated apps in a matter of days. You can also carry the work forward and improve it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcej24uyfq10vasin9u5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcej24uyfq10vasin9u5.gif" alt="Vibe coding is real" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Picking my tech stack
&lt;/h2&gt;

&lt;p&gt;NextJS was the obvious choice for the vibe-coding stack. Could I have used FastAPI with a React frontend?&lt;br&gt;
Sure — but just thinking about coordinating deployments, managing CORS, and syncing types made me tired.&lt;/p&gt;

&lt;p&gt;For adding a near-unlimited suite of SaaS app integrations, &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;Composio&lt;/a&gt; was the obvious choice. It features a JS SDK that enables you to add agent integrations easily.&lt;/p&gt;

&lt;p&gt;When it comes to agent frameworks, JS lacks the buffet Python has.&lt;/p&gt;

&lt;p&gt;It boiled down to two frameworks: &lt;a href="https://github.com/langchain-ai/langgraphjs" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; and the &lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; (I’d heard about &lt;a href="http://mastra.ai" rel="noopener noreferrer"&gt;Mastra AI&lt;/a&gt;, but I didn’t want to spend the weekend getting familiar with it).&lt;/p&gt;

&lt;p&gt;I chose &lt;strong&gt;LangGraph&lt;/strong&gt; over &lt;strong&gt;AI SDK&lt;/strong&gt; because LangGraph’s entire mental model is nodes and edges — exactly how visual agent builders should work. Every agent is just a graph; every workflow, a path through that graph. AI SDK is great, but not convenient for graph-based agents.&lt;/p&gt;
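&lt;p&gt;To make the “every agent is just a graph” claim concrete, here is a toy executor in plain JS. This is only the mental model, not LangGraph’s actual API; LangGraph adds state channels, branching, and persistence on top of the same idea:&lt;/p&gt;

```javascript
// Toy graph runner: nodes transform state, edges decide what runs next.
// (Illustrative only; not LangGraph's real API.)
function runGraph(nodes, edges, startId, input) {
  let current = startId;
  let state = input;
  while (current) {
    state = nodes[current](state); // each node transforms the state
    const next = edges.find(function (e) { return e.source === current; });
    current = next ? next.target : null; // follow the single outgoing edge
  }
  return state;
}

const nodes = {
  input_1: function (s) { return s.trim(); },
  agent_1: function (s) { return "answer for: " + s; },
};
const edges = [{ source: "input_1", target: "agent_1" }];
const out = runGraph(nodes, edges, "input_1", "  hello  ");
// out is "answer for: hello"
```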


&lt;h2&gt;
  
  
  Coding with Vibes
&lt;/h2&gt;

&lt;p&gt;If you’re a vibe-code hater, skip ahead.&lt;br&gt;
Frontend is entirely vibe-coded. I didn’t use Lovable or &lt;a href="http://bolt.new/" rel="noopener noreferrer"&gt;Bolt.new&lt;/a&gt; because it’s easier to open the code in Cursor and tweak it there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My setup&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://composio.dev/blog/gpt-4-1-vs-deepseek-v3-vs-sonnet-3-7-vs-gpt-4-5/" rel="noopener noreferrer"&gt;GPT-4.1&lt;/a&gt;&lt;/strong&gt; – &lt;em&gt;The sniper&lt;/em&gt;: does exactly what you ask, nothing more, nothing less.
Great for precise component tweaks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt; – &lt;em&gt;The machine-gun&lt;/em&gt;: rewrites entire components and understands context across files.
Perfect for major refactors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/21st-dev/magic-mcp" rel="noopener noreferrer"&gt;21st Dev’s MCP Server&lt;/a&gt;&lt;/strong&gt; – uses the Cursor Agent to build beautiful shadow components.
Instead of copy-pasting docs, I just describe what I want.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The canvas where users drag-and-drop nodes? Built with &lt;strong&gt;React Flow&lt;/strong&gt; plus a moving grid background from 21st Dev. Took ~30 minutes; doing it by hand would’ve exhausted me.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building the Components
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr89y0z2jmgp2dv0i5jv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr89y0z2jmgp2dv0i5jv0.png" alt="Agent builder nodes" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Strip away the marketing fluff; an AI agent is two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An LLM that makes decisions&lt;/li&gt;
&lt;li&gt;The tools it can use to take action&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. So I built exactly four fundamental nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Node&lt;/strong&gt; – where data enters the system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Node&lt;/strong&gt; – where results emerge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Node&lt;/strong&gt; – makes decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Node&lt;/strong&gt; – takes actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…and an &lt;strong&gt;Agent Node&lt;/strong&gt; that combines an LLM + Tools for convenience. Every complex workflow is just a remix of these primitives.&lt;/p&gt;
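&lt;p&gt;In code, the agent node really is just the LLM primitive plus tool primitives bundled together. A hypothetical sketch (none of these helpers belong to a real SDK; a single decide-then-act round is shown for brevity):&lt;/p&gt;

```javascript
// Hypothetical node factories; llm and tools are stand-ins, not a real SDK.
function makeLlmNode(llm) {
  return async function (input) { return llm(input); };
}

function makeToolNode(tool) {
  return async function (input) { return tool.execute(input); };
}

function makeAgentNode(llm, tools) {
  // The LLM decides which tool to call, e.g. { tool: "search", args: {...} }.
  return async function (input) {
    const decision = await llm(input);
    const chosen = tools.find(function (t) { return t.name === decision.tool; });
    if (!chosen) return decision; // no tool needed: plain LLM answer
    return chosen.execute(decision.args);
  };
}
```

A real agent loop would feed the tool result back to the LLM and repeat until it stops requesting tools; the primitives stay the same.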
&lt;h3&gt;
  
  
  &lt;a href="http://composio.dev" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;  for adding unlimited tool integrations
&lt;/h3&gt;

&lt;p&gt;Writing tool integrations is painful. Managing auth for those tools? That’s where developers go to die.&lt;br&gt;
Every tool has a different auth flow. Multiply that by 100+ tools and you have a maintenance nightmare.&lt;/p&gt;

&lt;p&gt;Composio fixes this: one SDK, hundreds of pre-built tools, auth handled automatically. Ship in a weekend instead of spending months on OAuth.&lt;/p&gt;


&lt;h2&gt;
  
  
  API Routes
&lt;/h2&gt;

&lt;p&gt;Each workflow is a JSON graph. Here’s a tiny example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nodes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"input_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customInput"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User Query"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"edges"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"input_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent_1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wanted one API route that takes the entire graph and executes it.&lt;/p&gt;

&lt;p&gt;When a user hits &lt;strong&gt;Run&lt;/strong&gt;, this happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Graph Validation&lt;/strong&gt; – find the Input node, verify edges connect, check for cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topological Sort&lt;/strong&gt; – determine execution order (LangGraph does this beautifully)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node Execution&lt;/strong&gt; – each node type has its own execution logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Management&lt;/strong&gt; – pass data between nodes while maintaining context&lt;/li&gt;
&lt;/ol&gt;
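&lt;p&gt;Steps 1 and 2 boil down to a topological sort with cycle detection. LangGraph handles this internally; a minimal hand-rolled version (Kahn’s algorithm) looks like this:&lt;/p&gt;

```javascript
// Kahn's algorithm: returns node ids in execution order, or throws on a cycle.
function topoSort(nodeIds, edges) {
  const indegree = {};
  nodeIds.forEach(function (id) { indegree[id] = 0; });
  edges.forEach(function (e) { indegree[e.target] += 1; });

  // Start from nodes with no incoming edges (the Input nodes).
  const queue = nodeIds.filter(function (id) { return indegree[id] === 0; });
  const order = [];
  while (queue.length) {
    const id = queue.shift();
    order.push(id);
    edges.forEach(function (e) {
      if (e.source === id) {
        indegree[e.target] -= 1;
        if (indegree[e.target] === 0) queue.push(e.target);
      }
    });
  }
  // If some node was never released, the graph contains a cycle.
  if (order.length !== nodeIds.length) throw new Error("workflow graph has a cycle");
  return order;
}

const order = topoSort(
  ["input_1", "agent_1", "output_1"],
  [{ source: "input_1", target: "agent_1" }, { source: "agent_1", target: "output_1" }]
);
// order is ["input_1", "agent_1", "output_1"]
```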

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp9ib4vcc957iyl9aooo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp9ib4vcc957iyl9aooo.png" alt="Runtime workflow" width="800" height="480"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Sample snippet&lt;/span&gt;
&lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;llm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getModelFromApiKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;result&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;previousOutput&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tool&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;composio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;result&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;previousOutput&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;composio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createReActAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;result&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;previousOutput&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;Managing Authentication with Tools&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ij9m8gpg1xjh1ntvlgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ij9m8gpg1xjh1ntvlgl.png" alt="Auth workflow" width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authentication was my personal nightmare.&lt;br&gt;
&lt;a href="http://composio.dev" rel="noopener noreferrer"&gt;Composio&lt;/a&gt; solved the technical part, but the &lt;strong&gt;UX&lt;/strong&gt;? That took three rewrites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v1 pain-stack&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Manually type action names (spelled perfectly)&lt;/li&gt;
&lt;li&gt;Leave my app to authenticate on Composio’s dashboard&lt;/li&gt;
&lt;li&gt;Come back and hope it worked&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I added a drop-down of actions, but auth was still clunky. So I:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pulled every available tool from Composio’s API and cached it locally.&lt;/li&gt;
&lt;li&gt;Built a modal showing each toolkit, its tools and connection status.&lt;/li&gt;
&lt;li&gt;Adapted the UI to the tool’s auth type:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Keys&lt;/strong&gt; – password input + link to get the key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth2 (hosted)&lt;/strong&gt; – &lt;em&gt;Connect&lt;/em&gt; button opens a pop-up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth2 (custom)&lt;/strong&gt; – form for client credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other&lt;/strong&gt; – dynamic form built from required fields&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once authenticated, the same modal lets you search and add tools in one click.&lt;/p&gt;
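&lt;p&gt;The per-auth-type UI above boils down to a simple dispatch. Here’s a minimal JavaScript sketch; the scheme names and field shapes are my own stand-ins, not Composio’s exact API:&lt;/p&gt;

```javascript
// Map a toolkit's auth scheme to the UI the modal should render.
// Scheme names and field shapes here are illustrative stand-ins.
function authFormFor(toolkit) {
  switch (toolkit.authScheme) {
    case "API_KEY":
      // Password input plus a link to wherever the key lives.
      return { kind: "password-input", helpLink: toolkit.docsUrl };
    case "OAUTH2_HOSTED":
      // A single Connect button that opens the hosted pop-up.
      return { kind: "connect-button" };
    case "OAUTH2_CUSTOM":
      // Form for the user's own client credentials.
      return { kind: "form", fields: ["client_id", "client_secret"] };
    default:
      // Build a dynamic form from whatever fields the toolkit declares.
      return { kind: "form", fields: toolkit.requiredFields ?? [] };
  }
}

console.log(authFormFor({ authScheme: "API_KEY", docsUrl: "https://example.com/keys" }));
```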




&lt;h2&gt;
  
  
  Agent Orchestration Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx865jk2gdlncmplkbcim.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx865jk2gdlncmplkbcim.png" alt="Orchestration patterns" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic’s guide &lt;em&gt;“Building Effective Agents”&lt;/em&gt; lists several patterns. I created nodes that instantiate these instantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Prompt Chaining&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern:&lt;/strong&gt; Sequential; output of one agent feeds the next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node example:&lt;/strong&gt;
&lt;code&gt;customInput → agent_1 → agent_2 → customOutput&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
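&lt;p&gt;With agents stubbed as plain async functions, the chain is just sequential awaits. A rough sketch (the agents here are placeholders, not real LLM calls):&lt;/p&gt;

```javascript
// Prompt chaining: each agent's output becomes the next agent's input.
const agent1 = async (input) => `outline of: ${input}`;
const agent2 = async (input) => `draft based on ${input}`;

async function chain(input, agents) {
  let output = input;
  for (const agent of agents) {
    output = await agent(output); // sequential: order matters
  }
  return output;
}

chain("a blog post about MCP", [agent1, agent2]).then(console.log);
```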

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Parallelisation&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern:&lt;/strong&gt; Agents run in parallel and their results are aggregated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node example:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  customInput → agent_1   (parallel)
  customInput → agent_2   (parallel)
  both       → aggregator → customOutput
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
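&lt;p&gt;The fan-out and aggregation maps naturally onto &lt;code&gt;Promise.all&lt;/code&gt;. A sketch with stubbed agents:&lt;/p&gt;

```javascript
// Parallelisation: run independent agents concurrently, then aggregate.
// Agents are stubbed as async functions for the sketch.
const summarizer = async (input) => `summary(${input})`;
const critic = async (input) => `critique(${input})`;

async function parallel(input, agents, aggregator) {
  const results = await Promise.all(agents.map((a) => a(input)));
  return aggregator(results);
}

parallel("doc", [summarizer, critic], (r) => r.join(" | ")).then(console.log);
```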



&lt;h3&gt;
  
  
  &lt;strong&gt;3. Routing&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern:&lt;/strong&gt; A router agent decides which branch to use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node example:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  customInput → router_agent
  router_agent → agent_1 | agent_2 → customOutput
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
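&lt;p&gt;A toy version of the router, using a keyword check as a stand-in for an LLM classification call:&lt;/p&gt;

```javascript
// Routing: a router inspects the input and picks one downstream agent.
const codeAgent = async (q) => `code answer: ${q}`;
const generalAgent = async (q) => `general answer: ${q}`;

async function route(input) {
  // A real router would be an LLM call; this keyword test is a stub.
  const branch = /bug|error|stack trace/i.test(input) ? codeAgent : generalAgent;
  return branch(input);
}

route("why does this error happen?").then(console.log);
```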



&lt;h3&gt;
  
  
  &lt;strong&gt;4. Evaluator-Optimiser&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern:&lt;/strong&gt; Generator agent produces solutions; evaluator checks them; loop until good.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node example:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  customInput → generator_agent → solution_output
                 ↘ evaluator_agent ↗
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
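&lt;p&gt;The generate/evaluate loop can be sketched like this; the generator and evaluator are deterministic stubs standing in for LLM calls:&lt;/p&gt;

```javascript
// Evaluator-optimiser: generate, score, retry until good enough or out of tries.
async function generate(prompt, attempt) {
  return `${prompt} (attempt ${attempt})`;
}

async function evaluate(solution) {
  // Stub: pretend quality improves each attempt; accept from attempt 3 onward.
  const attempt = Number(solution.match(/attempt (\d+)/)[1]);
  return attempt >= 3;
}

async function optimise(prompt, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const solution = await generate(prompt, attempt);
    if (await evaluate(solution)) return solution;
  }
  throw new Error("no acceptable solution within budget");
}

optimise("write a haiku").then(console.log);
```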



&lt;h3&gt;
  
  
  &lt;strong&gt;5. Augmented LLM&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern:&lt;/strong&gt; An agent node is augmented with tool calls / external data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node example:&lt;/strong&gt;
&lt;code&gt;customInput → agent(with tools) → customOutput&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
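&lt;p&gt;A minimal stand-in for the augmented pattern, with the tool layer and the model both stubbed out:&lt;/p&gt;

```javascript
// Augmented LLM: the agent can call tools mid-run to fetch external data.
// The tool registry and the "model" are stubs; a real version would call an LLM API
// and let the model pick which tool to invoke.
const toolRegistry = {
  search: async (q) => `results for "${q}"`,
};

async function augmentedAgent(input) {
  const evidence = await toolRegistry.search(input);
  return `answer using ${evidence}`;
}

augmentedAgent("MCP servers").then(console.log);
```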




&lt;h2&gt;
  
  
  After 48 hours of rapid development, I had a working agent platform.
&lt;/h2&gt;

&lt;p&gt;The barrier to building agents has collapsed. You don’t need a 20-person team and six months; you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear thinking about what agents are (decision-makers with tools)&lt;/li&gt;
&lt;li&gt;The right abstractions (everything is a graph)&lt;/li&gt;
&lt;li&gt;The wisdom to reuse existing solutions instead of rebuilding them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The irony? I spent more time perfecting the auth modal than building the execution engine. In the age of vibe-code, the hardest problems aren’t technical — they’re about understanding users and having the taste to build well.&lt;/p&gt;

&lt;p&gt;The code lives on &lt;strong&gt;&lt;a href="https://github.com/ComposioHQ/agent-flow" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;. Fork it, break it, make it better.&lt;/p&gt;


&lt;p&gt;This was all about vibe coding my way to an actual product. It may not be fully ready for the real world yet, but it's 80% of the way there after a single weekend, which would have taken months before.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
      <category>ai</category>
    </item>
    <item>
      <title>Top 10 awesome MCP servers that can make your life easier 🪄✨</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Thu, 24 Apr 2025 13:54:36 +0000</pubDate>
      <link>https://forem.com/composiodev/top-10-awesome-mcp-servers-that-can-make-your-life-easier-3n4o</link>
      <guid>https://forem.com/composiodev/top-10-awesome-mcp-servers-that-can-make-your-life-easier-3n4o</guid>
      <description>&lt;p&gt;MCP by Anthropic is the talk of the town; it's the one thing everyone is talking about and building around. Why, you may ask? Well, the simple reason is that the tooling layer in agents has always been the most challenging part to solve. The MCP (Model Context Protocol) standardises how developers should build tools and clients for universal adaptability.&lt;/p&gt;

&lt;p&gt;Recently, both OpenAI and Google have officially started supporting MCP in their respective agent frameworks, the Agents SDK and the Agent Development Kit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67vizeerdna9p7m1k66u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67vizeerdna9p7m1k66u.png" alt="Demis Hassabis on MCP tweet"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This blog post discusses some of the best MCP servers I have tried to improve my productivity over the last two months. But before that, let's go over what MCP even is.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is MCP (Model Context Protocol), and why should you care?
&lt;/h2&gt;

&lt;p&gt;It’s an open standard developed by Anthropic that standardises how AI applications, LLMs, and tools communicate. It has three distinct components.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host:&lt;/strong&gt; Applications like Cursor, Windsurf, Claude Desktop, etc.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client:&lt;/strong&gt; Manages the communication between the host application and servers—the middleman.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server:&lt;/strong&gt; Servers are tools (File, Git, Shell, Slack, Notion APIs), databases, log files, etc, which can provide additional context to agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic defines MCP as the USB-C equivalent of agentic systems. The computers are the hosts, clients are the ports, and peripheral devices are the servers.&lt;/p&gt;

&lt;p&gt;For a more detailed explanation of MCP, check out this blog post: &lt;a href="https://composio.dev/blog/what-is-model-context-protocol-mcp-explained/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are MCP Servers?
&lt;/h2&gt;

&lt;p&gt;MCP servers expose external data to the LLM. They can be local tools like the File System tool or remote API services like Slack, Discord, etc. Servers allow your AI apps to be genuinely agentic. &lt;/p&gt;

&lt;p&gt;This post will discuss 10 MCP servers that have helped me save hours. &lt;/p&gt;

&lt;p&gt;So, let's get started.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. &lt;a href="https://mcp.composio.dev/notion" rel="noopener noreferrer"&gt;Notion&lt;/a&gt; for automated note-taking
&lt;/h2&gt;

&lt;p&gt;One of the best productivity hacks for me has been the Notion MCP server.  I use Notion to store all the details from my conversations in the Claude app. It can also fetch any document from Notion and add it as additional context to the discussion. I have been using it with Cursor and Claude Desktop, and it’s so good.&lt;/p&gt;

&lt;p&gt;For Cursor, I use it to fetch the product requirement document and have it create features accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to use Notion MCP in Claude Desktop&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; First, make sure Node.js is installed: run &lt;code&gt;node -v&lt;/code&gt; in your terminal
&lt;/li&gt;
&lt;li&gt; If it isn't, install it from &lt;a href="http://nodejs.org" rel="noopener noreferrer"&gt;nodejs.org&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To get Notion MCP, go to &lt;a href="https://mcp.composio.dev" rel="noopener noreferrer"&gt;https://mcp.composio.dev&lt;/a&gt; and search for Notion. Composio also handles the OAuth authentication, so you can securely connect to the Notion app without worrying about authentication and authorisation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6f7zrwo2q1c7rhxsicqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6f7zrwo2q1c7rhxsicqp.png" alt="Notion MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will get an &lt;code&gt;npx&lt;/code&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @composio/mcp@latest setup &lt;span class="s2"&gt;"replace it with the URL"&lt;/span&gt; &lt;span class="nt"&gt;--client&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, paste the generated command into your terminal and execute it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; The command will automatically add the Notion MCP to your Claude Desktop.
&lt;/li&gt;
&lt;li&gt; Refresh or restart the app; you will see a hammer icon in Claude's chat.
&lt;/li&gt;
&lt;li&gt; &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqht3uk02hso5d8sfri9w.png" alt="Claude MCP"&gt;
&lt;/li&gt;
&lt;li&gt; Click on it to see the available actions.
&lt;/li&gt;
&lt;li&gt;Start by asking in the chat to “Initiate connection with Notion.”
&lt;/li&gt;
&lt;li&gt; Complete the Auth flow and start asking questions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Windsurf and Cursor, you can also follow the &lt;a href="https://mcp.composio.dev/notion/grumpy-spoiled-horse-QOMA0z" rel="noopener noreferrer"&gt;instructions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Check out this tutorial on how to integrate Notion with Claude Desktop.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Bc9jS8iZQY0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;




&lt;h2&gt;
  
  
  2. &lt;a href="https://mcp.composio.dev/figma" rel="noopener noreferrer"&gt;Figma&lt;/a&gt;: From Design to Code
&lt;/h2&gt;

&lt;p&gt;You’ll thank the Lord after using Figma MCP in your Cursor workflow. You can turn any Figma design file into working code. It will definitely make your life easier as a developer.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use Figma MCP in Cursor
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyipjxqt57mgwpg5mefb6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyipjxqt57mgwpg5mefb6.png" alt="Figma MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Follow the same steps above and make sure your system has Node.js installed.
&lt;/li&gt;
&lt;li&gt; Go to &lt;a href="http://mcp.composio.dev/figma" rel="noopener noreferrer"&gt;http://mcp.composio.dev/figma&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Generate the &lt;code&gt;npx&lt;/code&gt; code.
&lt;/li&gt;
&lt;li&gt; Run it in your terminal.
&lt;/li&gt;
&lt;li&gt; Now, re-open Cursor or refresh it.
&lt;/li&gt;
&lt;li&gt; You can now see your Figma tools in Cursor settings → MCP.
&lt;/li&gt;
&lt;li&gt; &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5y04th7rzxr1ho6f03g.png" alt="Cursor MCP"&gt; &lt;/li&gt;
&lt;li&gt; Now, initiate a connection with Figma by asking in the chat.
&lt;/li&gt;
&lt;li&gt; Give it the URL to your file in the Figma Project.
&lt;/li&gt;
&lt;li&gt; Now ask it to write code from the design.
&lt;/li&gt;
&lt;li&gt; The Cursor agent writes the code.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi87tw1uuak2775loxk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi87tw1uuak2775loxk0.png" alt="Figma MCP"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. &lt;a href="https://mcp.composio.dev/supabase" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; for managing the database from an IDE
&lt;/h2&gt;

&lt;p&gt;This is yet another popular use case of MCP servers. You can connect Cursor, Windsurf, or Claude Desktop with your Supabase database.&lt;/p&gt;

&lt;h3&gt;
  
  
  What can you do with it?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema Exploration and Documentation:&lt;/strong&gt; Use the MCP server to read and explain your table structures, relationships, and constraints in plain language.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-Only Queries for Insights:&lt;/strong&gt; Let the MCP generate SQL SELECT statements to retrieve and summarise data for quick analysis.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explain and Debug Queries:&lt;/strong&gt; Ask the MCP to interpret or optimise your existing SQL queries and outline the query execution plan in simpler terms.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate Migrations in a Dev/Staging Environment:&lt;/strong&gt; Have the MCP propose schema changes, then review and apply them in a safe environment before production.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to use Supabase MCP
&lt;/h3&gt;

&lt;p&gt;For a managed Supabase server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="https://mcp.composio.dev/supabase" rel="noopener noreferrer"&gt;https://mcp.composio.dev/supabase&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get the &lt;code&gt;npx&lt;/code&gt; command and run it in your terminal
&lt;/li&gt;
&lt;li&gt;Refresh your MCP-compatible host
&lt;/li&gt;
&lt;li&gt;Initiate a new connection
&lt;/li&gt;
&lt;li&gt;And start using it
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgl2fiakkjjtdgzsz49lv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgl2fiakkjjtdgzsz49lv.png" alt="Supabase MCP"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4. &lt;a href="https://mcp.composio.dev/firecrawl" rel="noopener noreferrer"&gt;Firecrawl MCP&lt;/a&gt; for web-crawling
&lt;/h2&gt;

&lt;p&gt;It doesn’t matter if you’re a technical or non-technical person; this can be a great boon to your productivity. Firecrawl is a tool that crawls websites and extracts content for you. With Firecrawl MCP in your chat app, you can search websites and ask for any information.&lt;/p&gt;

&lt;h3&gt;
  
  
  What can you do with it?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Collect and summarise content from any website or blog across multiple pages.
&lt;/li&gt;
&lt;li&gt;Gather competitor research data (e.g., product pricing, feature comparisons, or marketing strategies).
&lt;/li&gt;
&lt;li&gt;Combine web-crawled material with other data sources (e.g., local files or databases) for more profound insights or reports.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to use the FireCrawl MCP server with Composio
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="https://mcp.composio.dev/firecrawl" rel="noopener noreferrer"&gt;https://mcp.composio.dev/firecrawl&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Get the &lt;code&gt;npx&lt;/code&gt; command and run it in your terminal
&lt;/li&gt;
&lt;li&gt;Refresh your MCP-compatible host
&lt;/li&gt;
&lt;li&gt;Initiate a new connection
&lt;/li&gt;
&lt;li&gt;And start using it
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlg3e1phxpza8loonk7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlg3e1phxpza8loonk7n.png" alt="FireCrawl MCP"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5. &lt;a href="https://github.com/modelcontextprotocol/servers/tree/main/src/memory" rel="noopener noreferrer"&gt;Memory MCP Server&lt;/a&gt;: Persistent memory across chat
&lt;/h2&gt;

&lt;p&gt;If you use Claude a lot, you’d know how irritating it can be sometimes to switch to a different chat window and start the conversation from scratch. Well, memory servers ease this.&lt;/p&gt;

&lt;p&gt;This Knowledge Graph Memory Server tool allows Claude to maintain persistent memory across user conversations. It essentially creates a database of user information that can be accessed and updated over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use the Memory Graph MCP server
&lt;/h3&gt;

&lt;p&gt;You can use this server with Claude. Here’s how you can do it. Go to Claude Desktop → Settings → Developer → Edit Config&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;claude_desktop_config.json&lt;/code&gt; and add the following for &lt;code&gt;npx&lt;/code&gt;-based servers. You'll need Node.js for it to work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-memory"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is also configurable with environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-memory"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MEMORY_FILE_PATH"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/custom/memory.json"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  6. &lt;a href="https://github.com/ahujasid/blender-mcp" rel="noopener noreferrer"&gt;Blender MCP&lt;/a&gt;: For 3D modelling, scene creation, and manipulation
&lt;/h2&gt;

&lt;p&gt;Blender MCP is the hottest thing right now. You can connect Claude AI to it and interactively build 3D renders just by prompting.&lt;/p&gt;

&lt;p&gt;Here are some features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Two-way communication:&lt;/strong&gt; Establishes a direct connection between Claude AI and Blender through a socket-based server
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Object manipulation:&lt;/strong&gt; Lets Claude create, modify, and delete 3D objects directly in your Blender scenes
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Material control:&lt;/strong&gt; Enables Claude to apply and modify materials and colours to objects in your projects
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scene inspection:&lt;/strong&gt; Allows Claude to analyse and retrieve detailed information about your current Blender scene
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code execution:&lt;/strong&gt; Empowers Claude to run Python code in Blender, opening up endless customisation possibilities
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pi8ctfp6usfhsnbijij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pi8ctfp6usfhsnbijij.png" alt="Blender MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to integrate Blender MCP into Claude
&lt;/h3&gt;

&lt;p&gt;You’ll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Blender 3.0 or newer
&lt;/li&gt;
&lt;li&gt;Python 3.10 or newer
&lt;/li&gt;
&lt;li&gt;The uv package manager
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you're on Mac, install uv:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;uv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;On Windows&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;powershell&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"irm &amp;lt;https://astral.sh/uv/install.ps1&amp;gt; | iex"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add uv to your &lt;code&gt;Path&lt;/code&gt; (replace the username with your own):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;C:&lt;span class="se"&gt;\\&lt;/span&gt;Users&lt;span class="se"&gt;\\&lt;/span&gt;nntra&lt;span class="se"&gt;\\&lt;/span&gt;.local&lt;span class="se"&gt;\\&lt;/span&gt;bin&lt;span class="p"&gt;;&lt;/span&gt;%Path%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Claude for Desktop Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to Claude &amp;gt; Settings &amp;gt; Developer &amp;gt; Edit Config &amp;gt; &lt;code&gt;claude_desktop_config.json&lt;/code&gt; and include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"blender"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"blender-mcp"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cursor integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;uvx&lt;/code&gt; runs blender-mcp without installing it permanently. Go to Cursor Settings &amp;gt; MCP and paste this as a command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx blender-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Windows users, go to Settings &amp;gt; MCP &amp;gt; Add Server, add a new server with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"blender"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cmd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"/c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"blender-mcp"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  7. &lt;a href="https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem" rel="noopener noreferrer"&gt;File Search&lt;/a&gt;: Working with files from the MCP hosts
&lt;/h2&gt;

&lt;p&gt;A local tool that lets you work with the file system from Claude Desktop. You can grab any file from your disk, feed it to Claude or Cursor, and work with it however you want.&lt;/p&gt;

&lt;p&gt;Here are some features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read/write files
&lt;/li&gt;
&lt;li&gt;Create/list/delete directories
&lt;/li&gt;
&lt;li&gt;Move files/directories
&lt;/li&gt;
&lt;li&gt;Search files
&lt;/li&gt;
&lt;li&gt;Get file metadata
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The server will only allow operations within directories specified via &lt;code&gt;args&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to add a File Search MCP server
&lt;/h3&gt;

&lt;p&gt;Add this to &lt;code&gt;claude_desktop_config.json&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/Users/username/Desktop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/other/allowed/dir"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
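&lt;p&gt;The note above about allowed directories can be sketched in Python. This is an illustrative check, not the server’s actual code: &lt;code&gt;ALLOWED_DIRS&lt;/code&gt; and &lt;code&gt;is_allowed&lt;/code&gt; are hypothetical names mirroring the directories passed via &lt;code&gt;args&lt;/code&gt;:&lt;/p&gt;

```python
import os

# Hypothetical helper mirroring the filesystem server's allow-list rule:
# an operation is permitted only if the target path resolves inside one
# of the directories passed via "args" in the config above.
ALLOWED_DIRS = ["/Users/username/Desktop", "/path/to/other/allowed/dir"]

def is_allowed(path, allowed_dirs=ALLOWED_DIRS):
    """Return True if `path` resolves inside one of the allowed directories."""
    real = os.path.realpath(path)
    for root in allowed_dirs:
        root_real = os.path.realpath(root)
        # commonpath equals the root exactly when `real` sits under it
        if os.path.commonpath([root_real, real]) == root_real:
            return True
    return False

print(is_allowed("/Users/username/Desktop/notes.txt"))  # True
print(is_allowed("/etc/passwd"))                        # False
```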






&lt;h2&gt;
  
  
  8. &lt;a href="https://github.com/smithery-ai/mcp-obsidian" rel="noopener noreferrer"&gt;Obsidian MCP Server&lt;/a&gt;: Note-taking meets AI
&lt;/h2&gt;

&lt;p&gt;If you’re a frequent Obsidian user, you should connect it to Claude. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Access your knowledge base:&lt;/strong&gt; Claude can directly search, read, and reference all your Obsidian notes.
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create and modify notes:&lt;/strong&gt; Ask Claude to draft or update notes in your vault.
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Query across documents:&lt;/strong&gt; Find connections between ideas across your entire knowledge system
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Extract insights:&lt;/strong&gt; Have Claude analyse patterns and relationships in your notes
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to add the Obsidian MCP server
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Check that Node.js is installed, and install it if it isn’t.
&lt;/li&gt;
&lt;li&gt; Then run this command in the terminal:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @smithery/cli &lt;span class="nb"&gt;install &lt;/span&gt;mcp-obsidian &lt;span class="nt"&gt;--client&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  9. &lt;a href="https://mcp.composio.dev/linear" rel="noopener noreferrer"&gt;Linear MCP Server&lt;/a&gt;: For ticket management
&lt;/h2&gt;

&lt;p&gt;If you're managing projects with Linear, connecting it to Claude unlocks powerful capabilities. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Issue management:&lt;/strong&gt; Create, update, and close tickets directly through conversations
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Project tracking:&lt;/strong&gt; Get status updates and summaries across your entire workspace
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Sprint planning:&lt;/strong&gt; Generate sprint plans based on backlog analysis
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Priority management:&lt;/strong&gt; Reorganise and prioritise issues through natural language
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to add the Linear MCP server
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Go to &lt;a href="https://mcp.composio.dev/linear" rel="noopener noreferrer"&gt;https://mcp.composio.dev/linear&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Generate the &lt;code&gt;npx&lt;/code&gt; command for Cursor
&lt;/li&gt;
&lt;li&gt; Run it in your terminal
&lt;/li&gt;
&lt;li&gt; Initiate a new connection.
&lt;/li&gt;
&lt;li&gt; Start working
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhcw345a29hpgv3lg1ir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhcw345a29hpgv3lg1ir.png" alt="Linear MCP"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  10. &lt;a href="https://mcp.composio.dev/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;: Working with your remote repository
&lt;/h2&gt;

&lt;p&gt;Connecting GitHub to Claude transforms your development workflow. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Code review:&lt;/strong&gt; Have Claude analyse pull requests and suggest improvements
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Commit management:&lt;/strong&gt; Search, analyse and create commits through conversation
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Issue tracking:&lt;/strong&gt; Create, update and resolve GitHub issues
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Repository exploration:&lt;/strong&gt; Navigate codebases and understand project structures
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to add the GitHub MCP server
&lt;/h3&gt;

&lt;p&gt;The steps are the same as before:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="https://mcp.composio.dev/github" rel="noopener noreferrer"&gt;https://mcp.composio.dev/github&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Generate the &lt;code&gt;npx&lt;/code&gt; command for Cursor/Claude
&lt;/li&gt;
&lt;li&gt;Run it in your terminal
&lt;/li&gt;
&lt;li&gt;Initiate a new connection with GitHub
&lt;/li&gt;
&lt;li&gt;Start working
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzx81cda7pep7ob5ods1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzx81cda7pep7ob5ods1.png" alt="GitHub MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a complete list of managed &lt;a href="https://mcp.composio.dev" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt;, check out Composio. You’ll find MCP servers for mainstream application services as well as niche apps you won’t find anywhere else.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>javascript</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>MCP vs Agent2Agent: Everything you need to know</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Wed, 23 Apr 2025 14:34:49 +0000</pubDate>
      <link>https://forem.com/composiodev/mcp-vs-agent2agent-everything-you-need-to-know-52ck</link>
      <guid>https://forem.com/composiodev/mcp-vs-agent2agent-everything-you-need-to-know-52ck</guid>
      <description>&lt;p&gt;Am I a bit late to talk about MCP and A2A protocols? I hope not! Both have been all over the internet, and they are mind-blowing! A race is on right now, and nobody wants to be left behind in shipping new models and tools.  &lt;/p&gt;

&lt;p&gt;Anthropic released MCP (Model Context Protocol) for agents, which got good community traction. Recently, we saw OpenAI’s integration with MCP. MCP defines how an agent communicates with APIs, making multi-tool calling easier.  &lt;/p&gt;

&lt;p&gt;Now, Google has released an A2A (Agent2Agent) protocol to streamline agent communication. In short, A2A standardises agent-to-agent communication while MCP standardises agent-to-tools communication.  &lt;/p&gt;

&lt;p&gt;So, yes, they are not competing but &lt;strong&gt;complementing&lt;/strong&gt; each other. Google has extended support for MCP in the &lt;a href="https://google.github.io/adk-docs/tools/mcp-tools/" rel="noopener noreferrer"&gt;Agents Development Kit (ADK)&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq5gffn8vpz89noe2dhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq5gffn8vpz89noe2dhn.png" alt="Screenshot 2025-04-21" width="800" height="153"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;This blog post explains how they work together to standardise building production-ready AI agents.  &lt;/p&gt;

&lt;p&gt;Let’s first discuss MCP and then proceed with the A2A protocol to see how both work.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding MCPs – The Role of the Model Context Protocol
&lt;/h2&gt;

&lt;p&gt;MCP stands for &lt;strong&gt;Model Context Protocol&lt;/strong&gt;, an open standard developed by Anthropic. It defines a structured and efficient way for applications to provide external context to large language models (LLMs) like Claude and GPT. Think of it like USB for AI — it lets AI models connect to external tools and data sources in a standardised way.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXe37yE9stEy71LOC4sJfdNWFK9iJ0kDIjyCQ8sq6eWrybZhCUze4qBaDDtqKz_zX68L6JsDXb7H0LRsSoq4OE58pC3mSYzBSTvRMN4M5DrVKC1gGRgLQC2nXtx4cWoKU98NVXLmRw%3Fkey%3Dz8PBt65nApq2h4WBsLfKiXrI" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXe37yE9stEy71LOC4sJfdNWFK9iJ0kDIjyCQ8sq6eWrybZhCUze4qBaDDtqKz_zX68L6JsDXb7H0LRsSoq4OE58pC3mSYzBSTvRMN4M5DrVKC1gGRgLQC2nXtx4cWoKU98NVXLmRw%3Fkey%3Dz8PBt65nApq2h4WBsLfKiXrI" alt="MCP diagram" width="800" height="400"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What’s the Core Problem MCP Solves?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every app exposes its tools differently; MCP solves this integration problem through three critical components:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client:&lt;/strong&gt; Maintains a 1-to-1 connection with servers, handles all LLM routing and orchestration, and negotiates capabilities.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server:&lt;/strong&gt; API services, databases, and logs that LLMs can access to complete tasks. Servers expose tools that LLMs use.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol:&lt;/strong&gt; The core standard governing client and server communication.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For an in-depth guide on MCP, its architecture, and internal workings, see &lt;strong&gt;&lt;a href="https://composio.dev/blog/what-is-model-context-protocol-mcp-explained/" rel="noopener noreferrer"&gt;Model Context Protocol (MCP): Explained&lt;/a&gt;&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXfy5syKIV8gC3woAOLyg4GzoFtzbrBO95g5mL4Oxs1ZGwoA6HaOIz2fy-R9RzTT-5ClPqyLMAJF98Ik0M0pb7X4Qnkjf7Bj_LDU5JADfR5jga3xYSSBkxwLZr46BS0N-GjEuRdr8g%3Fkey%3Dz8PBt65nApq2h4WBsLfKiXrI" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXfy5syKIV8gC3woAOLyg4GzoFtzbrBO95g5mL4Oxs1ZGwoA6HaOIz2fy-R9RzTT-5ClPqyLMAJF98Ik0M0pb7X4Qnkjf7Bj_LDU5JADfR5jga3xYSSBkxwLZr46BS0N-GjEuRdr8g%3Fkey%3Dz8PBt65nApq2h4WBsLfKiXrI" alt="MCP architecture" width="800" height="400"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;In short, MCP lets client developers build apps (Cursor, Windsurf, etc.) and server developers build API servers without worrying about each other’s implementation. Any MCP client can connect to any MCP server and vice-versa.  &lt;/p&gt;

&lt;p&gt;Each tool implementation is different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different field names (&lt;code&gt;start_time&lt;/code&gt; vs &lt;code&gt;event_time&lt;/code&gt;)
&lt;/li&gt;
&lt;li&gt;Different auth schemes (OAuth, API key, JWT, etc.)
&lt;/li&gt;
&lt;li&gt;Different error formats
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP &lt;strong&gt;standardises&lt;/strong&gt; how servers are built. You still write integration logic for each app (or use &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;), but MCP ensures any server can plug into any client. That makes life easier for millions of developers and abstracts away tool-by-tool quirks.  &lt;/p&gt;

&lt;p&gt;You can think of it like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User:&lt;/strong&gt; Adds a &lt;a href="https://mcp.composio.dev/googlecalendar/tinkling-faint-car-f6g1zk" rel="noopener noreferrer"&gt;Google Calendar MCP server&lt;/a&gt; to Cursor IDE.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client:&lt;/strong&gt; Fetches the server’s tools and injects them into the LLM context.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User:&lt;/strong&gt; “Schedule a team sync on Thursday at 3 PM.”
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Client:&lt;/strong&gt; LLM decides it needs to call a tool, fills parameters, and executes (after auth).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calendar Server:&lt;/strong&gt; Creates the meeting.
&lt;/li&gt;
&lt;/ul&gt;
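&lt;p&gt;To make the flow above concrete, here is a minimal sketch of the JSON-RPC message an MCP client might send for the calendar step. The &lt;code&gt;tools/call&lt;/code&gt; framing follows the MCP spec; the tool name &lt;code&gt;create_event&lt;/code&gt; and its arguments are hypothetical:&lt;/p&gt;

```python
import json

# Sketch of the JSON-RPC request an MCP client sends when the LLM decides
# to call a tool. The "tools/call" method is the MCP spec's shape; the tool
# name "create_event" and its arguments are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "create_event",
        "arguments": {
            "title": "Team sync",
            "start_time": "2025-05-01T15:00:00",
        },
    },
}

# Serialize for the wire (STDIO or HTTP transport)
wire = json.dumps(request)
print(wire)
```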

&lt;p&gt;Instead of wiring services with brittle code, you now get a &lt;strong&gt;clean, modular interface&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Despite its merits, MCP can be tough in production — security, reliability, and multiple servers get hairy. That’s why we at &lt;strong&gt;&lt;a href="https://mcp.composio.dev" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;&lt;/strong&gt; are building robust MCP infrastructure for your AI workflows.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Agent2Agent Protocol by Google
&lt;/h2&gt;

&lt;p&gt;Google introduced the &lt;strong&gt;Agent-to-Agent Protocol (A2A)&lt;/strong&gt;, inspired by MCP. Where MCP focuses on agent-to-server calls, A2A focuses on agent-to-agent interoperability.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa4frxbknjsb9a5co0c3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa4frxbknjsb9a5co0c3.png" alt="A2A diagram" width="800" height="471"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Imagine a travel assistant planning a trip from Delhi to Mumbai. It can delegate to a train-booking agent, a hotel-booking agent, and a cab-service agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Plan a full trip from Delhi to Mumbai, book my train, find a hotel near the station, and arrange local transport.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Behind the scenes, A2A forms a mini-team of agents, each handling part of the job. That’s modular, connected, and smarter.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A2A Design Principles&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A2A enables flexible, secure communication between autonomous agents, regardless of vendor or ecosystem.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key advantages&lt;/strong&gt;  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;True agentic behaviour&lt;/strong&gt; – independent cooperation without shared state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Familiar tech stack&lt;/strong&gt; – HTTP + SSE + JSON-RPC.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-grade security&lt;/strong&gt; – built-in auth/authz like OpenAPI.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short- or long-running tasks&lt;/strong&gt; – real-time progress, state tracking, and human-in-the-loop.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modality-agnostic&lt;/strong&gt; – text, audio, video, etc.
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How A2A Works
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capability discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agents publish &lt;em&gt;Agent Cards&lt;/em&gt; (JSON) describing skills, modalities, constraints.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task lifecycle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A client agent delegates a &lt;em&gt;task&lt;/em&gt;; the remote agent updates status until producing an &lt;em&gt;artefact&lt;/em&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agents exchange messages, artefacts, and context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UX negotiation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Messages use typed &lt;em&gt;parts&lt;/em&gt; (text, image, chart, form, …) tailored to the client’s UI.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXdu4KSAFJ_4N9L4tU3xOzF8OgC9vu5I9ktJMElU8cXZUfORUjr7QEJtNIMgmoNiGUAYGZwEUDBGbNqaK3L_pdfsVfdZx-8kia3RVrFfdkeLRLqTHLfI9VlEPBvLV26uTgOHEY1D0w%3Fkey%3Dz8PBt65nApq2h4WBsLfKiXrI" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh7-rt.googleusercontent.com%2Fdocsz%2FAD_4nXdu4KSAFJ_4N9L4tU3xOzF8OgC9vu5I9ktJMElU8cXZUfORUjr7QEJtNIMgmoNiGUAYGZwEUDBGbNqaK3L_pdfsVfdZx-8kia3RVrFfdkeLRLqTHLfI9VlEPBvLV26uTgOHEY1D0w%3Fkey%3Dz8PBt65nApq2h4WBsLfKiXrI" alt="A2A lifecycle" width="800" height="400"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Concepts of A2A Protocol&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Multi-Agent Collaboration&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Agents share tasks, results, and work across ecosystems.
&lt;/li&gt;
&lt;li&gt;E.g., a recruiting agent chatting with a company’s hiring agent, or a delivery agent coordinating with restaurants.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Open &amp;amp; Extensible&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open protocol with 50+ contributors (Atlassian, Box, LangChain, PayPal, etc.).
&lt;/li&gt;
&lt;li&gt;Uses standards like &lt;strong&gt;JSON-RPC&lt;/strong&gt; and Service/Event descriptions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Secure by Default&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Auth / authz via OpenID Connect.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.well-known/agent.json&lt;/code&gt; discovery endpoints.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Working of A2A – Examples&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Architecture Example&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Three agents in a productivity suite:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calendar Agent&lt;/strong&gt; – hosted server, pulls availability via MCP.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Agent&lt;/strong&gt; – fetches documents/notes via MCP.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistant Agent&lt;/strong&gt; – user-facing LLM delegating tasks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Flow&lt;/strong&gt;  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assistant → Calendar: check availability.
&lt;/li&gt;
&lt;li&gt;Assistant → Document: fetch &amp;amp; summarise doc.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So &lt;strong&gt;A2A&lt;/strong&gt; handles agent-to-agent chat, while &lt;strong&gt;MCP&lt;/strong&gt; bridges agents to apps.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent Discovery (Inspired by OpenID Connect)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agents advertise at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yourdomain.com/.well-known/agent.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It lists name, description, capabilities, sample queries, modalities, etc., so newcomers can discover and interact dynamically.  &lt;/p&gt;
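&lt;p&gt;A minimal discovery sketch: fetch the card from the well-known endpoint and read what the agent advertises. The &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;skills&lt;/code&gt; fields here are illustrative, not a guaranteed schema:&lt;/p&gt;

```python
import json
import urllib.request

def fetch_agent_card(domain):
    """Download <domain>/.well-known/agent.json and parse it."""
    url = f"https://{domain}/.well-known/agent.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def skill_names(card):
    """Extract the names of the skills an agent card advertises
    (assumes an illustrative "skills" list of {"name": ...} objects)."""
    return [skill["name"] for skill in card.get("skills", [])]

# Parsing a sample card (no network needed):
sample = json.loads('{"name": "Dining Assistant", "skills": [{"name": "Table Booking"}]}')
print(skill_names(sample))  # ['Table Booking']
```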

&lt;h2&gt;
  
  
  Agent2Agent vs MCP
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;MCP&lt;/th&gt;
&lt;th&gt;Agent2Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Communication&lt;/td&gt;
&lt;td&gt;Agent ↔ External APIs&lt;/td&gt;
&lt;td&gt;Agent ↔ Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goal&lt;/td&gt;
&lt;td&gt;API integration&lt;/td&gt;
&lt;td&gt;Collaboration &amp;amp; interoperability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer&lt;/td&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Mid-layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tech&lt;/td&gt;
&lt;td&gt;REST/JSON&lt;/td&gt;
&lt;td&gt;JSON-RPC / events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inspired by&lt;/td&gt;
&lt;td&gt;LSP&lt;/td&gt;
&lt;td&gt;OpenID Connect&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MCP provides the tools each agent uses, while A2A facilitates the collaboration between agents. Together they cover both the execution of individual tasks and the coordination of complex, multi-step processes.&lt;/p&gt;

&lt;p&gt;Both Anthropic’s MCP and Google’s A2A facilitate interaction between AI systems and external components, but they cater to different scenarios and architectures.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Anthropic MCP&lt;/th&gt;
&lt;th&gt;Google A2A&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Objective&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Link one model to external tools&lt;/td&gt;
&lt;td&gt;Coordinate autonomous agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure enterprise data access&lt;/td&gt;
&lt;td&gt;Distributed B2B coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protocol&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;STDIO/HTTP + SSE&lt;/td&gt;
&lt;td&gt;HTTP/S + webhooks/SSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual server settings&lt;/td&gt;
&lt;td&gt;Dynamic &lt;em&gt;Agent Cards&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pattern&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top-down calls&lt;/td&gt;
&lt;td&gt;Peer collaboration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-boundary focus&lt;/td&gt;
&lt;td&gt;Same, multi-agent scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workflows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple request-response&lt;/td&gt;
&lt;td&gt;Long-running, stateful&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  1. Communication
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MCP: structured schemas&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In MCP (Model Context Protocol), the interaction is explicit and schema-driven.
&lt;/li&gt;
&lt;li&gt;The assistant knows exactly which tool to call, what arguments to pass, and in what format.
&lt;/li&gt;
&lt;li&gt;Flow: AI Assistant → tool with structured input → tool returns raw result.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI sends: &lt;code&gt;get_weather_forecast(Tokyo, 2025-04-22)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tool returns: “Sunny, 22°C”
&lt;/li&gt;
&lt;li&gt;AI simply displays the result.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A2A: natural language&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A2A (Agent-to-Agent) is far more conversational, using natural-language tasks.
&lt;/li&gt;
&lt;li&gt;Tasks are expressed like real user queries, and agents decide internally how to interpret them.
&lt;/li&gt;
&lt;li&gt;Flow: User Agent → task in plain English → target agent processes → responds naturally.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A2A flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User says: “Can you tell me the weather in Tokyo on April 22nd and the current $NVDA price?”
&lt;/li&gt;
&lt;li&gt;The agent routes the request to the appropriate Finance/Weather agent.
&lt;/li&gt;
&lt;li&gt;Response might be: “Sure! The forecast for Tokyo on April 22nd is sunny with a high of 22°C, and $NVDA is currently at $101.42, down 0.064%.”
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Task Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MCP: single-stage execution&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP handles tasks like a classic function call.
&lt;/li&gt;
&lt;li&gt;You call the function (or “tool”) and immediately get a response: either success with the result or failure (error/exception).
&lt;/li&gt;
&lt;li&gt;The whole process is immediate and atomic: one shot, one answer.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A2A: multi-stage lifecycle&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A2A treats tasks like long-running jobs with multiple possible states:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pending&lt;/code&gt; → waiting to start
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;running&lt;/code&gt; → work in progress (can even provide partial results!)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;completed&lt;/code&gt; → final result ready
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;failed&lt;/code&gt; → something went wrong
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;You can check back anytime to see progress, grab partial data, or wait for the full result.
&lt;/li&gt;
&lt;/ul&gt;
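&lt;p&gt;The lifecycle above can be sketched as a small state machine. The transition rules here are an illustrative simplification, not the protocol’s exact specification:&lt;/p&gt;

```python
from enum import Enum

# Sketch of the A2A task lifecycle described above. State names mirror the
# list in the text; the transition table is an illustrative simplification.
class TaskState(Enum):
    PENDING = "pending"      # waiting to start
    RUNNING = "running"      # work in progress, may yield partial results
    COMPLETED = "completed"  # final result ready
    FAILED = "failed"        # something went wrong

VALID_TRANSITIONS = {
    TaskState.PENDING: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.RUNNING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),  # terminal
    TaskState.FAILED: set(),     # terminal
}

def advance(state, new_state):
    """Move a task to `new_state`, rejecting impossible jumps."""
    if new_state not in VALID_TRANSITIONS[state]:
        raise ValueError(f"cannot go from {state.value} to {new_state.value}")
    return new_state

state = TaskState.PENDING
state = advance(state, TaskState.RUNNING)
state = advance(state, TaskState.COMPLETED)
print(state.value)  # completed
```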
&lt;h3&gt;
  
  
  3. Capability Specification
&lt;/h3&gt;

&lt;p&gt;MCP: Low-Level, Instruction-Based&lt;br&gt;
MCP capabilities are described with very strict schemas, usually in JSON Schema format. They are about precision and control, like telling a machine exactly what to do and how to do it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"book_table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Books a table at a restaurant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

      &lt;/span&gt;&lt;span class="nl"&gt;"restaurant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

      &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

      &lt;/span&gt;&lt;span class="nl"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{2}:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{2}$"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

      &lt;/span&gt;&lt;span class="nl"&gt;"party_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"restaurant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"party_size"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
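&lt;p&gt;As a sketch of what a client or server does with such a schema, here is a minimal hand-rolled validator for the &lt;code&gt;book_table&lt;/code&gt; input. A real implementation would use a full JSON Schema validator; this just mirrors the &lt;code&gt;required&lt;/code&gt;, &lt;code&gt;pattern&lt;/code&gt;, and &lt;code&gt;minimum&lt;/code&gt; rules above:&lt;/p&gt;

```python
import re

# Hand-rolled checks mirroring the book_table JSON Schema above:
# required fields, HH:MM time pattern, and party_size minimum of 1.
REQUIRED = ["restaurant", "date", "time", "party_size"]
TIME_PATTERN = re.compile(r"^\d{2}:\d{2}$")

def validate_book_table(args):
    """Return 'ok' or a description of the first validation failure."""
    missing = [k for k in REQUIRED if k not in args]
    if missing:
        return f"missing fields: {missing}"
    if not TIME_PATTERN.match(args["time"]):
        return "time must match HH:MM"
    if not isinstance(args["party_size"], int) or args["party_size"] < 1:
        return "party_size must be an integer of at least 1"
    return "ok"

print(validate_book_table({"restaurant": "Trattoria", "date": "2025-05-02",
                           "time": "19:30", "party_size": 4}))  # ok
print(validate_book_table({"restaurant": "Trattoria", "date": "2025-05-02",
                           "time": "7pm", "party_size": 4}))    # time must match HH:MM
```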



&lt;h3&gt;
  
  
  A2A: High-Level, Goal-Oriented
&lt;/h3&gt;

&lt;p&gt;In contrast, A2A uses an Agent Card to describe capabilities in terms of goals, roles, and expertise. It’s like explaining what someone is good at and trusting them to handle it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;agent_card = AgentCard(
    id="restaurant-agent",
    name="Dining Assistant",
    description="Helps users find and book tables at restaurants.",
    agent_skills=[
        AgentSkill(
            id="table_booking",
            name="Table Booking",
            description="Can search restaurants and book tables as per user preferences.",
            examples=[
                "Book a table for 4 at an Italian place this Friday night.",
                "Find a quiet restaurant near downtown and reserve for two people.",
            ],
        )
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;• MCP lets you add skills (API services, databases, records, etc.) to your agents.&lt;br&gt;
• A2A gives you flexibility, judgment, and delegation power. Think of it as a team of thoughtful coworkers.&lt;br&gt;
• They’re like pairing an engineer (MCP) with a project manager (A2A): one does exact work; the other handles the chaos.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Use MCP with A2A
&lt;/h2&gt;

&lt;p&gt;One way to integrate MCP servers into A2A agents is with Google’s &lt;strong&gt;Agent Development Kit (ADK)&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Install the ADK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;google-adk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Import the Required Modules
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ./adk_agent_samples/mcp_agent/agent.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents.llm_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.runners&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemorySessionService&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.artifacts.in_memory_artifact_service&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryArtifactService&lt;/span&gt;  &lt;span class="c1"&gt;# Optional
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools.mcp_tool.mcp_toolset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;MCPToolset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SseServerParams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from a .env file in the parent directory
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
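&lt;p&gt;The &lt;code&gt;load_dotenv(".env")&lt;/code&gt; call above expects a &lt;code&gt;.env&lt;/code&gt; file next to the script. A minimal sketch, assuming you are calling the Gemini API directly (the variable names follow ADK’s convention; adjust if you authenticate through Vertex AI):&lt;/p&gt;

```
# .env — assumed variable names per ADK convention
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-gemini-api-key
```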



&lt;h3&gt;
  
  
  Configure the MCP Server and Fetch Tools
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# --- Step 1: import tools from an MCP server (HTTP SSE) ---
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tools_async&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gets tools from the Gmail MCP server.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempting to connect to MCP Filesystem server…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;MCPToolset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SseServerParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://mcp.composio.dev/gmail/tinkling-faint-car-f6g1zk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MCP Toolset created successfully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; In the example above, we use the HTTP SSE endpoint for the Gmail server at &lt;a href="https://mcp.composio.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;mcp.composio.dev&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  For a STDIO-based tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tools_async&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gets tools from a local MCP filesystem server (STDIO).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attempting to connect to MCP Filesystem server…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;MCPToolset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@modelcontextprotocol/server-filesystem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/path/to/your/folder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MCP Toolset created successfully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create the Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent_async&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Creates an ADK agent equipped with tools from the MCP server.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_tools_async&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetched &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tools from MCP server.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Adjust if needed
&lt;/span&gt;        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maps_assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Help the user with mapping and directions using the available tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Define &lt;code&gt;main&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;async_main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;session_service&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemorySessionService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;artifacts_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryArtifactService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Optional
&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp_maps_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_maps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# TODO: Use specific addresses for reliable results with this server
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the route from 1600 Amphitheatre Pkwy to 1165 Borregas Ave&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User Query: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_agent_async&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp_maps_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;artifact_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;artifacts_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Optional
&lt;/span&gt;        &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Running agent…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;events_async&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;new_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;events_async&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event received: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Closing MCP server connection…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;exit_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aclose&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cleanup complete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;async_main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occurred: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run the Application
&lt;/h3&gt;

&lt;p&gt;Execute the script (for example, &lt;code&gt;python ./adk_agent_samples/mcp_agent/agent.py&lt;/code&gt;) and watch your A2A agent automatically call MCP-hosted tools in response to user queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP makes it easier for agents to communicate with the tools that wrap external application services, and Agent2Agent makes it easier for multiple agents to communicate and collaborate. Both are steps toward standardising agent development, and it will be interesting to see how they transform the agentic ecosystem.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Notes on Llama 4: The Hits, the Misses, and the Disasters</title>
      <dc:creator>Sunil Kumar Dash</dc:creator>
      <pubDate>Fri, 11 Apr 2025 12:59:16 +0000</pubDate>
      <link>https://forem.com/composiodev/notes-on-llama-4-the-hits-the-misses-and-the-disasters-18np</link>
      <guid>https://forem.com/composiodev/notes-on-llama-4-the-hits-the-misses-and-the-disasters-18np</guid>
      <description>&lt;p&gt;The Llama 4 is here, and this time, the Llama family has three different models: Llama 4 Scout, Maverick, and Behemoth. While the former two are available on multiple platforms, the Behemoth, as per Zuck and Meta, is still in training. And it is reportedly beating the current state-of-the-art.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qib818bhk2chd8lakek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qib818bhk2chd8lakek.png" alt="image-1" width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is no improvement on the licensing front: the models ship with the same Llama license, which bars any company with more than 700 million monthly active users from using them without Meta’s permission and also keeps them unavailable to Europeans.&lt;/p&gt;

&lt;p&gt;It’s baffling that they persist with this restrictive license when &lt;a href="https://composio.dev/deepseek-v3-0324-the-sonnet-3-5-at-home/index.html" rel="noopener noreferrer"&gt;Deepseek v3 0324&lt;/a&gt; and R1 are readily available under MIT. A Llama license in 2025 feels criminal.&lt;/p&gt;

&lt;p&gt;But anyway, unlike Llama 3, Meta has moved on from dense models to a mixture of experts (MoE). Both models are sparse MoEs: Scout has 17B active and 109B total parameters with 16 experts, and Maverick has 17B active and 400B total parameters with 128 experts.&lt;/p&gt;
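&lt;p&gt;The practical upshot of MoE sparsity is that per-token compute scales with the active parameters while memory still scales with the total. A quick back-of-the-envelope with the numbers above:&lt;/p&gt;

```python
# Active-parameter fraction for the two Llama 4 MoE models,
# using the figures quoted above (parameters in billions).
models = {
    "Scout": {"active": 17, "total": 109, "experts": 16},
    "Maverick": {"active": 17, "total": 400, "experts": 128},
}

for name, m in models.items():
    frac = m["active"] / m["total"]
    print(f"{name}: {m['active']}B active of {m['total']}B total "
          f"({frac:.0%} of weights used per token, {m['experts']} experts)")
```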

&lt;p&gt;There are no local models this time; everyone expected Meta to bring smaller dense models (3B, 8B, 32B, 70B) like last time. But hey, we still get two open-weight models, unless you’re in Europe.&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The pitch
&lt;/li&gt;
&lt;li&gt;  The Hits

&lt;ul&gt;
&lt;li&gt;  10M in context length
&lt;/li&gt;
&lt;li&gt;  Natively multi-modal
&lt;/li&gt;
&lt;li&gt;  Teacher-student distillation
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  The Misses

&lt;ul&gt;
&lt;li&gt;  Not enough
&lt;/li&gt;
&lt;li&gt;  Confused positioning
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  The Disasters

&lt;ul&gt;
&lt;li&gt;  Not really 10M
&lt;/li&gt;
&lt;li&gt;  Benchmark blunders
&lt;/li&gt;
&lt;li&gt;  The tokenizer terror
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;  There’s still hope
&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Meta rushed the release of Llama 4 Maverick and Scout. (Could it be the tariffs?)&lt;/li&gt;
&lt;li&gt;  Scout has a humongous 10M context length; Maverick has 1M.&lt;/li&gt;
&lt;li&gt;  Both models are distilled from Llama 4 Behemoth, a 2T-parameter model.&lt;/li&gt;
&lt;li&gt;  The models are severely underwhelming on all fronts: code gen, writing, and everyday conversations.&lt;/li&gt;
&lt;li&gt;  The models tend to output verbose responses (yapping, they call it).&lt;/li&gt;
&lt;li&gt;  The models are so bad Meta had to fudge benchmarks.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Pitch
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/AIatMeta/status/1908598456144531660" rel="noopener noreferrer"&gt;Meta&lt;/a&gt;: Today is the start of a new era of natively multimodal AI innovation.&lt;/p&gt;

&lt;p&gt;Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.&lt;/p&gt;

&lt;p&gt;Llama 4 Scout&lt;/p&gt;

&lt;p&gt;• 17B-active-parameter model with 16 experts.&lt;br&gt;
• Industry-leading context window of 10M tokens.&lt;br&gt;
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Llama 4 Maverick&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;• 17B-active-parameter model with 128 experts.&lt;br&gt;
• Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.&lt;br&gt;
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.&lt;br&gt;
• Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters.&lt;br&gt;
• Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena.&lt;/p&gt;

&lt;p&gt;These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It sure got the big model vibes, though no local models will sting a lot of Llama enthusiasts.&lt;/p&gt;

&lt;p&gt;The Llama 4 pre-training recap by &lt;a href="https://x.com/eliebakouch/status/1908608627029455098" rel="noopener noreferrer"&gt;Elie&lt;/a&gt; from Hugging Face:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MetaP: MuP inspired method to set per layers hyperparameters that transfer across batch size, width, depth and training token (huge)&lt;/p&gt;

&lt;p&gt;MoE with 16E and 128E&lt;/p&gt;

&lt;p&gt;QK Norm with no learnable parameter (and the 128E have no QK Norm it seems)&lt;/p&gt;

&lt;p&gt;FP8 Training&lt;/p&gt;

&lt;p&gt;No rope on interleaved attention layers, but i don’t see any sliding window attention? (one of the key receipe for the 10M context they said)&lt;/p&gt;

&lt;p&gt;temperature tuning on the no rope layers&lt;/p&gt;

&lt;p&gt;Native multimodal training&lt;/p&gt;

&lt;p&gt;Mixture with 30T token (text, images and video), training budget of 40T for 128E and 22T for the 16E&lt;/p&gt;

&lt;p&gt;No details on optimizer…&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Hits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  10M in context length
&lt;/h3&gt;

&lt;p&gt;The models have their own highs. The most prominent is the 10 million-token context length of the Scout model, a first among both open-source and proprietary LLMs. Maverick comes with a one million context length.&lt;/p&gt;

&lt;p&gt;Ideally, you can put your entire codebase in context and get the LLM to work on it. However, I don’t think it will be the same for Behemoth, which would have been much better at coding than Llama 4 Scout.&lt;/p&gt;
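&lt;p&gt;For a sense of scale, here is a rough back-of-the-envelope. The 4-characters-per-token and 3,000-characters-per-page figures are common approximations, not measurements:&lt;/p&gt;

```python
# Rough sense of what a 10M-token context window holds.
# Assumes ~4 characters per token and ~3,000 characters per page,
# both common rules of thumb; actual tokenization varies.
CONTEXT_TOKENS = 10_000_000
CHARS_PER_TOKEN = 4       # assumption
CHARS_PER_PAGE = 3_000    # assumption (~500 words/page)

chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
print(f"~{chars / 1e6:.0f} million characters")        # ~40 million characters
print(f"~{chars // CHARS_PER_PAGE:,} pages of text")   # ~13,333 pages of text
```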

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnz3sjtjaldmfnhew7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmnz3sjtjaldmfnhew7l.png" alt="Screenshot-2025-04-08-at-6.20.00-PM" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is huge. Until now, only Google had cracked the long context window; now Meta has joined it. The needle-in-the-haystack results are also promising for the Llama 4 models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9c310bx2p9u6oetr3gi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9c310bx2p9u6oetr3gi7.png" alt="Screenshot-2025-04-08-at-7.09.25-PM" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Natively Multi-modal
&lt;/h3&gt;

&lt;p&gt;The other plus is that the models are natively multi-modal, understanding text, images, audio, and video, though the output modality is limited to text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Teacher-Student Distillation
&lt;/h3&gt;

&lt;p&gt;However, the more interesting part is the teacher-student distillation from Llama 4 Behemoth, a first for Meta.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Meta: We codistilled the Llama 4 Maverick model from Llama 4 Behemoth as a teacher model, resulting in substantial quality improvements across end task evaluation metrics. We developed a novel distillation loss function that dynamically weights the soft and hard targets through training.  Codistillation from Llama 4 Behemoth during pre-training amortizes the computational cost of resource-intensive forward passes needed to compute the targets for distillation for the majority of the training data used in student training. For additional new data incorporated in student training, we ran forward passes on the Behemoth model to create distillation targets.&lt;/p&gt;
&lt;/blockquote&gt;
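&lt;p&gt;Meta doesn't publish the loss itself, but the idea of "dynamically weighting the soft and hard targets" can be sketched generically: a KL term against the teacher's temperature-softened distribution plus a cross-entropy term against the ground-truth label, blended by a weight that can change over training. A minimal illustration (function names and the fixed blend are my assumptions, not Meta's actual loss):&lt;/p&gt;

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, alpha, temperature=2.0):
    # Soft target: KL(teacher || student) at temperature T
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    # Hard target: cross-entropy against the ground-truth label
    hard = -math.log(softmax(student_logits)[hard_label])
    # alpha is the (here static, in practice dynamic) soft/hard blend weight
    return alpha * soft + (1.0 - alpha) * hard
```

&lt;p&gt;The "amortization" Meta describes is about cost: the teacher's forward passes to produce &lt;code&gt;teacher_logits&lt;/code&gt; are precomputed once for most of the training data rather than recomputed per step.&lt;/p&gt;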

&lt;p&gt;Some notes from &lt;a href="https://x.com/_xjdr/status/1909278813852508486" rel="noopener noreferrer"&gt;xjdr&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Scout is best at summarization and function calling. exactly what you want from a cheap long ctx model. this is going to be a workhorse in coding flows and RAG applications. the single shot ICL recall is very very good.&lt;/li&gt;
&lt;li&gt;Maverick was built for replacing developers and doing agenic / tool calling work. it is very consistent in instruction following, very long context ICL and parallel multi tool calls. this is EXACTLY the model and capabilities i want in my coder style flows. it is not creative, i have V3 and R1 for that tho. multimodal is very good at OCR and charts and graphs outperforming both 4o and qwen 2.5 VL 72 in my typical tests. the only thing i haven’t tested is computer use but i doubt it will beat sonnet or qwen at that as both models were explicitly trained for it. The output is kind of bland (hence the constant 4o comparisons) with little personality, which is totally fine. this is a professional tool built for professional work (testing it on RP or the like will lead to terrible results). Im not sure what more you could ask for in a agent focused model.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Misses
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Not enough
&lt;/h3&gt;

&lt;p&gt;Much of the initial excitement faded as users began experiencing the models firsthand. Expectations for Llama 4 were sky-high, and recent releases from other labs have genuinely raised the bar for what’s possible, leaving people disappointed. Both models have grossly underperformed their peers.&lt;/p&gt;

&lt;p&gt;The initial reaction from &lt;a href="https://x.com/teortaxesTex/status/1908602241046528218" rel="noopener noreferrer"&gt;Teortaxes&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwovz4una6x4n1j9dpkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwovz4una6x4n1j9dpkx.png" alt="Screenshot-2025-04-08-at-7.46.01-PM" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And also&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsy74d8j0rifne1yhbeqy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsy74d8j0rifne1yhbeqy.png" alt="Screenshot-2025-04-08-at-7.51.48-PM" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model struggled to score 16% on the &lt;a href="https://aider.chat/docs/leaderboards/" rel="noopener noreferrer"&gt;Aider Polyglot benchmark&lt;/a&gt;, a well-respected benchmark consisting of coding problems across multiple languages and task types.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5za62qdqf27kh91qxaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5za62qdqf27kh91qxaf.png" alt="image-2" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scoring around the level of Qwen 2.5 Coder, despite being roughly ten times the size, doesn’t instil confidence. Coding is definitely not its strongest suit.&lt;/p&gt;

&lt;p&gt;The models are also grossly underperforming on long-form writing benches.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/sam_paech/status/1908694877120192759" rel="noopener noreferrer"&gt;Sam Paech&lt;/a&gt;: I made a new longform writing benchmark. It involves planning out &amp;amp; writing a novella (8x 1000 word chapters) from a minimal prompt. Outputs are scored by sonnet-3.7.&lt;br&gt;
Llama-4 performing not so well. :~(&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fialssh3vpdvzwtaw9a12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fialssh3vpdvzwtaw9a12.png" alt="image-4" width="800" height="682"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Llama 4 models even underperform QwQ-32B and Reka Flash 3. It seems they’re not good at creative writing, either.&lt;/p&gt;

&lt;h3&gt;
  
  
  Confused Positioning
&lt;/h3&gt;

&lt;p&gt;The positioning is confused: it’s neither very cheap nor brilliant compared to its peers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/theo/status/1909001417014284553" rel="noopener noreferrer"&gt;Theo&lt;/a&gt;: Increasingly confused about where Llama 4 fits in the market&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwe88as53cgawx5prjvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwe88as53cgawx5prjvb.png" alt="image-5" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s also not doing well on the ARC-AGI semi-private evals.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://dev.toARC-AGI"&gt;ARC-AGI&lt;/a&gt;: Llama 4 Maverick and Scout on ARC-AGI’s Semi Private Evaluation&lt;/p&gt;

&lt;p&gt;Maverick:&lt;br&gt;
* ARC-AGI-1: 4.38% ($0.0078/task)&lt;br&gt;
* ARC-AGI-2: 0.00% ($0.0121/task)&lt;/p&gt;

&lt;p&gt;Scout:&lt;br&gt;
* ARC-AGI-1: 0.50% ($0.0041/task)&lt;br&gt;
* ARC-AGI-2: 0.00% ($0.0062/task)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qhkkw9zzwwtzff1q9go.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qhkkw9zzwwtzff1q9go.png" alt="image-6" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The scores are certainly not encouraging.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Disasters
&lt;/h2&gt;

&lt;p&gt;There have been some serious issues with the Llama 4 launch, and the situation is terrible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not really 10M
&lt;/h3&gt;

&lt;p&gt;The least serious issue is that the claim of a 10M context window is not really true. Model performance tends to degrade as the context length increases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/ivanfioravanti/status/1909337288816861271" rel="noopener noreferrer"&gt;Ivan&lt;/a&gt;: It’s impossible for Llama-4 to degrade so much at 120k context. How can a big AI lab like Meta push a 10M limit in their announcement and have such poor real-life results? I hope there are bugs somewhere causing this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Llama 4 models on LiveCodeBench:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkksrlrz0ql5403dc9f3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkksrlrz0ql5403dc9f3.png" alt="image-7" width="800" height="811"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Blunders
&lt;/h3&gt;

&lt;p&gt;Surprisingly, the biggest highlight of this launch wasn’t model performance but the LMSYS benchmark blunder. Many Llama and open-source enthusiasts pointed out the mismatch between the LMSYS Elo rating and the models’ real-world performance.&lt;/p&gt;
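&lt;p&gt;For context on what that rating measures: arena-style leaderboards aggregate many pairwise human votes into an Elo-style score, so a model that wins head-to-head votes climbs regardless of why voters preferred it. A minimal sketch of the classic Elo update (LMSYS uses a more elaborate style-controlled variant; the K-factor here is illustrative):&lt;/p&gt;

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One head-to-head battle: score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new
```

&lt;p&gt;The key point: the update only sees the vote, not its reason, so a model tuned to please voters stylistically rises just as fast as a genuinely stronger one.&lt;/p&gt;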

&lt;p&gt;Did Meta game the benchmark? No, not really.&lt;/p&gt;

&lt;p&gt;They explicitly mentioned in their release blog that they had submitted an experimental version of Llama 4 Maverick, optimised for human conversations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfeipaj1f1xcsl806tg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfeipaj1f1xcsl806tg8.png" alt="image-8" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="%CC%A7https://x.com/Ahmad_Al_Dahle/status/1909302532306092107"&gt;Ahmad Al Dahl&lt;/a&gt;, the lead at GenAI Meta, clarified the model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We’re glad to start getting Llama 4 in all your hands. We’re already hearing lots of great results people are getting with these models. That said, we’re also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in. We’ll keep working through our bug fixes and onboarding partners. We’ve also heard claims that we trained on test sets — that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations. We believe the Llama 4 models are a significant advancement and we’re looking forward to working with the community to unlock their value&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Turns out the model release was rushed, and there are many rough edges Meta didn’t bother fixing before the launch.&lt;/p&gt;

&lt;p&gt;But this also exposes how bad the LMSYS arena is for LLM evaluations. In response to community questions, they released a statement.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We’ve seen questions from the community about the latest release of Llama-4 on Arena. To ensure full transparency, we’re releasing 2,000+ head-to-head battle results for public review. This includes user prompts, model responses, and user preferences. (link in next tweet)&lt;br&gt;
Early analysis shows style and model response tone was an important factor (demonstrated in style control ranking), and we are conducting a deeper analysis to understand more! (Emoji control?)&lt;/p&gt;

&lt;p&gt;In addition, we’re also adding the HF version of Llama-4-Maverick to Arena, with leaderboard results published shortly. Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that “Llama-4-Maverick-03-26-Experimental” was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Interestingly, evaluators preferred the lengthy, stylised responses from Llama 4 Maverick across the head-to-head examples. Here is one example where Llama 4 was the winner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcmwotnw69d6vmdjti9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcmwotnw69d6vmdjti9d.png" alt="Screenshot-2025-04-09-at-5.47.36-PM" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model’s output is tuned to please human readers, and LMSYS voters tend to reward that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/emollick/status/1909414182962790467" rel="noopener noreferrer"&gt;Ethan Mollick&lt;/a&gt;: The Llama 4 model that won in LM Arena is different than the released version. I have been comparing the answers from Arena to the released model. They aren’t close. The data is worth a look also as it shows how LM Arena results can be manipulated to be more pleasing to humans.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s what Susan Zhang had to say about the same issue:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/suchenzang/status/1908795054011146308" rel="noopener noreferrer"&gt;Susan&lt;/a&gt;: how did this llama4 score so high on lmsys?? i’m still buckling up to understand qkv through family reunions and weighted values for loving cats…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2slkhtqp41pbwkb6jty9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2slkhtqp41pbwkb6jty9.png" alt="image-9" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, I don’t know if it is an LLM problem or a benchmark problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/vikhyatk/status/1909403603409969533" rel="noopener noreferrer"&gt;Vik&lt;/a&gt;: This is the clearest evidence that no one should take these rankings seriously. In this example it’s super yappy and factually inaccurate, and yet the user voted for Llama 4. The rest aren’t any better.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It gets even worse: a former Meta employee took to Reddit and posted about how Meta allegedly manipulated the LMArena benchmark.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnn9w71clrjz7quztisu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flnn9w71clrjz7quztisu.png" alt="image-10" width="680" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a Llama enthusiast, this was my 9/11. It’s OK if the model underperforms, but not being honest is a crime before the gods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5vgghzpi2uuvysa6ypd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5vgghzpi2uuvysa6ypd.png" alt="image-12" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The tokenizer terror
&lt;/h3&gt;

&lt;p&gt;The woes don’t end here. The tokenizer scene is even more grim.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://x.com/menhguin/status/1908614782930014360" rel="noopener noreferrer"&gt;Kalomaze&lt;/a&gt;: if at any point someone on your team says&lt;/p&gt;

&lt;p&gt;“yeah we need 10 special tokens for reasoning and 10 for vision and another 10 for image generation and 10 agent tokens and 10 post tr-”&lt;/p&gt;

&lt;p&gt;you should have slapped them this is what happens when that doesn’t happen&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/menhguin/status/1908614782930014360" rel="noopener noreferrer"&gt;Minh Nhat Nguyen&lt;/a&gt;: do not go into the llama tokenizer dot json. worst mistake of my life&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbo5ot5gywpgyt8qanawd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbo5ot5gywpgyt8qanawd.png" alt="image-11" width="449" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  There’s Still Hope
&lt;/h2&gt;

&lt;p&gt;It’s not all over; there is still hope for redemption. The 2T-parameter Behemoth model could partially restore Meta’s lost reputation. But for a model of that size, it has to be at least as good as Grok 3, or it’d be over for Meta and Llama.&lt;/p&gt;

&lt;p&gt;And there is reason for hope:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foljjhlh9ypmo8l76w7u5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foljjhlh9ypmo8l76w7u5.png" alt="Screenshot-2025-04-09-at-7.33.22-PM" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
