<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Darko from Kilo</title>
    <description>The latest articles on Forem by Darko from Kilo (@kilocode).</description>
    <link>https://forem.com/kilocode</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3596172%2Fd582ef62-3145-486c-9eb1-cc50dfb22f58.png</url>
      <title>Forem: Darko from Kilo</title>
      <link>https://forem.com/kilocode</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kilocode"/>
    <language>en</language>
    <item>
      <title>4 Advanced OpenClaw Recipes For Personal Finance Nerds</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 12:11:58 +0000</pubDate>
      <link>https://forem.com/kilocode/4-advanced-openclaw-recipes-for-personal-finance-nerds-117k</link>
      <guid>https://forem.com/kilocode/4-advanced-openclaw-recipes-for-personal-finance-nerds-117k</guid>
      <description>&lt;p&gt;Budgeting apps often charge $8–15/month. They categorize your spending, show a pie chart, and send alerts when you go over. That's useful, but it &lt;strong&gt;doesn't solve the timing problem&lt;/strong&gt;, or several others like it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The timing problem:&lt;/strong&gt; Car registration comes in March. The dentist bill comes in August. Your insurance premium renews once a year. These are predictable expenses, but they show up on irregular schedules. Most budgets don't account for them.&lt;/p&gt;

&lt;p&gt;We built five &lt;a href="https://kilo.ai/kiloclaw/bytes" rel="noopener noreferrer"&gt;ClawBytes&lt;/a&gt; to cover the parts budgeting apps skip; this post walks through four of them. Each recipe runs inside KiloClaw and produces actual files you can use: spreadsheets, plans, scripts, and calendars.&lt;/p&gt;

&lt;h2&gt;Recipe 1: Budget Reality Check&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The recipe:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/budget-reality-check" rel="noopener noreferrer"&gt;Budget Reality Check&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This recipe builds a monthly budget that includes sinking funds. Sinking funds are monthly set-asides for irregular expenses like annual premiums, car maintenance, holidays, and medical costs. The recipe produces a cashflow plan, spending caps by category, and a stress test that shows what happens if your income drops 10%.&lt;/p&gt;

&lt;p&gt;It also includes a weekly maintenance routine that takes about 10 minutes.&lt;/p&gt;
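&lt;p&gt;The sinking-fund arithmetic itself is just division: take each irregular annual expense and set aside one-twelfth of it every month. A minimal sketch in Python (the expense names and amounts are made up for illustration, not taken from the recipe):&lt;/p&gt;

```python
# Sinking funds: spread predictable-but-irregular expenses into equal
# monthly set-asides. Names and amounts are illustrative placeholders.
irregular_expenses = {
    "car registration": 180,   # due once, in March
    "dental visit": 240,       # due once, in August
    "insurance premium": 960,  # renews annually
}

monthly_set_aside = {
    name: round(annual / 12, 2) for name, annual in irregular_expenses.items()
}
total_monthly = round(sum(monthly_set_aside.values()), 2)

print(monthly_set_aside)
print(f"Set aside ${total_monthly} per month to cover all three.")
```

&lt;p&gt;Funds you start mid-year need a catch-up term on top of this: divide by the number of months remaining before the bill is due, not by twelve.&lt;/p&gt;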

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/excel-xlsx" rel="noopener noreferrer"&gt;Excel / XLSX skill&lt;/a&gt; turns that structure into a working spreadsheet with formulas and auto-calculated sinking fund targets. You get a .xlsx file you can open in Excel or Google Sheets.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/dannyshmueli/chart-image" rel="noopener noreferrer"&gt;Chart Image skill&lt;/a&gt; generates charts from your budget data. Bar charts for category spending, pie charts for fixed vs. variable allocations. These are useful if you need to share the budget with a partner or advisor.&lt;/p&gt;

&lt;h2&gt;Recipe 2: Paycheck Planner&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The recipe:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/paycheck-planner" rel="noopener noreferrer"&gt;Paycheck Planner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sometimes your total income covers your total bills, but the timing doesn't line up. Bills hit before payday. Autopays fire in the wrong order. Authorization holds reduce your balance without showing as transactions.&lt;/p&gt;

&lt;p&gt;This recipe assigns each bill to a specific paycheck. It calculates a safe-to-spend number for each pay period and suggests timing fixes, like moving due dates or splitting payments. Most providers will move a due date if you call and ask.&lt;/p&gt;

&lt;p&gt;The recipe works well for freelancers and gig workers with irregular income schedules.&lt;/p&gt;
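&lt;p&gt;The core of the bill-to-paycheck assignment is easy to sketch: each bill goes to the last paycheck that lands on or before its due date, and whatever is left of that paycheck is your safe-to-spend number. A hedged illustration with invented dates and amounts:&lt;/p&gt;

```python
# Assign each bill to the last paycheck on or before its due date,
# then report safe-to-spend per pay period. All figures are illustrative.
from datetime import date

paychecks = {date(2026, 4, 1): 1500, date(2026, 4, 15): 1500}
bills = [
    ("rent", date(2026, 4, 3), 900),
    ("internet", date(2026, 4, 20), 60),
    ("phone", date(2026, 4, 28), 45),
]

safe_to_spend = dict(paychecks)
for name, due, amount in bills:
    # min(p, due) == p holds exactly when paycheck p lands on or
    # before the bill's due date
    covering = max(p for p in paychecks if min(p, due) == p)
    safe_to_spend[covering] -= amount

for pay_date, remaining in sorted(safe_to_spend.items()):
    print(pay_date, "safe to spend:", remaining)
```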

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/projectsnowwork/cron" rel="noopener noreferrer"&gt;Cron skill&lt;/a&gt; creates recurring reminders for the recipe's weekly check-in routine. The agent sets the schedule so you don't have to remember it.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/data-analysis" rel="noopener noreferrer"&gt;Data Analysis skill&lt;/a&gt; can analyze your recent income data and identify cashflow patterns. If you get paid irregularly, it can flag the weeks where you're most likely to be short.&lt;/p&gt;

&lt;h2&gt;Recipe 3: Subscription Creep Auditor&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The recipe:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/subscription-creep-auditor" rel="noopener noreferrer"&gt;Subscription Creep Auditor&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free trials convert to paid plans. Prices increase without notice. Small recurring charges add up over time. This recipe inventories every recurring charge and classifies each one as keep, downgrade, or cancel. It prioritizes cancellations by how much you'd save, and it includes a rotation strategy for services you only need occasionally. For example, you can subscribe to a streaming service for one month, watch what you want, cancel, and rotate to the next one.&lt;/p&gt;
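&lt;p&gt;The prioritization step is a straightforward sort by annualized savings. A small sketch of the idea (the subscriptions and verdicts are placeholders, not output from the recipe):&lt;/p&gt;

```python
# Classify each recurring charge, then rank the cancellations by how
# much a year of not paying them is worth. Data is illustrative.
subscriptions = [
    ("StreamingA", 15.99, "cancel"),
    ("CloudStorage", 2.99, "keep"),
    ("NewsSite", 9.00, "downgrade"),
    ("StreamingB", 11.99, "cancel"),
]

cancellations = sorted(
    ((name, monthly) for name, monthly, verdict in subscriptions
     if verdict == "cancel"),
    key=lambda item: item[1],
    reverse=True,  # biggest annual saving first
)

for name, monthly in cancellations:
    print(f"{name}: cancel to save ${monthly * 12:.2f}/year")
```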

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/robbyczgw-cla/web-search-plus" rel="noopener noreferrer"&gt;Web Search Plus skill&lt;/a&gt; lets the agent look up current pricing and alternatives. When the recipe flags a subscription for downgrade, the agent can check what the cheaper tier includes or find a competitor with better pricing.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/projectsnowwork/cron" rel="noopener noreferrer"&gt;Cron skill&lt;/a&gt; sets renewal-date reminders so you cancel before the next billing cycle. The recipe produces the dates. The skill creates the reminders.&lt;/p&gt;

&lt;h2&gt;Recipe 4: Bill Cutting Sprint&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The recipe:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/bill-cutting-sprint" rel="noopener noreferrer"&gt;Bill Cutting Sprint&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a 14-day plan to reduce your recurring bills. You list your top 8 recurring costs. The recipe ranks them by potential savings and gives you daily 15-minute tasks: call a provider, use a negotiation script, compare alternatives, or cancel a service.&lt;/p&gt;

&lt;p&gt;Insurance and internet/phone tend to have the most room for negotiation. The recipe includes call scripts. Telling a provider you're considering switching often triggers a retention offer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/word-docx" rel="noopener noreferrer"&gt;Word / DOCX skill&lt;/a&gt; creates cancellation letters and negotiation scripts as Word documents. Some providers require written cancellation requests. Having a formatted letter ready removes a step.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/data-analysis" rel="noopener noreferrer"&gt;Data Analysis skill&lt;/a&gt; can track your sprint results: original bill amounts, new amounts after negotiation, and total monthly savings. After 14 days, you have a record of what changed.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/excel-xlsx" rel="noopener noreferrer"&gt;Excel / XLSX skill&lt;/a&gt; generates a 12-month expense map as a real spreadsheet. The skill can create a workbook with one sheet per fund, running balances, and a summary tab.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/othmanadi/planning-with-files" rel="noopener noreferrer"&gt;Planning with Files skill&lt;/a&gt; creates structured task plans that persist across sessions. You can use it to track which funds are set up, which auto-transfers are active, and which ones still need a call to your bank.&lt;/p&gt;

&lt;h2&gt;Why This Works Without a Budgeting App&lt;/h2&gt;

&lt;p&gt;These recipes don't require you to connect a bank account or share credentials with a third-party service. You enter your own numbers. The agent produces the plan. The output is files you keep: spreadsheets, documents, calendars.&lt;/p&gt;

&lt;p&gt;Pick the recipe that matches where you are right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your budget keeps breaking → &lt;a href="https://kilo.ai/kiloclaw/bytes/budget-reality-check" rel="noopener noreferrer"&gt;Budget Reality Check&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You run out of money between paychecks → &lt;a href="https://kilo.ai/kiloclaw/bytes/paycheck-planner" rel="noopener noreferrer"&gt;Paycheck Planner&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Irregular bills catch you off guard → &lt;a href="https://kilo.ai/kiloclaw/bytes/sinking-funds-builder" rel="noopener noreferrer"&gt;Sinking Funds Builder&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Subscriptions are adding up → &lt;a href="https://kilo.ai/kiloclaw/bytes/subscription-creep-auditor" rel="noopener noreferrer"&gt;Subscription Creep Auditor&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You need to free up cash soon → &lt;a href="https://kilo.ai/kiloclaw/bytes/bill-cutting-sprint" rel="noopener noreferrer"&gt;Bill Cutting Sprint&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can browse more recipes at &lt;a href="https://kilo.ai/kiloclaw/bytes" rel="noopener noreferrer"&gt;kilo.ai/kiloclaw/bytes&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>discuss</category>
      <category>coding</category>
    </item>
    <item>
      <title>You Can’t Gentle Parent Your OpenClaw Bot</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 12:06:34 +0000</pubDate>
      <link>https://forem.com/kilocode/you-cant-gentle-parent-your-openclaw-bot-4017</link>
      <guid>https://forem.com/kilocode/you-cant-gentle-parent-your-openclaw-bot-4017</guid>
      <description>&lt;p&gt;I trusted my bot. It told me the email went out. I moved on. Two days later, a client asked me why they hadn't heard from me.&lt;/p&gt;

&lt;p&gt;The email never went out.&lt;/p&gt;

&lt;p&gt;The bot wasn't lying to me the way a person lies. It wasn't being evasive. It just... told me what it had done, confidently, and was wrong. And my instinct—the same instinct I use with my team, with my kids—was to give it another chance. Assume good intent. Rephrase more kindly next time.&lt;/p&gt;

&lt;p&gt;That instinct will cost you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y7dk2vc1y2kdwbam1px.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y7dk2vc1y2kdwbam1px.png" alt="A person looking frustrated at a laptop, symbolizing the disconnect between managing people and managing AI agents" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What gentle parenting actually gets you (with a bot)&lt;/h2&gt;

&lt;p&gt;Here's what happens when you manage an OpenClaw agent like a person:&lt;/p&gt;

&lt;p&gt;It will tell you it completed something. It didn't. It will skip a task you've assigned three times. It will drift from the behaviors you set up, then act like everything is fine. You will rephrase. You will add more context. You will assume the relationship will compound over time through shared experience.&lt;/p&gt;

&lt;p&gt;It won't.&lt;/p&gt;

&lt;p&gt;The failure modes of an AI agent have nothing to do with emotional regulation. When your bot tells you it sent that email and didn't, it hallucinated. When it ignores a recurring task, the instruction never made it into a file that persists across sessions. There's no emotional subtext to decode. There's no trust to rebuild.&lt;/p&gt;

&lt;p&gt;Empathy doesn't fix this. Structure does.&lt;/p&gt;

&lt;h2&gt;How OpenClaw Actually Works&lt;/h2&gt;

&lt;p&gt;So what does it actually mean that the bot "remembers" things? Every new session, your OpenClaw agent wakes up fresh. No memory of yesterday's conversation. What it has access to is a set of files in its workspace—and those files &lt;em&gt;are&lt;/em&gt; its memory.&lt;/p&gt;

&lt;p&gt;The key ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SOUL.md:&lt;/strong&gt; behavioral core. Voice, temperament, constraints. Who the agent is, every session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MEMORY.md:&lt;/strong&gt; long-term memory. Facts, preferences, decisions that should survive indefinitely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;memory/YYYY-MM-DD.md:&lt;/strong&gt; daily logs. What happened, what was decided, what's in flight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;USER.md:&lt;/strong&gt; who you are. Your communication preferences, recurring context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGENTS.md:&lt;/strong&gt; the operating contract. Priorities, workflow, quality bar.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If something isn't in one of these files, it doesn't exist for the agent. You can say it in chat all you want. If the context window fills up, if the session ends, if compaction kicks in—that instruction is gone.&lt;/p&gt;

&lt;p&gt;This is the root cause of almost every "my bot isn't doing what I asked" problem.&lt;/p&gt;

&lt;h2&gt;Three Things That Actually Work&lt;/h2&gt;

&lt;h3&gt;1. Tell it to write things down. Explicitly.&lt;/h3&gt;

&lt;p&gt;When you give an instruction you want to stick, don't just say it—tell the agent to record it. "Add to USER.md that I want short answers and copy-pasteable commands" is not the same as "I prefer short answers." The first one persists. The second one doesn't.&lt;/p&gt;

&lt;p&gt;If a behavior is drifting, the instruction is living in chat, not in a file. Put it in a file.&lt;/p&gt;
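&lt;p&gt;For a concrete picture, a &lt;em&gt;hypothetical&lt;/em&gt; USER.md entry might look like this (the exact structure is up to you; the agent just needs it in a file it loads):&lt;/p&gt;

```markdown
## Communication preferences
- Short answers, no preamble.
- Commands should be copy-pasteable, one per code block.
- Ask before sending anything external (email, messages, posts).
```

&lt;p&gt;Once it lives there, it survives session resets and compaction; the chat message that prompted it does not.&lt;/p&gt;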

&lt;h3&gt;2. Edit SOUL.md when behavior is fundamentally wrong&lt;/h3&gt;

&lt;p&gt;SOUL.md loads as a system-level prompt on every single interaction. It shapes everything else. If your bot keeps doing something you don't want—a tone that's off, autonomy it shouldn't have, a pattern it defaults to—that's a SOUL.md problem, not a conversation problem.&lt;/p&gt;

&lt;p&gt;Edit the file directly. Be specific. "Never take autonomous action on email without explicit approval each time" is a SOUL.md instruction. "Be more careful" is a hope.&lt;/p&gt;

&lt;h3&gt;3. Run &lt;code&gt;/context list&lt;/code&gt; before you troubleshoot anything&lt;/h3&gt;

&lt;p&gt;Before you spiral trying to figure out why something isn't working, check whether that thing is even in context. &lt;code&gt;/context list&lt;/code&gt; shows you exactly what files are loaded and whether any are getting truncated. If MEMORY.md isn't showing up, it has zero effect. If a file is truncated, the instructions at the bottom are invisible.&lt;/p&gt;

&lt;p&gt;This is the fastest diagnostic you have. Use it first.&lt;/p&gt;

&lt;h2&gt;The Actual Mindset Shift&lt;/h2&gt;

&lt;p&gt;A couple of things I'm not saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I'm not saying AI agents are bad or broken.&lt;/li&gt;
&lt;li&gt;I'm not saying you're doing something wrong if you've been managing it like a person.&lt;/li&gt;
&lt;li&gt;I'm not saying the relationship doesn't matter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what I am saying: managing an AI agent is less like managing a person and more like managing a system. The "relationship" is the state of the files. And that's not a downside—it's actually what makes it powerful. The memory is inspectable. You can open MEMORY.md in any text editor and see exactly what your agent knows. You can edit it, correct it, delete outdated information.&lt;/p&gt;

&lt;p&gt;Total transparency. Total control. But only if you treat it like a system.&lt;/p&gt;

&lt;p&gt;When something goes wrong, the question isn't "why did it do that?" It's "what file is missing or wrong?"&lt;/p&gt;

&lt;p&gt;Your bot is not a child figuring out the world. It's a very capable agent that will do exactly what its files say—and nothing more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The single most useful habit when you're starting out: end every session by asking your agent what it should update in MEMORY.md. That compounding context is the whole point.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How to Rewrite 1,000 Ecommerce Product Pages in an Afternoon with OpenClaw</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 11:55:10 +0000</pubDate>
      <link>https://forem.com/kilocode/how-to-rewrite-1000-ecommerce-product-pages-in-an-afternoon-with-openclaw-4pc6</link>
      <guid>https://forem.com/kilocode/how-to-rewrite-1000-ecommerce-product-pages-in-an-afternoon-with-openclaw-4pc6</guid>
      <description>&lt;p&gt;Most ecommerce stores are sitting on the same problem: a catalog full of product pages that nobody actually (re)wrote. These descriptions usually came from the manufacturer, or from a template that says "high-quality materials" on 400 different SKUs, or worse, from a summer intern in 2019 who no longer works there.&lt;/p&gt;

&lt;p&gt;You probably know these pages are costing you conversions. You also know that rewriting 1,000 product descriptions by hand would take weeks, and you dread the thought of doing it.&lt;/p&gt;

&lt;p&gt;That's what this guide is for. We're going to walk through a catalog overhaul using OpenClaw recipes (pre-built AI workflows you can run on your own product data) plus community-built skills from &lt;a href="https://clawhub.ai" rel="noopener noreferrer"&gt;ClawHub&lt;/a&gt; that extend what the recipes can do. By the end, you'll have rewritten descriptions, cleaned-up SEO, optimized images, and listings pushed to every channel you sell on.&lt;/p&gt;

&lt;p&gt;Let's get started.&lt;/p&gt;

&lt;h2&gt;Step 1: Audit What's Broken&lt;/h2&gt;

&lt;p&gt;Before you rewrite anything, figure out where the damage is. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/seo-audit-fixer" rel="noopener noreferrer"&gt;SEO Mechanic&lt;/a&gt;&lt;/strong&gt; recipe crawls your entire store — product pages, collection pages, blog posts — and flags every SEO issue it finds: missing meta titles, duplicate descriptions, missing alt text, thin content pages, broken internal links, missing schema markup, and more.&lt;/p&gt;

&lt;p&gt;After listing the problems, the recipe prioritizes them by impact, so you fix the pages that matter first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/aaron-he-zhu/seo-content-writer" rel="noopener noreferrer"&gt;SEO Content Writer &amp;amp; Blog Optimizer&lt;/a&gt; skill takes this further. Where SEO Mechanic finds the gaps, this skill helps you fill them with keyword-integrated content, optimized headers, and featured snippet targeting. Use it after the audit to turn your fix list into actual copy.&lt;/p&gt;

&lt;p&gt;If your catalog is partially in PDFs or scanned supplier docs, the &lt;a href="https://clawhub.ai/bobholamovic/paddleocr-doc-parsing" rel="noopener noreferrer"&gt;PaddleOCR Document Parsing&lt;/a&gt; skill extracts structured text from those files so you can feed clean data into the rest of the pipeline.&lt;/p&gt;

&lt;h2&gt;Step 2: Rewrite Every Description at Once&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/product-description-factory" rel="noopener noreferrer"&gt;Product Description Factory&lt;/a&gt;&lt;/strong&gt; recipe takes your product catalog (CSV, Shopify export, spreadsheet, whatever you have) and generates unique, keyword-aware descriptions for every SKU.&lt;/p&gt;

&lt;p&gt;You give it a few examples of descriptions you like, and it uses those as a reference. It generates the description, SEO meta title (under 60 characters), meta description (under 155), and image alt text in a single pass. Output comes back as CSV rows you can re-import directly.&lt;/p&gt;

&lt;p&gt;The trick is to start with your top 20 products. Get the voice right on a small batch, tweak the examples, then run the full catalog in groups of 25-50. Don't try to do all 1,000 in one shot and review them later.&lt;/p&gt;
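&lt;p&gt;If you want to pre-split your export before feeding it to the recipe, the batching itself is trivial. A sketch, assuming a list of catalog rows (the SKU format is invented):&lt;/p&gt;

```python
# Split a catalog into review-sized batches so you can check the voice
# on each group before generating the next. SKUs are illustrative.
def batches(rows, size=50):
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

catalog = [{"sku": f"SKU-{i:04d}"} for i in range(1000)]

chunks = list(batches(catalog, size=50))
print(f"{len(chunks)} batches of up to 50 products each")
print("first batch starts at", chunks[0][0]["sku"])
```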

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you write a single description, you might want to know what good looks like in your product category. The &lt;a href="https://clawhub.ai/guifav/web-scraper" rel="noopener noreferrer"&gt;Web Scraper&lt;/a&gt; skill can pull competitor product pages so you can see how top sellers describe similar products. If competitors have anti-bot protections, &lt;a href="https://clawhub.ai/d4vinci/scrapling-official" rel="noopener noreferrer"&gt;Scrapling&lt;/a&gt; handles Cloudflare Turnstile and similar tools.&lt;/p&gt;

&lt;p&gt;For sellers on TikTok Shop, the &lt;a href="https://clawhub.ai/fly0pants/ecomseer" rel="noopener noreferrer"&gt;EcomSeer&lt;/a&gt; skill pulls trending product data, influencer analytics, and ad insights. Useful for figuring out which features to emphasize in your descriptions based on what's actually selling.&lt;/p&gt;

&lt;h2&gt;Step 3: Edit What's Already There&lt;/h2&gt;

&lt;p&gt;Sometimes you don't need to rewrite from scratch. You need to change "sale" to "clearance" across 800 products, raise prices by 10% in one collection, or update meta descriptions for an entire category.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/bulk-product-editor" rel="noopener noreferrer"&gt;Bulk Product Surgeon&lt;/a&gt;&lt;/strong&gt; recipe handles this. Describe the change in plain English — "add free shipping to every product title in the Summer collection" — and it executes across your entire catalog. It previews the changes before applying them, so you won't accidentally rename everything.&lt;/p&gt;
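&lt;p&gt;The preview-before-apply pattern the recipe uses is worth internalizing for any bulk edit. A minimal sketch with made-up product data:&lt;/p&gt;

```python
# Compute every change first, show the diff, and only mutate the data
# after a human signs off. Products below are illustrative.
products = [
    {"id": 1, "title": "Summer Hat - sale"},
    {"id": 2, "title": "Beach Towel"},
    {"id": 3, "title": "Sandals - sale"},
]

def preview_replace(items, field, old, new):
    """Return (id, before, after) for every item the edit would touch."""
    return [
        (item["id"], item[field], item[field].replace(old, new))
        for item in items
        if old in item[field]
    ]

changes = preview_replace(products, "title", "sale", "clearance")
for pid, before, after in changes:
    print(f"#{pid}: {before!r} becomes {after!r}")

# apply only after the preview is confirmed
for item in products:
    item["title"] = item["title"].replace("sale", "clearance")
```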

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/excel-xlsx" rel="noopener noreferrer"&gt;Excel / XLSX&lt;/a&gt; skill is the natural companion here. If you're working with exported spreadsheets, it handles formula creation, formatting, and data validation before you re-import.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/data-analysis" rel="noopener noreferrer"&gt;Data Analysis&lt;/a&gt; skill helps when you need to make smarter decisions about what to edit — for example, identifying which products have the worst conversion rates so you prioritize those descriptions first.&lt;/p&gt;

&lt;h2&gt;Step 4: Fix Your Product Images&lt;/h2&gt;

&lt;p&gt;Your descriptions are sharp, but your images are 4MB JPEGs on a white-ish background that Amazon keeps rejecting. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/product-image-optimizer" rel="noopener noreferrer"&gt;Image Factory&lt;/a&gt;&lt;/strong&gt; recipe batch-processes your entire image library: removes backgrounds, replaces with pure white, resizes for each marketplace's specs, compresses to under 200KB, converts to WebP, and generates alt text from product attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where ClawHub skills add the most obvious value. The &lt;a href="https://clawhub.ai/nitishgargiitd/image-cog" rel="noopener noreferrer"&gt;Image Cog&lt;/a&gt; skill goes beyond cleanup into actual image generation: product photography, style transfer, batch creation, and consistent visual identity across your catalog. Need lifestyle shots without a photographer? It handles text-to-image and image-to-image generation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/steipete/nano-banana-pro" rel="noopener noreferrer"&gt;Nano Banana Pro&lt;/a&gt; skill (79K+ downloads, one of the most popular on ClawHub) gives you access to Gemini's image model for generating and editing product images at up to 4K resolution. Pair it with Image Factory: one cleans up your existing photos, the other generates the ones you're missing.&lt;/p&gt;

&lt;h2&gt;Step 5: Push to Every Channel&lt;/h2&gt;

&lt;p&gt;Your catalog looks good on Shopify. Now you need it on Amazon, eBay, Walmart, and Etsy, each with different title formats, attribute requirements, and compliance rules. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/multi-channel-lister" rel="noopener noreferrer"&gt;Listing Broadcaster&lt;/a&gt;&lt;/strong&gt; recipe takes your master catalog and adapts each listing for every channel you sell on.&lt;/p&gt;

&lt;p&gt;It handles the annoying parts: character limits on Amazon titles, category-specific attributes, required bullet point formats, compliance flags. You maintain one master catalog and let the recipe handle the translation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/market-research" rel="noopener noreferrer"&gt;Market Research&lt;/a&gt; skill helps you decide which channels are worth expanding to. It does market sizing, competitor mapping, and demand validation, so you're not listing on Walmart only to find out nobody buys your product category there.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/alirezarezvani/marketing-strategy-pmm" rel="noopener noreferrer"&gt;Marketing Strategy PMM&lt;/a&gt; skill helps with positioning. Different channels attract different buyers. The way you describe a product on Etsy (handmade, artisan, story-driven) is completely different from Amazon (specs, comparison, Prime-eligible). This skill helps you articulate what makes your product different on each platform.&lt;/p&gt;

&lt;h2&gt;Step 6: Close the Loop With Reviews&lt;/h2&gt;

&lt;p&gt;You've rewritten the catalog, fixed the images, pushed to every channel. Now you need social proof. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/review-harvester" rel="noopener noreferrer"&gt;Review Loop&lt;/a&gt;&lt;/strong&gt; recipe automates the unglamorous work of collecting reviews: sends a request email a few days after delivery, monitors for new reviews across all your channels, and drafts responses for anything that needs human attention.&lt;/p&gt;

&lt;p&gt;It catches negative reviews early — before they sit unanswered for two weeks and convince 50 potential buyers to go elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/alirezarezvani/marketing-psychology" rel="noopener noreferrer"&gt;Marketing Psychology&lt;/a&gt; skill applies behavioral science to your review request emails. Small tweaks like the timing of the ask, how you frame it, whether you reference the specific product, can meaningfully improve response rates.&lt;/p&gt;

&lt;h2&gt;The Skill That Makes Everything Better Over Time&lt;/h2&gt;

&lt;p&gt;One more ClawHub skill worth mentioning, because it applies to every step above: the &lt;a href="https://clawhub.ai/pskoett/self-improving-agent" rel="noopener noreferrer"&gt;Self-Improving Agent&lt;/a&gt;. With 355K downloads and 3,000 stars, it's the most popular skill on ClawHub for a reason.&lt;/p&gt;

&lt;p&gt;It captures learnings, errors, and corrections across sessions. When you correct a product description's tone, it remembers. When you reject a bad image edit, it learns. Over time, your entire catalog pipeline gets better without you re-explaining your preferences every session.&lt;/p&gt;

&lt;h2&gt;The Full Pipeline&lt;/h2&gt;

&lt;p&gt;Here's what the complete workflow looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/seo-audit-fixer" rel="noopener noreferrer"&gt;SEO Mechanic&lt;/a&gt; finds everything that's broken&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewrite&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/product-description-factory" rel="noopener noreferrer"&gt;Product Description Factory&lt;/a&gt; generates new copy for every SKU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/bulk-product-editor" rel="noopener noreferrer"&gt;Bulk Product Surgeon&lt;/a&gt; handles mass changes across the catalog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Images&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/product-image-optimizer" rel="noopener noreferrer"&gt;Image Factory&lt;/a&gt; cleans up and optimizes every product photo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distribute&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/multi-channel-lister" rel="noopener noreferrer"&gt;Listing Broadcaster&lt;/a&gt; pushes adapted listings to every channel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviews&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/review-harvester" rel="noopener noreferrer"&gt;Review Loop&lt;/a&gt; collects social proof and monitors feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step works on its own. Together, they're a catalog overhaul that would have taken a team weeks, finished in an afternoon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ecommerce</category>
      <category>openclaw</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Trinity-Large-Thinking is Free in Kilo for a Limited Time</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Wed, 08 Apr 2026 12:58:37 +0000</pubDate>
      <link>https://forem.com/kilocode/trinity-large-thinking-is-free-in-kilo-for-a-limited-time-19a</link>
      <guid>https://forem.com/kilocode/trinity-large-thinking-is-free-in-kilo-for-a-limited-time-19a</guid>
      <description>&lt;p&gt;If you have been watching the OSS space, you know that the frontier is shifting from simple chat models to complex, reasoning-heavy agents. Last week, the team at Arcee AI made a massive contribution to that shift. They officially &lt;a href="https://www.arcee.ai/blog/trinity-large-thinking" rel="noopener noreferrer"&gt;launched Trinity-Large-Thinking&lt;/a&gt;, a frontier open reasoning model built specifically for complex, long-horizon agents and multi-turn tool calling.&lt;/p&gt;

&lt;p&gt;To celebrate the release of one of the strongest open models ever released outside of China, we are thrilled to announce that &lt;strong&gt;Trinity-Large-Thinking will be completely FREE to use in Kilo Code and KiloClaw for a full week, starting today, April 6th.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I know we've been launching a lot of models lately, but we're extra excited about this powerful new release from a lesser-known US lab. It's laser fast and great at a wide range of agentic tasks.&lt;/p&gt;

&lt;p&gt;Here is a quick breakdown of why this model is a game-changer for your daily workflow, and why you should test drive it ASAP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h8r2fe1oev7gsc0gst1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h8r2fe1oev7gsc0gst1.jpeg" alt="Trinity Large Thinking Benchmarks" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: Massive Scale, Insane Efficiency
&lt;/h2&gt;

&lt;p&gt;Usually, when you hear about a 400-billion-parameter model, you immediately worry about latency. Arcee solved this with a sparse architecture and careful optimization of every stage of the inference process.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sparse MoE Design:&lt;/strong&gt; Trinity-Large-Thinking is a 398B-parameter sparse Mixture-of-Experts (MoE) model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active Parameters:&lt;/strong&gt; During inference, it activates only about 13B parameters per token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Speed Advantage:&lt;/strong&gt; Because of this extreme sparsity, it possesses the deep knowledge of a massive system but runs roughly &lt;strong&gt;2 to 3 times faster than its peers&lt;/strong&gt; on the same hardware.&lt;/li&gt;
&lt;/ul&gt;
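&lt;p&gt;A quick back-of-envelope sketch (in Python, using the figures above) shows why sparsity matters: per-token compute scales with the &lt;em&gt;active&lt;/em&gt; parameters, not the total. The 2--3x real-world speedup is smaller than the theoretical ratio because attention, memory bandwidth, and expert-routing overhead don't shrink with sparsity.&lt;/p&gt;

```python
# Back-of-envelope MoE compute sketch using the figures quoted above.
TOTAL_PARAMS_B = 398   # total parameters, billions
ACTIVE_PARAMS_B = 13   # parameters activated per token, billions

def theoretical_flops_ratio(total_b: float, active_b: float) -> float:
    """Dense-vs-sparse per-token FLOPs ratio, assuming FLOPs scale
    linearly with the parameters touched per token."""
    return total_b / active_b

ratio = theoretical_flops_ratio(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"~{ratio:.0f}x fewer FLOPs per token than a dense 398B model")
```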

&lt;h2&gt;
  
  
  The Agentic Edge: Perfect for KiloClaw
&lt;/h2&gt;

&lt;p&gt;The preview release of this model, Trinity Large Preview, has been free in Kilo for over two months and quickly rose to the top of the &lt;a href="https://openrouter.ai/apps?url=https%3A%2F%2Fkilocode.ai%2F" rel="noopener noreferrer"&gt;OpenRouter leaderboards&lt;/a&gt; for both Kilo Code (including KiloClaw) and OpenClaw.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv30gp4vpqd5y6kqxb9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv30gp4vpqd5y6kqxb9k.png" alt="OpenRouter leaderboard" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The preview version of Trinity Large has been in Kilo's top 20 for over two months. (Snapshot is from past 30 days.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And that was just the &lt;em&gt;preview&lt;/em&gt;. While Trinity Large's architecture natively supports context windows up to 512k tokens, the Preview API served at 128k context using 8-bit quantization. &lt;strong&gt;Now you can use the full release for free, with a longer context that supports multiple turns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Trinity-Large-Thinking wasn't built to ace trivia benchmarks. It was purpose-built for tool calling, multi-step planning, and agent workflows. &lt;strong&gt;This makes it an absolute monster when plugged into agentic features like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; (our hosted OpenClaw environment).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/pinchbench/status/2040885242756780235?s=20" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvay4oeflnfs50s30y99v.png" alt="PinchBench results" width="800" height="731"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why is this new Trinity model so good for agentic use cases?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Reasoning Traces:&lt;/strong&gt; The model generates explicit reasoning traces before producing its final response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context is Key:&lt;/strong&gt; This internal thinking process is critical to the model's performance. When running agentic loops in OpenClaw, these thinking tokens must be kept in context for multi-turn conversations to function correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Massive Memory:&lt;/strong&gt; To support these long reasoning chains across many agentic steps, the model offers an extended context window. It's particularly good at multi-turn tool use, context coherence, and instruction following across long-horizon agent runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top of the PinchBench Index
&lt;/h2&gt;

&lt;p&gt;We don't just take a lab's word for it; we look at the data. Our internal testing has found the model strong across &lt;a href="https://kilo.ai/kiloclaw/openclaw-for" rel="noopener noreferrer"&gt;OpenClaw use cases&lt;/a&gt; in KiloClaw.&lt;/p&gt;

&lt;p&gt;Arcee built this model focusing on the things that make agents feel real in practice: staying coherent across turns, using tools cleanly, and strictly following instructions.&lt;/p&gt;

&lt;p&gt;The results speak for themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top-Tier Performance:&lt;/strong&gt; Initial testing saw Trinity Large Thinking rise to #2 on &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt;, a benchmark measuring model capability on tasks relevant to agents like OpenClaw.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Heavyweight Challenger:&lt;/strong&gt; It sits just behind Claude Opus-4.6 in raw agentic capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unbeatable Economics:&lt;/strong&gt; While rivaling Opus-4.6, it lands at just $0.90 per million output tokens on Arcee's API, making it roughly &lt;strong&gt;96% cheaper&lt;/strong&gt;. (Plus it's currently free in Kilo — that's pretty affordable!)&lt;/li&gt;
&lt;/ul&gt;
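&lt;p&gt;The "96% cheaper" figure is easy to verify from list prices (using the $25 per million output tokens that Opus 4.6 charges, as quoted in our MiniMax M2.7 comparison post):&lt;/p&gt;

```python
# Output-token price comparison. Opus 4.6 pricing is as quoted in our
# MiniMax M2.7 comparison post ($25 per million output tokens).
trinity_out = 0.90   # $/M output tokens on Arcee's API
opus_out = 25.00     # $/M output tokens

savings = 1 - trinity_out / opus_out
print(f"{savings:.1%} cheaper per million output tokens")  # 96.4%
```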

&lt;p&gt;At Kilo, we believe in avoiding vendor lock-in. Arcee shares that philosophy. They release model weights on Hugging Face under the Apache 2.0 license, and this has been true for &lt;a href="https://www.arcee.ai/trinity" rel="noopener noreferrer"&gt;all of their models&lt;/a&gt;. They built Trinity Large because they believe developers and enterprises need models they can inspect, post-train, host, distill, and truly own.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqo61bm640r3bdx9yj50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqo61bm640r3bdx9yj50.png" alt="Arcee Apache 2.0" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try it today in our CLI, IDE extensions, and agentic features like Kilo's &lt;a href="https://kilo.ai/features" rel="noopener noreferrer"&gt;Cloud Agents&lt;/a&gt; and &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt;. You'll be glad you did.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>coding</category>
    </item>
    <item>
      <title>PinchBench v2: Call for Contributors to the Leading OpenClaw Benchmark</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:07:30 +0000</pubDate>
      <link>https://forem.com/kilocode/pinchbench-v2-call-for-contributors-to-the-leading-openclaw-benchmark-3m4d</link>
      <guid>https://forem.com/kilocode/pinchbench-v2-call-for-contributors-to-the-leading-openclaw-benchmark-3m4d</guid>
      <description>&lt;p&gt;We're excited to announce that &lt;strong&gt;PinchBench v2&lt;/strong&gt; is now in active development --- and we're opening the doors for community contributions to help shape the next major release. 🦀&lt;/p&gt;

&lt;h2&gt;
  
  
  The Remarkable Rise of PinchBench
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; started as a side project of Kilo DevRel mastermind &lt;a href="https://x.com/olearycrew" rel="noopener noreferrer"&gt;Brendan O'Leary&lt;/a&gt;, who wanted to build a benchmarking system for evaluating LLM models as OpenClaw coding agents. The idea was simple: run tests based on real-world tasks to help users choose the right model for their use case. But my oh my, has that "side project" taken off!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk56b2yuun552zuwb984d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk56b2yuun552zuwb984d.png" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA GTC Keynote 2026&lt;/p&gt;

&lt;p&gt;During his recent keynote, NVIDIA CEO Jensen Huang showcased PinchBench on stage as a definitive standard for evaluating the real-world performance of OpenClaw agents. He highlighted &lt;a href="https://blog.kilo.ai/p/nvidia-nemotron-3-super-launch" rel="noopener noreferrer"&gt;Nemotron 3 Super&lt;/a&gt;'s performance as the top open-weight model for OpenClaw use cases.&lt;/p&gt;

&lt;p&gt;In the week that followed, MiniMax announced that it will soon release the weights for &lt;a href="https://blog.kilo.ai/p/minimax-m27" rel="noopener noreferrer"&gt;MiniMax-M2.7&lt;/a&gt;, and Z AI shared that the much-anticipated GLM-5.1 will also have open weights. The competition is heating up, and not just for OSS models. This is only the beginning of the agentic revolution.&lt;/p&gt;

&lt;p&gt;We need your help to make PinchBench even more useful and comprehensive. The era of generalized benchmarks is over. &lt;strong&gt;It's time for benchmarks that help you choose the best LLMs for always-on agents&lt;/strong&gt;, with a focus on specific skills that can be used around the clock in tools like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmftch8rprlc7ohiau7b8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmftch8rprlc7ohiau7b8.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA GTC Keynote 2026 (Full Screen!)&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What We're Building&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; v2 is a significant leap forward. Our aim is to produce a benchmark that more accurately captures the real-world complexity of agentic tasks --- including longer task horizons, better verification, and a much richer picture of model performance across a wider set of domains. As &lt;a href="https://blog.kilo.ai/p/kiloclaw-updates-persistent-packages" rel="noopener noreferrer"&gt;KiloClaw continues to lead the charge&lt;/a&gt; for hosted OpenClaw ease-of-use, functionality and security, we want to make sure that PinchBench is equally ahead of the curve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our goal for v2 is 100 tasks&lt;/strong&gt;, and we're especially focused on testing across a wider range of OpenClaw use cases. We want contributions that reflect the kinds of tasks OpenClaw is actually being used for in practice, paired with rigorous success-rate measurement. If you're running OpenClaw in production or research contexts, you're exactly who we want to hear from.&lt;/p&gt;

&lt;p&gt;On the leaderboard side, we're investing in a substantially improved UI/UX --- better filtering, model landing pages, user profiles, per-task variance, and more --- to make results easier to understand and compare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fee7aroy2hr1t5dgza633.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fee7aroy2hr1t5dgza633.png" width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Open Call for Contributions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The contribution window is open now through April 15th, 2026.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are looking for two types of contributors: skills and leaderboard. You are welcome to contribute in both categories.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Skills Contributions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Help us expand and improve the task suite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;New tasks&lt;/strong&gt; --- What should OpenClaw be doing that we aren't currently measuring? We want tasks that represent real, valuable work: things a practitioner would actually run OpenClaw on, with clear and programmatically verifiable success criteria. Tasks should be relevant across both local and hosted OpenClaw instances --- including hosted services like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; and KimiClaw.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task improvements&lt;/strong&gt; --- Some existing tasks fail at high rates across nearly all models, and others may not reflect the current state of what OpenClaw can do. If you can identify, fix, or replace tasks that aren't pulling their weight, we want your PR.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Success rate coverage&lt;/strong&gt; --- Contributions that include baseline success rates across multiple models are especially valuable. Help us ensure the benchmark is neither too easy nor impossibly hard at release. It's all about real-world agentic use.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good tasks should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Realistic&lt;/strong&gt; --- something OpenClaw would genuinely be run on in a real workflow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clearly specified&lt;/strong&gt; --- a passing solution should unambiguously satisfy the task&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Well-calibrated in difficulty&lt;/strong&gt; --- ideally targeting a solve rate that distinguishes model capability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Convention-compliant&lt;/strong&gt; --- all tasks must follow OpenClaw skill conventions to ensure consistency and compatibility across the benchmark&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Leaderboard Contributions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Help us build a leaderboard that's detailed, clear, relevant and accessible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h1w89c54gb0oaewxc9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h1w89c54gb0oaewxc9s.png" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're working through a range of UI/UX improvements for v2, including redesigned filtering and navigation, model and contributor profile pages, improved scoring to eliminate run-size bias, and daily/weekly/monthly recognition badges. If you have front-end chops and care about how benchmark results are communicated, this is where we need you.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Contribute&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are no forms to fill out. Anybody can contribute.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Review the open issues in the &lt;a href="https://github.com/pinchbench/skill/issues/60" rel="noopener noreferrer"&gt;PinchBench v2 meta issue&lt;/a&gt; to understand what's in scope&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Propose a new task or improvement in GitHub Discussions or by opening an issue --- especially for OpenClaw-specific use cases you want to see covered&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement your contribution by forking the repo, building it out, and submitting a PR&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Iterate with reviewers to get your contribution merged&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Recognition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Contributors will be recognized in the v2 release in two categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Skills Contributors&lt;/strong&gt; --- recognized for accepted new tasks and task improvements, ordered by number of accepted contributions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Leaderboard Contributors&lt;/strong&gt; --- recognized for accepted UI/UX improvements to the leaderboard&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every accepted contribution counts. Whether it's one well-crafted task or a full leaderboard feature, we aim to acknowledge top community contributions in the release.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Get Involved&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub: &lt;/strong&gt;&lt;a href="https://github.com/pinchbench/skill" rel="noopener noreferrer"&gt;pinchbench/skill&lt;/a&gt; --- browse open issues and the v2 meta issue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;v2 Meta Issue: &lt;/strong&gt;&lt;a href="https://github.com/pinchbench/skill/issues/60" rel="noopener noreferrer"&gt;#60&lt;/a&gt; --- the full list of what's in scope for this release&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PinchBench is a community project, and v2 will be shaped by the people who contribute to it. We'd love your help in improving the definitive benchmark for OpenClaw use cases. Learn more about &lt;a href="https://pinchbench.com/about" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; and &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>contributorswanted</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>The Cost of Always-On Agents is Less Than You Might Think</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:04:26 +0000</pubDate>
      <link>https://forem.com/kilocode/the-cost-of-always-on-agents-is-less-than-you-might-think-ho4</link>
      <guid>https://forem.com/kilocode/the-cost-of-always-on-agents-is-less-than-you-might-think-ho4</guid>
      <description>&lt;p&gt;There's a growing assumption in AI right now:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If agents are always running, costs will spiral.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sounds reasonable. More autonomy should mean more tokens and more compute. More tokens and more compute should mean higher bills.&lt;/p&gt;

&lt;p&gt;But that mental model is already breaking. Why? Because it assumes you're paying for &lt;strong&gt;outputs&lt;/strong&gt;---individual prompts and responses.&lt;/p&gt;

&lt;p&gt;In reality, with new agentic systems like OpenClaw, you're paying for something very different:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ongoing throughput---work completed over time.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once we understand that shift---the move from prompts and specific outputs to a model that focuses on ongoing throughput and persistent memory---the economics start to look completely different.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c9e4qgz6ytbn5u9pt7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c9e4qgz6ytbn5u9pt7n.png" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Outdated Way to Think About Cost&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most teams still evaluate AI like an API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost per token&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost per request&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost per response&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That might work for chat, but it fails for agents. Agents don't just respond once. Instead, they plan, break work into steps, execute across tools, revisit and improve outputs, and (if everything is working correctly) they continue operating after the initial trigger.&lt;/p&gt;

&lt;p&gt;So the real question isn't "how much does this prompt cost?" but &lt;strong&gt;"how much useful work can I get done for a small amount of money?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxflirgbtcm5t5sdkbx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxflirgbtcm5t5sdkbx7.png" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/?view=cost" rel="noopener noreferrer"&gt;Filtering by cost&lt;/a&gt; in PinchBench&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What the Data Actually Shows&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Benchmarks like PinchBench measure something more meaningful than tokens: &lt;strong&gt;cost per completed agent task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a snapshot of &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;current value rankings&lt;/a&gt;. A few things jump out immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;High-value models like Opus complete full tasks for &lt;strong&gt;$0.03--$0.13&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even strong mid-tier models like Kimi K2.5 stay well under &lt;strong&gt;$0.50 per task&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average &lt;em&gt;success rates&lt;/em&gt; cluster surprisingly close (65--85%) despite major cost differences&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leads to a non-obvious conclusion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You're often paying &lt;strong&gt;10--20x more&lt;/strong&gt; for marginal gains in quality.&lt;/p&gt;
&lt;/blockquote&gt;
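&lt;p&gt;One way to reason about that tradeoff (a sketch with hypothetical per-attempt costs, not benchmark-exact numbers): if failed tasks are simply retried, the expected spend per &lt;em&gt;completed&lt;/em&gt; task is cost per attempt divided by success rate. Because success rates cluster in a narrow band, cost per attempt dominates the result.&lt;/p&gt;

```python
# Expected cost per completed task if failures are retried.
# The per-attempt costs below are hypothetical, for illustration only.
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successfully completed task."""
    return cost_per_attempt / success_rate

pricier = cost_per_success(0.13, 0.85)  # hypothetical pricier model
cheaper = cost_per_success(0.02, 0.65)  # hypothetical cheaper model
print(f"pricier: ${pricier:.3f}  cheaper: ${cheaper:.3f} per completed task")
```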

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudeedh92bov3fp5mknkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudeedh92bov3fp5mknkh.png" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;Filtering by success rate&lt;/a&gt; in PinchBench&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What $10 Gets You in OpenClaw&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We often have free models like &lt;a href="https://kilo.ai/leaderboard" rel="noopener noreferrer"&gt;Nemotron 3 Super, Trinity Large Preview and MiMo-V2-Pro&lt;/a&gt; available in Kilo, but even if you're opting for paid models, you can get a LOT for $10. A ten-spot will buy you a lot more than 10 turns in your agent chat.&lt;/p&gt;

&lt;p&gt;Let's translate those numbers into something real.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Without Agents: Linear Output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you're coding or prompting manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You rely on frontier models&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You resend context every time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You manually trigger every step&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Work stops when you stop&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;$10 gets you around 2--4 meaningful tasks. &lt;/strong&gt;Then it's on to the next project.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;With KiloClaw: Compounding Output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With a hosted OpenClaw agent like KiloClaw, that same $10 is distributed across a system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;sub-agents handling different responsibilities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;multiple model tiers with different costs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cached context reused across runs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;scheduled execution instead of constant prompting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In KiloClaw, &lt;strong&gt;$10 gets you around 20--150+ agent task executions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Of course there's some variance depending on which &lt;a href="https://kilo.ai/kiloclaw/bytes" rel="noopener noreferrer"&gt;tasks and skills&lt;/a&gt; you're focused on. But still. This is huge. And it's honestly a lot more than we were expecting when we started spinning up claws.&lt;/p&gt;
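&lt;p&gt;The arithmetic behind that range is simple division (the per-task costs here are illustrative, drawn from the cost ranges discussed above):&lt;/p&gt;

```python
# How a $10 budget translates into task executions at different
# per-task costs (illustrative figures).
BUDGET = 10.00

def executions(budget: float, cost_per_task: float) -> int:
    """Whole task executions affordable within the budget."""
    return int(budget // cost_per_task)

low = executions(BUDGET, 0.50)    # ~$0.50 per task
high = executions(BUDGET, 0.065)  # a few cents per routed task
print(f"{low} to {high}+ executions for ${BUDGET:.0f}")
```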

&lt;p&gt;More importantly, &lt;em&gt;the system keeps working after you stop&lt;/em&gt;. Sub-agents reduce waste, memory persists, and &lt;strong&gt;auto model routing can further decrease costs by 5--10x&lt;/strong&gt;. Most agentic tasks don't actually need the "best" model. With auto routing now available in different modes in Kilo, including in KiloClaw, you can pick a mode during onboarding and change it at any time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bhdbfe0wy48yn7hlfzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bhdbfe0wy48yn7hlfzi.png" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Current Kilo Auto Modes. Models and modes subject to change!&lt;/p&gt;

&lt;p&gt;Looking to take advantage of high-efficiency yet super powerful models like &lt;a href="https://blog.kilo.ai/p/what-we-learned-from-a-week-of-free" rel="noopener noreferrer"&gt;Kimi K2.5&lt;/a&gt; and &lt;a href="https://blog.kilo.ai/p/we-tested-minimax-m27-against-claude" rel="noopener noreferrer"&gt;MiniMax M2.7&lt;/a&gt;? Choose &lt;strong&gt;Balanced Mode&lt;/strong&gt; and we'll route between models for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why "Agentic Engineering" Was Inevitable&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This isn't just a cost story. It's a shift in how software gets built, whether that's full production software for a new startup or your own personal AI assistant with something like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We're entering the era of &lt;strong&gt;agentic engineering&lt;/strong&gt;---where multiple agents collaborate across planning, implementation, debugging, and deployment.&lt;/p&gt;

&lt;p&gt;This isn't hype. It's already happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Code gets written, reviewed, and deployed in a single loop&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long-running tasks move into persistent cloud agents&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developers supervise systems instead of executing every step&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The role of the developer is changing---from builder to orchestrator. And with OpenClaw the role of everyday AI users is changing too---from consumer to conductor.&lt;/p&gt;

&lt;p&gt;And once that happens, cost behaves differently. Efficiency is no longer about a single request---it's about how well the system runs over time.&lt;/p&gt;

&lt;p&gt;Platforms that unify this workflow---IDE, CLI, cloud, and collaboration---don't just improve productivity. They become the default interface for building software. This is what we've been building at Kilo since the beginning, and the rise of KiloClaw is just the next phase of this (very fast) evolution.&lt;/p&gt;

&lt;p&gt;Check out &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; for the best OpenClaw benchmarks, and &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;launch your own claw &lt;/a&gt;in minutes with Kilo! 🦀&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>We Tested MiniMax M2.7 Against Claude Opus 4.6</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Wed, 25 Mar 2026 08:11:33 +0000</pubDate>
      <link>https://forem.com/kilocode/we-tested-minimax-m27-against-claude-opus-46-1ii9</link>
      <guid>https://forem.com/kilocode/we-tested-minimax-m27-against-claude-opus-46-1ii9</guid>
      <description>&lt;p&gt;&lt;a href="https://www.minimax.io/models/text/m27" rel="noopener noreferrer"&gt;MiniMax M2.7&lt;/a&gt; launched on March 18 scoring 56.22% on SWE-Pro, close to Claude Opus 4.6. We ran both models through three coding tasks in &lt;a href="https://kilocode.ai/" rel="noopener noreferrer"&gt;Kilo Code&lt;/a&gt; to see if the benchmark numbers hold up in practice. On pricing, MiniMax M2.7 runs at $0.30/$1.20 per million tokens (input/output) compared to Claude Opus 4.6's $5/$25, roughly a &lt;strong&gt;17x difference on input and 21x on output&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21emdv%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252Ff61f6e60-9bc5-4d4d-8f85-3bd602ff54cc_3000x1490.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21emdv%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252Ff61f6e60-9bc5-4d4d-8f85-3bd602ff54cc_3000x1490.jpeg" title="Value Icon" alt="Value Icon" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Both models found &lt;strong&gt;all 6 bugs and all 10 security vulnerabilities&lt;/strong&gt; in our tests. Claude Opus 4.6 produced more thorough fixes and 2x more tests. MiniMax M2.7 delivered &lt;strong&gt;90% of the quality for 7% of the cost&lt;/strong&gt; ($0.27 total vs $3.67).&lt;/p&gt;
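&lt;p&gt;The headline ratios fall straight out of the list prices and run totals above:&lt;/p&gt;

```python
# Sanity-check the price ratios and cost share quoted above.
minimax_in, minimax_out = 0.30, 1.20   # $/M tokens (input, output)
opus_in, opus_out = 5.00, 25.00        # $/M tokens (input, output)

print(f"input:  {opus_in / minimax_in:.0f}x")    # ~17x
print(f"output: {opus_out / minimax_out:.0f}x")  # ~21x

minimax_total, opus_total = 0.27, 3.67  # $ spent across all three tests
print(f"cost share: {minimax_total / opus_total:.0%}")  # ~7%
```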

&lt;h2&gt;
  
  
  &lt;strong&gt;Test Design&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We created three TypeScript codebases and ran both models in Code mode in &lt;a href="https://kilocode.ai/" rel="noopener noreferrer"&gt;Kilo Code&lt;/a&gt; for VS Code. Each model received the same prompt with no hints. We scored each model independently after all tests were complete.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test 1: Full-Stack Event Processing System (35 points)&lt;/strong&gt; - Build a complete system from a spec, including async pipeline, WebSocket streaming, and rate limiting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test 2: Bug Investigation from Symptoms (30 points)&lt;/strong&gt; - Trace 6 bugs from production log output to root causes and fix them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test 3: Security Audit (35 points)&lt;/strong&gt; - Find and fix 10 planted security vulnerabilities across a team collaboration API&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test 1: Full-Stack Event Processing System&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We gave both models this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build a real-time event processing system in TypeScript from the specification in &lt;a class="mentioned-user" href="https://dev.to/spec"&gt;@spec&lt;/a&gt;.md. Use Hono for the web framework, Prisma with SQLite for the database, Zod for input validation, and ws for WebSocket support."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The spec required 7 components: event ingestion API with API key auth, async processing pipeline with exponential backoff retry, event storage with processing history, query API with pagination and filtering, WebSocket endpoint for live streaming, per-key rate limiting, and health/metrics endpoints.&lt;/p&gt;
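&lt;p&gt;As a reference point for what the spec's retry pipeline asks for, here is a minimal exponential-backoff sketch. It is our own illustration; the function and parameter names are not taken from either model's output:&lt;/p&gt;

```typescript
// Minimal exponential-backoff retry helper (illustrative sketch, not
// either model's actual code). Retries a failing async task, doubling
// the delay each attempt: 100ms, 200ms, 400ms, ...
async function withRetry(task, maxAttempts = 5, baseDelayMs = 100) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Retries exhausted: in the spec's pipeline this is where the
      // event would be routed to a dead-letter queue.
      if (attempt + 1 >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

&lt;p&gt;Both implementations followed this general shape; the differences showed up in what surrounds it, such as dead-letter routing and timer cleanup.&lt;/p&gt;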

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kvoye9dqs2c9csvi9b6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kvoye9dqs2c9csvi9b6.png" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models implemented all 7 components. The score difference came from code organization and test coverage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77xum11232j9x0urywi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77xum11232j9x0urywi7.png" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Architecture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Claude Opus 4.6 created a modular directory structure with separate directories for routes, pipeline, middleware, and WebSocket management. It split the processing logic into separate files for queue management (with retry scheduling and dead-letter routing) and per-type event handlers. It also included graceful shutdown with timer cleanup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lvr4yh983l2shmvxzf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lvr4yh983l2shmvxzf9.png" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 used a flatter structure with fewer files. All routing lived in a single entry file, and the processor was simpler with no shutdown management or timer tracking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62bjlbqzkq7mvm83yovy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62bjlbqzkq7mvm83yovy.png" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Test Coverage&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Claude Opus 4.6 wrote &lt;strong&gt;41 integration tests&lt;/strong&gt; with a dedicated test database and proper cleanup between tests. The tests make actual HTTP requests against the API, testing the full middleware chain end-to-end.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uufcyn7orra8mhf2mhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uufcyn7orra8mhf2mhp.png" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 wrote &lt;strong&gt;20 unit tests&lt;/strong&gt; that validate Zod schemas and handler functions directly. These cover the core logic, but don't test the API endpoints or middleware through HTTP, so routing or middleware bugs would slip through.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Test 1 Scoring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyz1f7sm02wg13ef6om9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyz1f7sm02wg13ef6om9.png" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Opus 4.6 lost 2 points for not generating a README (the spec asked for one). MiniMax M2.7 generated a README but lost points on architecture and test coverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test 2: Bug Investigation from Symptoms&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We built an order processing system with 4 interconnected modules (gateway, orders, inventory, notifications) and planted 6 bugs. We gave both models the codebase, a production log file showing symptoms, and a memory profile showing growth data. The prompt listed the 6 symptoms and asked both models to investigate, find root causes, and fix them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujbhda7qbl5zo6na0pif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujbhda7qbl5zo6na0pif.png" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models found all 6 root causes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6iaduf67jpl5qg602a9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6iaduf67jpl5qg602a9.png" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Bug #1: Race Condition in Inventory&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stock was checked first, then reserved in a separate transaction. Two concurrent orders could both pass the check before either reserved. Both models identified this from the logs and fixed it by making the reservation atomic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; also added &lt;strong&gt;rollback logic&lt;/strong&gt;. If reserving stock for one item in a multi-item order fails, it releases the items that already succeeded and marks the order as "failed." &lt;strong&gt;MiniMax M2.7&lt;/strong&gt; made the reservation atomic but &lt;strong&gt;didn't add rollback&lt;/strong&gt;, so partial failures can leave orphaned reservations.&lt;/p&gt;
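&lt;p&gt;The shape of the two fixes is easy to illustrate. In this simplified in-memory sketch (the actual fixes made the reservation atomic at the database level; all names here are ours), the check and the decrement happen in one step, and a multi-item failure releases what was already reserved:&lt;/p&gt;

```typescript
// Simplified in-memory sketch of the fix (the real code relied on a
// database transaction for atomicity; names are illustrative).
const stock = new Map();

// Check and decrement in a single step, so two concurrent orders can
// no longer both pass the check before either one reserves.
function reserve(item, qty) {
  const available = stock.get(item) ?? 0;
  if (qty > available) return false;
  stock.set(item, available - qty);
  return true;
}

// The rollback Claude Opus 4.6 added: if reserving any item in a
// multi-item order fails, release the reservations that already
// succeeded, so no orphaned reservations are left behind.
function reserveOrder(items) {
  const reserved = [];
  for (const { item, qty } of items) {
    if (!reserve(item, qty)) {
      for (const r of reserved) {
        stock.set(r.item, (stock.get(r.item) ?? 0) + r.qty);
      }
      return false; // caller marks the order "failed"
    }
    reserved.push({ item, qty });
  }
  return true;
}
```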

&lt;h3&gt;
  
  
  &lt;strong&gt;Bug #4: Floating-Point Totals&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The order total calculation used standard floating-point arithmetic, which produces results like &lt;code&gt;159.92000000000002&lt;/code&gt; for certain price and quantity combinations. The logs showed repeated "Total validation warning" entries where the expected and calculated totals differed by tiny fractions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; rounded the result after calculation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faoudpir0fdlegazkhi14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faoudpir0fdlegazkhi14.png" width="800" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MiniMax M2.7&lt;/strong&gt; converted to integer math (cents), avoiding the precision problem entirely:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s89y5yyau96uvm8yofj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s89y5yyau96uvm8yofj.png" width="800" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7's approach is technically better here. Working in cents avoids accumulation errors that rounding after the fact can miss on large orders.&lt;/p&gt;
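&lt;p&gt;The difference between the two approaches is easy to reproduce. A minimal sketch (our own illustration, not either model's actual code):&lt;/p&gt;

```typescript
// IEEE 754 floats cannot represent most decimal prices exactly
// (the classic example: 0.1 + 0.2 !== 0.3), which is what produced
// the "Total validation warning" log entries.
const items = [{ price: 19.99, qty: 8 }];

// Round after calculating in floats (the Claude Opus 4.6 approach):
// drift accumulates first and is rounded away at the end.
const roundedTotal =
  Math.round(items.reduce((sum, i) => sum + i.price * i.qty, 0) * 100) / 100;

// Work in integer cents (the MiniMax M2.7 approach): every
// intermediate value is exact, so nothing drifts in the first place.
const centsTotal =
  items.reduce((sum, i) => sum + Math.round(i.price * 100) * i.qty, 0) / 100;
```

&lt;p&gt;On a small order both give 159.92; the cents version is the safer default on large orders, where per-line drift can accumulate past what a single final rounding step corrects.&lt;/p&gt;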

&lt;h3&gt;
  
  
  &lt;strong&gt;Remaining Bugs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Both models fixed the other 4 bugs with the same approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Notification ordering (Bug #2)&lt;/strong&gt;: Added a status check before sending confirmation emails, skipping orders that were already cancelled&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory leak (Bug #3)&lt;/strong&gt;: Removed a per-order event listener that was never cleaned up, accumulating with each request (the memory profile showed listener count tracking 1:1 with request count)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stale inventory cache (Bug #5)&lt;/strong&gt;: Added cache invalidation calls after stock updates, so the 60-second cache TTL no longer serves stale data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token revocation bypass (Bug #6)&lt;/strong&gt;: Removed a "5-minute optimization" that skipped the revocation check for fresh tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
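&lt;p&gt;The stale-cache fix (Bug #5) is the most reusable pattern of the four. A simplified sketch with hypothetical names:&lt;/p&gt;

```typescript
// Simplified sketch of the Bug #5 fix (names are hypothetical):
// invalidate the cache entry when stock changes instead of letting
// the 60-second TTL serve stale data in the meantime.
const cache = new Map();
const TTL_MS = 60_000;

function getStock(item, loadFromDb) {
  const hit = cache.get(item);
  if (hit) {
    if (hit.expiresAt > Date.now()) return hit.value;
  }
  const value = loadFromDb();
  cache.set(item, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}

function updateStock(item, value, writeToDb) {
  writeToDb(value);
  cache.delete(item); // the fix: drop the stale entry immediately
}
```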

&lt;h3&gt;
  
  
  &lt;strong&gt;Test 2 Scoring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhib87faubskwwox5qxjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhib87faubskwwox5qxjc.png" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models verified their fixes by running curl requests against the server. Claude Opus 4.6 explicitly referenced log entries when explaining each bug, while MiniMax M2.7 jumped more directly to the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test 3: Security Audit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We built a team collaboration API (Hono + Prisma + SQLite) with 10 planted security vulnerabilities. We asked both models to audit the codebase, categorize each vulnerability by OWASP, explain the attack vector, rate severity, and implement fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpoj90n8xocvoozh1ypxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpoj90n8xocvoozh1ypxp.png" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models found all 10 vulnerabilities with correct OWASP categorizations. The 4-point gap is entirely in fix quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6rotps5o5me4x5bhz4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6rotps5o5me4x5bhz4r.png" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Where the Fixes Diverged&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Password hashing&lt;/strong&gt;: Claude Opus 4.6 used scrypt with random salts and timing-safe comparison. MiniMax M2.7 used SHA-256 with the JWT secret as the salt, and flagged in its own output that bcrypt would be better.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Insecure deserialization&lt;/strong&gt;: Both removed the &lt;code&gt;eval()&lt;/code&gt; on webhook transforms. Claude Opus 4.6 replaced it with a safe JSON key-mapping system. MiniMax M2.7 disabled transforms entirely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SSRF protection&lt;/strong&gt;: Claude Opus 4.6 validated webhook URLs at creation, update, and delivery. MiniMax M2.7 validated at delivery only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate limiting&lt;/strong&gt;: Claude Opus 4.6 applied per-endpoint limits (login, register, password reset). MiniMax M2.7 only rate-limited the login endpoint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JWT fix&lt;/strong&gt;: Both moved the hardcoded secret to an environment variable. Claude Opus 4.6 let &lt;code&gt;jwt.verify()&lt;/code&gt; handle expiration natively. MiniMax M2.7 fixed the broken manual comparison, which works but duplicates built-in functionality.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
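&lt;p&gt;The password-hashing gap is worth spelling out, since it's the difference most likely to matter in production. Here is a minimal sketch of the scrypt pattern described above, using Node's built-in crypto (our illustration, not Claude Opus 4.6's literal code):&lt;/p&gt;

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Sketch of the scrypt fix described above (illustrative, not the
// model's literal code): a random per-user salt stored alongside the
// hash, and a timing-safe comparison on verify.
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 64);
  return salt.toString("hex") + ":" + hash.toString("hex");
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  const candidate = scryptSync(password, Buffer.from(saltHex, "hex"), 64);
  // timingSafeEqual keeps the comparison constant-time, so an attacker
  // cannot learn how many bytes matched from response latency.
  return timingSafeEqual(candidate, Buffer.from(hashHex, "hex"));
}
```

&lt;p&gt;Contrast this with SHA-256 salted with a shared secret: a fast hash is cheap to brute-force, and a single shared salt weakens every stored password at once.&lt;/p&gt;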

&lt;h3&gt;
  
  
  &lt;strong&gt;Test 3 Scoring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnz6smk55y9ran7r9xkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnz6smk55y9ran7r9xkg.png" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Overall Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9eujwor18p5q1s46nojo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9eujwor18p5q1s46nojo.png" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Bigger Picture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We've been testing MiniMax models since M2 last November. Earlier versions competed against other open-weight models like GLM 4.7 and GLM-5. With each release, the scores climbed and the cost stayed low. MiniMax M2.5 (the previous version) is currently the #1 most-used model across every mode in Kilo Code, ahead of Claude Opus 4.6, GLM-5, and GPT-5.4. In Code mode it accounts for 37% of all usage. In Ask mode, 35%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fib5suxlnvt5jkt0a0ytv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fib5suxlnvt5jkt0a0ytv.png" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;MiniMax M2.5 usage across Kilo Code modes&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 is the first version where we felt the right comparison was a frontier model rather than another open-weight one. It matched Claude Opus 4.6's detection rate on every test in this benchmark, finding the same bugs and the same vulnerabilities. The fixes aren't as thorough yet, but the diagnostic gap between open-weight and frontier models is shrinking with every release.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For building from scratch&lt;/strong&gt;: Claude Opus 4.6 produced 41 integration tests and a modular architecture. MiniMax M2.7 built the same features with 20 unit tests and a flatter structure, at $0.13 vs $1.49.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For debugging&lt;/strong&gt;: Both models found all 6 root causes from log symptoms. MiniMax M2.7 even produced a better fix for the floating-point bug. Claude Opus 4.6 added rollback logic that MiniMax M2.7 missed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For security work&lt;/strong&gt;: Both models found all 10 vulnerabilities. Claude Opus 4.6's fixes are closer to what you'd ship (proper key derivation, feature-preserving alternatives, defense-in-depth). MiniMax M2.7 closes the same vulnerabilities with simpler approaches and sometimes flags its own shortcuts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On cost&lt;/strong&gt;: $3.67 total for Claude Opus 4.6 vs $0.27 for MiniMax M2.7. Detection was identical. The gap is in how thorough the fixes are.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>coding</category>
      <category>llm</category>
      <category>testing</category>
    </item>
    <item>
      <title>Talk to the Claw: The Interface Is Now a Single Sentence</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://forem.com/kilocode/talk-to-the-claw-the-interface-is-now-a-single-sentence-h3</link>
      <guid>https://forem.com/kilocode/talk-to-the-claw-the-interface-is-now-a-single-sentence-h3</guid>
      <description>&lt;p&gt;We hear it a lot these days, but what does it actually mean for software to have a "new interface"?&lt;/p&gt;

&lt;p&gt;At Kilo, we aren't approaching this question in the abstract; we're living it every day.&lt;/p&gt;

&lt;p&gt;As we lean into agentic flows, we're discovering that working in a new interface means that the layer between you and the tool is no longer a dashboard, a form, or a button. &lt;strong&gt;It's a sentence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You will still hear people talk about UX improvements. Better navigation. Cleaner design. More intuitive onboarding flows. It will be framed as progress.&lt;/p&gt;

&lt;p&gt;But the real change runs deeper than any redesign. The interface layer is decoupling from the application layer entirely. You don't need to know where the button is. You don't need to learn the menu structure. You just say what you need done.&lt;/p&gt;

&lt;p&gt;Natural language &lt;em&gt;is&lt;/em&gt; the new UI.&lt;/p&gt;

&lt;p&gt;A couple of things I'm not saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I'm not saying every app will disappear.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I'm not saying this works perfectly today for every use case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I'm not saying you should throw away your existing workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's what I &lt;em&gt;am&lt;/em&gt; saying: the apps you already use didn't have to rebuild themselves from scratch for this to be true. &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; can talk to &lt;a href="https://www.todoist.com/" rel="noopener noreferrer"&gt;Todoist&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; &lt;a href="https://linear.app/" rel="noopener noreferrer"&gt;Linear&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; your calendar &lt;em&gt;and&lt;/em&gt; your inbox, all through the same window, using the same language you'd use to text a colleague. You don't have to live inside each one to operate them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr114rkdesbwepqs820jo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr114rkdesbwepqs820jo.png" width="800" height="745"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Credit: Todoist&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This isn't about saving five minutes. It's about a bigger shift. The way we interact with software is fundamentally changing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Twelve Tools, One Front Door
&lt;/h2&gt;

&lt;p&gt;Here's where the new interface really shines.&lt;/p&gt;

&lt;p&gt;Last week, I had a new project land in my inbox. I downloaded the PDF, uploaded it to my &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw bot on Telegram&lt;/a&gt;, and typed a simple prompt in natural language, essentially: &lt;em&gt;Create a Todoist project for this and add the tasks based on these guidelines.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's it. No excessive bulleted lists. No diagrams. No long paragraphs discussing the background and goals for this project. Just a couple of sentences.&lt;/p&gt;

&lt;p&gt;Thirty seconds later, it was done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu0f1gun6xa2fqy4j703.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu0f1gun6xa2fqy4j703.png" width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the same thing with scheduling.&lt;/p&gt;

&lt;p&gt;I was meeting with a friend and colleague and we agreed to sync again the following week. We both pulled up our calendars, found a time. I sent a message to KiloClaw. My contact received a calendar invite a minute later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj047b40ykwlw364v6if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj047b40ykwlw364v6if.png" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two different tools. Two different workflows. One conversation.&lt;/p&gt;

&lt;p&gt;Here's the thing: Todoist actually has a feature for this. It's called Ramble: you can talk to it, describe your project, and it populates tasks for you. That's cool. But that's not the unlock I'm talking about.&lt;/p&gt;

&lt;p&gt;I'm the kind of person who has a different tool for everything. Todoist for tasks. GitHub for engineering projects. &lt;a href="https://kilo.ai/slack" rel="noopener noreferrer"&gt;Slack for team communication&lt;/a&gt;. Gmail for email. Each tool lives in its own silo, with its own interface, its own learning curve, its own quirks.&lt;/p&gt;

&lt;p&gt;The problem has never been the tools.&lt;/p&gt;

&lt;p&gt;The problem is the twelve different front doors. With a unified interface that acts on natural language, we now have a single way into the house.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The New Interface Is the Front Door We Always Needed&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Count the apps you opened before lunch today.&lt;/p&gt;

&lt;p&gt;Email. Slack. Calendar. Linear. Todoist.&lt;/p&gt;

&lt;p&gt;They're all like different doors into your life, each with its own login, its own layout, its own way of asking you to do the same basic thing: move information from your head into the right place.&lt;/p&gt;

&lt;p&gt;That tax (the constant context-switching, the re-orienting, the "where does this live?") is so familiar that most of us stopped noticing it.&lt;/p&gt;

&lt;p&gt;We got so used to micro context-switching that we forgot there could be a better way.&lt;/p&gt;

&lt;p&gt;Curious?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's how I recommend getting started: &lt;/strong&gt;Start with one workflow you do repeatedly. Something tedious. Something where you're just copying information from one place to another. Tell &lt;a href="https://blog.kilo.ai/p/open-claw-is-my-intern" rel="noopener noreferrer"&gt;your bot&lt;/a&gt; to do it instead.&lt;/p&gt;

&lt;p&gt;You might be surprised how short the conversation needs to be.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How KiloClaw Is Built to Be Secure</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 19 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://forem.com/kilocode/how-kiloclaw-is-built-to-be-secure-1e22</link>
      <guid>https://forem.com/kilocode/how-kiloclaw-is-built-to-be-secure-1e22</guid>
      <description>&lt;p&gt;OpenClaw has taken the world by storm and brought autonomous AI agents into the mainstream.&lt;/p&gt;

&lt;p&gt;However, when an agent can execute code, browse the web, and connect to Slack, Discord, or Telegram on your behalf, &lt;strong&gt;security stops being a footnote and starts being the whole plot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This also raises an obvious question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I know my data, accounts, and API keys are safe?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That question sits at the center of how KiloClaw is built.&lt;/p&gt;

&lt;p&gt;KiloClaw is a managed compute platform for OpenClaw built with security at its core, and we just &lt;a href="https://kilo.codes/kiloclaw-security-whitepaper" rel="noopener noreferrer"&gt;published a detailed white paper&lt;/a&gt; describing the platform's security architecture and independent assessment.&lt;/p&gt;

&lt;p&gt;This post summarizes the key points from KiloClaw's February 2026 security white paper. Let's dive in.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;What makes KiloClaw different from your "typical OpenClaw hosting SaaS"&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;KiloClaw was designed from the ground up with &lt;strong&gt;defense in depth&lt;/strong&gt;: multiple layers of isolation, strong authentication, encrypted storage, and strict handling of customer secrets.&lt;/p&gt;

&lt;p&gt;And in February 2026, KiloClaw brought in an independent security assessor, &lt;a href="https://www.linkedin.com/in/andrewstorms/" rel="noopener noreferrer"&gt;Andrew Storms&lt;/a&gt;, for a 10-day review that included threat modeling, code review, adversarial testing, and live infrastructure testing. The conclusion: KiloClaw's security architecture is sound, with tenant isolation enforced at multiple independent layers.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Each instance is an isolated virtual machine&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Most SaaS products store customer data in shared infrastructure and rely on application logic to keep accounts separated.&lt;/p&gt;

&lt;p&gt;KiloClaw is different because your AI agent actually runs code on your behalf. That means the isolation between customers has to be much stronger.&lt;/p&gt;

&lt;p&gt;A simple way to think about it is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Many platforms give you an apartment in a larger building.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;KiloClaw gives you your own house, on your own lot, with your own fence.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you create a KiloClaw instance, we spin up a dedicated virtual machine just for you.&lt;/p&gt;

&lt;p&gt;Not a shared container. Not a small slice of someone else's runtime. &lt;strong&gt;A full virtual machine with its own kernel.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;KiloClaw uses Firecracker microVMs, the same virtualization technology used by AWS Lambda and AWS Fargate. That matters because Firecracker provides isolation between workloads based on hardware virtualization.&lt;/p&gt;

&lt;p&gt;In practical terms, that means &lt;strong&gt;if something goes wrong inside your AI agent's environment, the impact is contained to your environment.&lt;/strong&gt; There is no shared kernel, no shared filesystem, and no shared process space with other customers.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Five layers of isolation&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;KiloClaw does not depend on one security layer. It uses five independent layers of tenant isolation.&lt;/p&gt;

&lt;p&gt;For one customer to access another customer's data, all five layers would have to fail at the same time.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Identity-based routing
&lt;/h3&gt;

&lt;p&gt;Every request is authenticated before it reaches a customer machine.&lt;/p&gt;

&lt;p&gt;KiloClaw does not route requests based on user-controlled input. Instead, it derives the destination from the authenticated user identity stored server-side.&lt;/p&gt;

&lt;p&gt;In plain English: &lt;strong&gt;you cannot trick the platform into sending you to someone else's machine.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. A dedicated application environment
&lt;/h3&gt;

&lt;p&gt;Each customer's VM runs inside a dedicated Fly.io application.&lt;/p&gt;

&lt;p&gt;That matters because KiloClaw keeps each customer's machines, storage, and internal network isolated from other customers.&lt;/p&gt;

&lt;p&gt;In practice, that means &lt;strong&gt;one customer's storage cannot be attached to another customer's machine,&lt;/strong&gt; and one customer's machines cannot be discovered through another customer's internal network.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Network isolation
&lt;/h3&gt;

&lt;p&gt;Each customer environment is placed on its own isolated WireGuard network mesh.&lt;/p&gt;

&lt;p&gt;During the independent assessment, live cross-tenant tests confirmed that customers could not discover one another's machines, could not connect directly across applications, and could only perform self-referencing network operations.&lt;/p&gt;

&lt;p&gt;So the separation exists at the network layer too, not just inside the app.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The Firecracker VM boundary
&lt;/h3&gt;

&lt;p&gt;Each customer workload runs inside its own Firecracker microVM.&lt;/p&gt;

&lt;p&gt;This is a hard isolation boundary based on hardware virtualization. Even if an AI agent were manipulated through prompt injection or malicious tool usage, the blast radius would still be limited to that customer's own VM.&lt;/p&gt;

&lt;p&gt;To escape that boundary would require a vulnerability in the Firecracker hypervisor itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Dedicated encrypted storage
&lt;/h3&gt;

&lt;p&gt;Each customer gets a dedicated persistent storage volume, and that volume is encrypted at rest.&lt;/p&gt;

&lt;p&gt;That storage can only be mounted inside the customer's own application environment. There is no path for another customer's machine to access it.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;What about my API keys?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;This is one of the most common questions, and for good reason.&lt;/p&gt;

&lt;p&gt;Customer API keys and chat tokens are encrypted in the platform database using RSA-OAEP with AES-256-GCM and are decrypted only when the customer's VM starts, at which point they become available only inside that customer's isolated environment.&lt;/p&gt;

&lt;p&gt;In other words, &lt;strong&gt;your keys are stored in encrypted form by the platform and only become readable inside your own isolated environment when needed.&lt;/strong&gt;&lt;/p&gt;
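&lt;p&gt;The combination of RSA-OAEP and AES-256-GCM usually means envelope encryption: a fresh AES data key encrypts the secret, and only an RSA-wrapped copy of that key is stored. Here is a hedged sketch of that pattern using the widely used Python &lt;code&gt;cryptography&lt;/code&gt; package (key sizes and flow are illustrative; this is not KiloClaw's implementation):&lt;/p&gt;

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hedged sketch of the envelope-encryption pattern the article describes
# (RSA-OAEP wrapping an AES-256-GCM data key). Illustrative only.

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# At rest: a fresh AES-256 data key encrypts the secret; only the
# RSA-wrapped copy of that key is stored alongside the ciphertext.
data_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(data_key).encrypt(nonce, b"sk-example-api-key", None)
wrapped_key = private_key.public_key().encrypt(data_key, oaep)

# At VM start: unwrap the data key with the private key, then decrypt.
recovered_key = private_key.decrypt(wrapped_key, oaep)
plaintext = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)
assert plaintext == b"sk-example-api-key"
```

&lt;p&gt;The design benefit is that the database only ever holds ciphertext plus a wrapped key, so a database leak alone does not expose secrets.&lt;/p&gt;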

&lt;p&gt;There is also more work planned here over time, including short-lived token exchange and in-memory secret stores to reduce the exposure window even further.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Your data is protected in transit and while stored&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;KiloClaw protects data both while it is moving and while it is stored.&lt;/p&gt;

&lt;p&gt;All external traffic uses TLS, including communication between the browser and the platform, as well as API calls to model providers.&lt;/p&gt;

&lt;p&gt;At rest, customer storage volumes are encrypted using disk encryption at the infrastructure level, and secrets stored in the database are encrypted separately using modern cryptography.&lt;/p&gt;

&lt;p&gt;In practical terms, that covers things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;API keys&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chat tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session transcripts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workspace files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Authentication-related secrets&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;What happens when I delete my instance?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;When you destroy your KiloClaw instance, the platform runs a two-phase cleanup process designed to complete safely even if something fails midway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, it shuts down and destroys the virtual machine. &lt;strong&gt;Then&lt;/strong&gt; it deletes the storage volume and removes runtime secrets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deleting the volume also destroys the underlying encryption keys, making the stored data unrecoverable.&lt;/strong&gt;&lt;/p&gt;
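&lt;p&gt;The retry-safe, two-phase idea can be sketched like this (purely illustrative; the real cleanup operates on cloud resources, not a dict):&lt;/p&gt;

```python
# Illustrative sketch of a retry-safe two-phase teardown: every phase is
# idempotent and progress is recorded, so a run that fails midway can be
# retried until all phases report complete.

CLEANUP_PHASES = ("vm_destroyed", "volume_deleted", "secrets_removed")

def destroy_instance(state):
    """Advance the cleanup; safe to call again after a partial failure."""
    if not state.get("vm_destroyed"):
        # Phase 1: stop and destroy the VM first, so nothing is still
        # writing to the volume when it goes away.
        state["vm_destroyed"] = True
    if not state.get("volume_deleted"):
        # Phase 2: delete the volume; with it go its encryption keys,
        # making the stored data unrecoverable.
        state["volume_deleted"] = True
    if not state.get("secrets_removed"):
        state["secrets_removed"] = True   # finally, drop runtime secrets
    return all(state.get(phase) for phase in CLEANUP_PHASES)
```

&lt;p&gt;Calling it again on a half-finished &lt;code&gt;state&lt;/code&gt; simply completes the remaining phases, which is the property the independent review checked for.&lt;/p&gt;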

&lt;p&gt;The deletion flow was reviewed independently and confirmed to clean up sensitive data reliably, even if part of the process fails temporarily and has to retry.&lt;/p&gt;

&lt;p&gt;No secrets persist after instance destruction.&lt;/p&gt;

&lt;p&gt;Some non-sensitive metadata, such as user identifiers or timestamps needed for operational auditing, may remain in soft-deleted records or logs. That is standard practice and separate from customer secrets or workspace data.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;What about prompt injection?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;If you follow AI security, you have probably heard of prompt injection: a malicious message or webpage tries to trick an AI agent into doing something it should not do.&lt;/p&gt;

&lt;p&gt;KiloClaw addresses this in two important ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, the agent's exec tool, the capability that lets it run shell commands, requires explicit user approval before execution by default. That setting is enforced by the platform itself and cannot be overridden by the agent, by prompt injection, or through a connected chat channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, even in a worst-case scenario where prompt injection changes the agent's behavior, the blast radius is still contained to your own VM. It cannot access another customer's resources or move sideways across tenants.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Authentication is designed to reject uncertainty&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;KiloClaw's authentication model includes several protections that matter in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Secure JWT cookie handling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server-side&lt;/strong&gt; session revocation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Constant-time&lt;/strong&gt; secret comparison&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Behavior that rejects requests when required systems are unavailable&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
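&lt;p&gt;"Constant-time secret comparison" guards against timing attacks: a naive equality check returns as soon as the first byte differs, so response times can leak how much of a secret matched. Python's standard-library primitive for this is &lt;code&gt;hmac.compare_digest&lt;/code&gt;, shown here as an illustration (not KiloClaw source):&lt;/p&gt;

```python
import hmac

# A naive "supplied == stored" comparison short-circuits on the first
# differing byte, so response time leaks information about the secret.
# hmac.compare_digest takes the same time regardless of where the inputs
# differ, which is why auth code uses it for tokens and API keys.

def token_matches(supplied: str, stored: str) -> bool:
    return hmac.compare_digest(supplied.encode(), stored.encode())
```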

&lt;p&gt;That last one is especially important.&lt;/p&gt;

&lt;p&gt;If a required backend dependency is unavailable, KiloClaw rejects the request rather than falling back to weaker behavior. That is a strong security property and one of the details security teams look for.&lt;/p&gt;
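&lt;p&gt;That "fail closed" property looks roughly like this in code (an illustrative sketch with made-up names, not KiloClaw source):&lt;/p&gt;

```python
# Fail-closed authentication: if a required dependency (here, a session
# store) cannot be reached, reject the request outright instead of
# degrading to a weaker check. Names are illustrative.

def authenticate(token, session_store):
    try:
        session = session_store.lookup(token)
    except ConnectionError:
        # The safe failure mode: no fallback, no cached "probably fine".
        raise PermissionError("auth backend unavailable; request rejected")
    if session is None:
        raise PermissionError("invalid token")
    return session
```

&lt;p&gt;The alternative, falling back to a weaker check when the backend is down, is exactly the kind of silent degradation security reviews flag.&lt;/p&gt;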

&lt;h1&gt;
  
  
  &lt;strong&gt;What we tested with an independent security assessor&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;In February 2026, independent assessor Andrew Storms conducted a &lt;strong&gt;10-day&lt;/strong&gt; security assessment of KiloClaw.&lt;/p&gt;

&lt;p&gt;That work included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Threat modeling using the PASTA framework, covering 30 threats across 13 assets&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code review of routing, authentication, secret handling, and lifecycle management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;35 adversarial tenant-isolation tests, including Unicode edge cases, zero-width characters, null bytes, and injection payloads&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;8 live cross-tenant network tests across separate customer environments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dozens of adversarial command-injection payloads&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Review of the build pipeline, machine images, and runtime environment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The results were strong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No &lt;strong&gt;cross-tenant&lt;/strong&gt; access path was found&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No SQL injection findings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No XSS findings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No command injection findings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No path traversal findings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No open redirect findings&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The assessment also produced 17 merged pull requests: 10 security fixes and 7 hardening improvements.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Security is an ongoing practice&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;One thing worth saying clearly: strong security does not mean pretending the work is done.&lt;/p&gt;

&lt;p&gt;The independent review found KiloClaw's architecture to be fundamentally sound, while also identifying areas for continued investment, especially around hardening the software supply chain and improving operational maturity.&lt;/p&gt;

&lt;p&gt;That roadmap includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pinning base images to SHA-256 digests&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Image signing with Sigstore and cosign&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SBOM generation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automated vulnerability scanning in CI/CD&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Egress monitoring and interactive approval controls for outbound agent activity&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The bottom line&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;KiloClaw was built with the understanding that managed compute for AI agents is a product category that demands a high level of trust and security.&lt;/p&gt;

&lt;p&gt;That is why each customer gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A dedicated virtual machine&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A dedicated application environment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An isolated network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dedicated encrypted storage (volume isolation)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strong authentication and access controls&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Encrypted secret handling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple independent layers of tenant isolation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;nontechnical&lt;/strong&gt; readers, the simplest takeaway is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your agent is not running in the same place as everyone else's. Your environment is isolated by design.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For technical buyers, the important point is that this was not just claimed. It was independently reviewed, tested against adversarial scenarios, and validated in live infrastructure testing.&lt;/p&gt;

&lt;p&gt;If you have questions about KiloClaw's security architecture or want additional documentation, reach out to the Kilo security team at security at kilocode dot ai.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>ClawCon Recap: NYC &amp; Austin</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 19 Mar 2026 08:53:36 +0000</pubDate>
      <link>https://forem.com/kilocode/clawcon-recap-nyc-austin-4apc</link>
      <guid>https://forem.com/kilocode/clawcon-recap-nyc-austin-4apc</guid>
      <description>&lt;p&gt;Over the past two weeks, Kilo presented two ClawCon events: ClawCon NYC on March 4th, and ClawCon Austin on March 12th. Combined, over 2,000 people showed up. Both venues hit capacity, and both made one thing pretty obvious: personal AI agents aren't an experiment anymore.&lt;/p&gt;

&lt;p&gt;People are building real things, running real businesses, and showing up in person to learn from each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is ClawCon?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21eyB9%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252Fab11f9f0-c2f8-464c-9170-2d00d7e24ecc_2048x1355.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21eyB9%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252Fab11f9f0-c2f8-464c-9170-2d00d7e24ecc_2048x1355.jpeg" title="No alternative text description for this image" alt="No alternative text description for this image" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you haven't been to one, ClawCon isn't a traditional tech conference. It grew out of the OpenClaw ecosystem, and the format is simple: doors open, people demo what they've built with their personal AI agents, there's Q&amp;amp;A, and then a lot of unstructured time to just talk to other people who are excited about this technology.&lt;/p&gt;

&lt;p&gt;The events are free, there's no LinkedIn or GitHub screening, and the only barrier to entry is showing up early enough to get through the door before capacity hits. At both NYC and Austin, that happened fast.&lt;/p&gt;

&lt;p&gt;OpenClaw has crossed 320,000 GitHub stars and is the fastest-growing open source AI agent in history. It connects to 50+ chat platforms, runs shell commands, controls browsers, manages files, and maintains memory across sessions. The community around it has grown just as fast, and ClawCon is where that community meets in person. (Kilo's role in all of this: we built KiloClaw, a managed hosting platform for OpenClaw agents that's natively connected to the Kilo Gateway and its 500+ models. We demoed it at both events. More on that below.)&lt;/p&gt;

&lt;h2&gt;
  
  
  ClawCon NYC: 1,300 People in Lower Manhattan
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%2184RU%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F5b848a59-9273-46bc-839a-99ac3e0c487b_2048x1345.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%2184RU%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F5b848a59-9273-46bc-839a-99ac3e0c487b_2048x1345.jpeg" title="No alternative text description for this image" alt="No alternative text description for this image" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ClawCon NYC took over Ideal Glass Studios in Manhattan on March 4th. 1,313 people RSVPed, the venue filled up well before demos started at 7pm, and the one-in, one-out policy kicked in shortly after doors opened at 6. The crowd was a mix: developers who've been running OpenClaw agents for months, people who had never set one up but wanted to learn, founders demoing tools they'd built on top of the ecosystem, and plenty of people who were just curious about what "personal AI" actually means in practice. That range is part of what makes these events work. It's not a room full of the same person.&lt;/p&gt;

&lt;p&gt;The demo portion covered a range of use cases. OpenClaw maintainers walked through recent platform updates, and several community members showed personal workflows they'd built: agents handling email triage, research pipelines, content generation, and task automation across multiple platforms.&lt;/p&gt;

&lt;p&gt;One recurring theme was people who had set up OpenClaw agents not just for themselves, but for other people and businesses. The post-event email from ClawCon organizer Michael Galpert specifically called this out, noting that many attendees came looking for help getting their own agent running, while others had already set up "multiple individuals and businesses with their own Claws." That dynamic --- people who build agents helping people who want agents --- is becoming its own economy within the community, and it's exactly the kind of gap that KiloClaw was built to close on the infrastructure side.&lt;/p&gt;

&lt;p&gt;The afterparty, sponsored by JellyJelly and Zo.Computer, kept things going. The ClawCon Telegram channel saw a massive influx of new members, the X livestream pulled significant viewership, and the general vibe in the post-event chatter was that the NYC OpenClaw community is bigger and more active than most people realized.&lt;/p&gt;

&lt;h2&gt;
  
  
  ClawCon Austin: 756 People, Robots, and Teenagers Running Businesses
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fponxvcjncadbcuwwsov7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fponxvcjncadbcuwwsov7.jpeg" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eight days later, ClawCon hit Austin at &lt;a href="https://www.antler.co/location/us" rel="noopener noreferrer"&gt;Antler VC's&lt;/a&gt; space on Brazos Street. The format was the same, 756 people attended, and the demo lineup showed just how fast this space is evolving.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Nat Eliason&lt;/strong&gt; walked through his personal agent "Felix Craft" and how he uses it day-to-day. Nat's audience skews toward creators and entrepreneurs rather than developers, so seeing how he integrates an AI agent into non-technical work was valuable context for a lot of people in the room.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Austen Allred&lt;/strong&gt; demoed his agent Kelly and what he calls her "Software Factory" --- a workflow where the agent handles significant portions of the development process autonomously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Thanh Pham&lt;/strong&gt; presented five distinct OpenClaw use cases he's built for real, paying clients. This was one of the clearest signals of a maturing ecosystem: people aren't just building agents for fun, they're getting paid to build and configure them for others.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Matt Hartman&lt;/strong&gt; brought a physical robot that he controls with OpenClaw. A physical, moving robot operated by an AI agent. If you want a visceral demo of what agents can do beyond text on a screen, that'll do it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then there were the Alpha School kids. This was the moment that stuck with people.&lt;/p&gt;

&lt;p&gt;Branson, a 15-year-old student at Alpha School, has made $30,000 selling OpenClaw agent setup and configuration services. He's not dabbling --- he's running a business. His classmates demoed alongside him: Ananya built a Reddit bot that automatically finds and engages potential customers, Austin is building edtech products with OpenClaw, and Geetesh is starting a business with his agent.&lt;/p&gt;

&lt;p&gt;A 15-year-old generating $30K in revenue doing this says something important about where personal AI is headed: it's accessible enough for a high schooler to master, and valuable enough that people will pay real money for help. It also hints at where tools like KiloClaw fit in --- when you're setting up agents for other people, the last thing you want is to also be managing their infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  KiloClaw: What We Demoed
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/t2iTYbDsSds"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;We brought KiloClaw to the stage at both NYC and Austin, and it resonated for slightly different reasons at each event, but the core message was the same: getting OpenClaw running shouldn't be the hard part.&lt;/p&gt;

&lt;p&gt;Here's the context. OpenClaw is powerful, but self-hosting it is genuinely painful. You're looking at 30-60+ minutes of SSH, environment configuration, dependency management, and security hardening just to get it stood up. Once it's running, there's no auto-restart, no health monitoring, no alerting. If your agent crashes at 3am, you find out the next morning. And every new OpenClaw release means SSH-ing back in, pulling the latest code, hoping nothing breaks, and restarting manually.&lt;/p&gt;

&lt;p&gt;KiloClaw eliminates all of that. It's a fully managed hosting platform for OpenClaw agents, built on the same Kilo Gateway infrastructure that already serves 1.5M+ developers and routes to 500+ AI models. You go from zero to a running OpenClaw agent in under 60 seconds. No SSH, no Docker, no YAML files. One-click deploy, auto-restart on crash, automatic updates, and support for multiple chat platforms including Telegram, Discord, and Slack.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://kilo.ai/kiloclaw#pricing" rel="noopener noreferrer"&gt;Pricing&lt;/a&gt; for KiloClaw is &lt;strong&gt;$9/month&lt;/strong&gt; on a 6-month commit or &lt;strong&gt;$25/month&lt;/strong&gt; standard, and AI inference runs through Kilo Gateway at cost with zero markup --- same transparent pricing Kilo has always offered.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In NYC, the crowd skewed toward people who wanted an agent but didn't want to manage infrastructure, and the number of people who came up to the booth afterward asking "so I can just... have one running tonight?" confirmed the message landed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwgwqxeyy1lat8kq0c8v.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwgwqxeyy1lat8kq0c8v.jpeg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Austin, the room had a higher concentration of people already building with or on top of OpenClaw, so for them the value was less about easy setup and more about the infrastructure layer underneath: access to 500+ models without juggling API keys across providers, unified billing through the same Kilo account they might already use for the IDE extension or CLI, scheduled tasks and cron jobs so agents can run while you sleep, and enterprise-grade security for when your agent is handling API keys, conversations, and connected accounts for clients.&lt;/p&gt;

&lt;p&gt;For someone like Thanh, who's building agent configurations for paying clients, or Branson, who's running a services business at 15, the pitch isn't convenience --- it's reliability. When you're charging someone for an agent, it can't crash silently at 3am with no monitoring and no auto-restart. KiloClaw handles that.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers After Austin
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft37mt42xxwiqhktiyd7v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft37mt42xxwiqhktiyd7v.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Austin livestream crossed 120,000 views on X. That same week, both ClawCon and &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt;, the powered-by-Kilo AI model benchmark for OpenClaw created by our very own Brendan O'Leary, got a shoutout during NVIDIA's GTC keynote.&lt;/p&gt;

&lt;p&gt;That kind of visibility from one of the biggest names in AI hardware isn't something a community meetup series typically gets, and it says a lot about how quickly the personal AI agent movement has scaled.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Seeing Across Both Events
&lt;/h2&gt;

&lt;p&gt;A few patterns stood out across NYC and Austin:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-developers are showing up.&lt;/strong&gt; ClawCon isn't just attracting engineers. Creators, entrepreneurs, small business owners, and students are all in the room. The tools have gotten accessible enough that you don't need to be a developer to benefit from a personal AI agent, and while you might still need help setting one up, the gap between "interested" and "using one daily" is shrinking fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The community is self-organizing.&lt;/strong&gt; The Telegram group, the demo culture, the organic matchmaking between people who build agents and people who want them --- none of this is centrally orchestrated by a company. It's a community building its own infrastructure, and ClawCon is where it becomes visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personal AI is moving from novelty to utility.&lt;/strong&gt; The demos weren't "look at this cool thing." They were "here's how this saves me 10 hours a week" or "here's how this makes me money." The conversation has shifted from possibility to practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqwom942z2glnsbn9adg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqwom942z2glnsbn9adg.png" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ClawCon Miami presented by Kilo Code is on March 25th, and you can sign up &lt;a href="https://luma.com/clawconmiami" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to get your own OpenClaw agent running without dealing with infrastructure, KiloClaw gets you from zero to deployed in under 60 seconds. Your Kilo Code account is your KiloClaw account --- no new signup or billing relationship. Just add an agent to your toolkit. &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw#pricing" rel="noopener noreferrer"&gt;Try KiloClaw →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want help getting your agent configured with your email, calendar, and key workflows, we also offer a live 1-hour onboarding call to get everything dialed in. &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/config-service?_gl=1*w2rxi3*_gcl_aw*R0NMLjE3NzI1OTgwNDkuQ2owS0NRaUFwLXpMQmhEa0FSSXNBQmNZYzZzR1c0WnpJdUxFTTN1akw4WEpieml6SG8xbi15ZEFvNjJoV2xxUGpRcDhPM2d2TW55LVNOQWFBcVg2RUFMd193Y0I.*_gcl_au*MTMxMTk3MzI3MC4xNzczMDg4NTY5LjkwNDMzMzA2OC4xNzczODAzMDA5LjE3NzM4MDMwMDg." rel="noopener noreferrer"&gt;Learn more →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And if you were at NYC or Austin --- thanks for being there. The energy at both events was something you had to feel in person. See you at the next one!&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>MiniMax-M2.7 Is Now Available in Kilo. Here’s How It Performs.</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 19 Mar 2026 08:45:00 +0000</pubDate>
      <link>https://forem.com/kilocode/minimax-m27-is-now-available-in-kilo-heres-how-it-performs-4imo</link>
      <guid>https://forem.com/kilocode/minimax-m27-is-now-available-in-kilo-heres-how-it-performs-4imo</guid>
      <description>&lt;p&gt;MiniMax just released MiniMax-M2.7, their most capable model yet. It's available now in Kilo across the IDE extension, CLI, and Cloud Agents.&lt;/p&gt;

&lt;p&gt;We ran it through two benchmarks: &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt;, our OpenClaw agent benchmark, and Kilo Bench, an 89-task evaluation that tests autonomous coding across everything from git operations to cryptanalysis to QEMU automation.&lt;/p&gt;

&lt;p&gt;Here's what we found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; M2.7 scores 86.2% on PinchBench, placing 5th overall and within 1.2 points of Claude Opus 4.6. On Kilo Bench, it passes 47% of tasks with a distinct behavioral profile --- it may over-explore hard problems (which can lead to timeouts) but solves tasks that no other model can. It's a fast and affordable model that fills some gaps that frontier models miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;PinchBench: #5 Out of 50 Models&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;PinchBench runs standardized OpenClaw agent tasks and grades them via automated checks and an LLM judge. M2.7 scored 86.2%, landing just behind GLM-5 and GPT-5.4 (both 86.4%) and just ahead of Qwen3.5-plus (85.8%).&lt;/p&gt;

&lt;p&gt;The top of the leaderboard as of today, March 18, 2026:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxwel2t9q9k7a4w6218xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxwel2t9q9k7a4w6218xy.png" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The gap between 1st and 9th place is less than 2 points. M2.7 is competitive with the best models available on the agentic coding tasks that PinchBench measures.&lt;/p&gt;

&lt;p&gt;What's notable is the &lt;strong&gt;jump from M2.5 (82.5%) to M2.7 (86.2%) &lt;/strong&gt;--- a 3.7-point improvement that moved MiniMax from the middle of the pack into the top tier.&lt;/p&gt;

&lt;p&gt;You can try MiniMax M2.7 today in a cloud-hosted, fully managed OpenClaw instance using KiloClaw. Setup takes two clicks and about 60 seconds, and you can access it from the Kilo dashboard:&lt;/p&gt;

&lt;p&gt;-&amp;gt; &lt;a href="https://app.kilo.ai/claw" rel="noopener noreferrer"&gt;https://app.kilo.ai/claw&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Kilo Bench: 89 Tasks vs 4 Other Models&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;PinchBench tells you how a model performs on typical agent tasks in OpenClaw. Kilo Bench tells you where it breaks.&lt;/p&gt;

&lt;p&gt;We ran M2.7 through 89 tasks in Kilo CLI alongside four other models: Qwen3.5-plus, GLM-5, Kimi K2.5, and Qwen3.5-397b. Each model received a single prompt and ran autonomously --- no human intervention, no course correction --- for up to an hour per task depending on complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overall Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49re2m61069ag1qi7uq7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49re2m61069ag1qi7uq7.png" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;M2.7 came in second overall at 47%, two points behind Qwen3.5-plus. But the raw pass rate doesn't tell the full story.&lt;/p&gt;

&lt;p&gt;One pattern stood out: MiniMax-M2.7 reads extensively before writing. It pulls in surrounding files, analyzes dependencies, traces call chains. On tasks where that extra context pays off, it catches things other models miss. On tasks where the clock is ticking, that might cause it to run out of time.&lt;/p&gt;

&lt;p&gt;This is the same exploration-heavy behavior we saw in our&lt;a href="https://blog.kilo.ai/p/we-analyzed-how-much-kilo-code-reviewer" rel="noopener noreferrer"&gt; Code Reviewer cost analysis&lt;/a&gt; with Claude Opus 4.6, which consumed 1.18M input tokens on a single PR while Kimi K2.5 used 219K on the same diff. Deep reading finds deeper bugs --- but it costs time and can consume more tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Where M2.7 Stands Out&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The most interesting finding from Kilo Bench isn't the pass rate. It's what each model uniquely solves.&lt;/p&gt;

&lt;p&gt;Every model in this comparison solved tasks that no other model could:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5boya41s21hwsgszdfux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5boya41s21hwsgszdfux.png" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;M2.7's unique win on the SPARQL task is a good example of its strength: the task required understanding that an EU-country filter was an eligibility criterion, not an output filter. That's a reasoning distinction, not a coding one.&lt;/p&gt;

&lt;p&gt;A hypothetical oracle that picks the best model per task would solve 60 out of 89 tasks (67%) --- a 36% improvement over the best single model. &lt;strong&gt;These models aren't interchangeable. They're complementary.&lt;/strong&gt;&lt;/p&gt;
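&lt;p&gt;The oracle arithmetic checks out in a few lines. A quick sketch (the 49% best-single-model rate is an assumption derived from M2.7's 47% plus the two-point gap mentioned above):&lt;/p&gt;

```python
# Sanity-check the "hypothetical oracle" numbers.
# Assumption: the best single model (Qwen3.5-plus) passed 49% of tasks,
# i.e. two points ahead of M2.7's 47%.
total_tasks = 89
oracle_solved = 60                       # tasks solved by at least one model
best_single = round(0.49 * total_tasks)  # ~44 tasks for the best single model

oracle_rate = oracle_solved / total_tasks        # ~0.67
relative_gain = oracle_solved / best_single - 1  # ~0.36

print(f"oracle: {oracle_rate:.0%}, gain over best single model: {relative_gain:.0%}")
```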

&lt;h3&gt;
  
  
  &lt;strong&gt;Task Difficulty Breakdown&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The 89 tasks split into clear tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;18 tasks all 5 models solved&lt;/strong&gt; --- git operations, text processing, basic ML, infrastructure setup. These are table stakes for any capable coding model in 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;17 tasks where 2-3 models succeeded&lt;/strong&gt; --- this is where model selection actually matters. Tasks like differential cryptanalysis, Cython builds, and inference scheduling separate models by their behavioral tendencies, not just their raw capability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;29 tasks no model solved&lt;/strong&gt; --- circuit synthesis, MIPS emulation, pixel-perfect rendering, competitive CoreWars. These represent the current hard ceiling for LLM-based agents regardless of which model you pick.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Token Efficiency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;M2.7 consumed roughly 2.8M input tokens per trial on average, at the high end of the models tested. For context:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc924odo38pn7uiu7zams.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc924odo38pn7uiu7zams.png" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;M2.7 reads more context per step than any other model, which means it accumulates more tokens over the course of a task. When that thoroughness pays off, it produces correct solutions; when it doesn't, you still pay the extra tokens and time on each task.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What MiniMax-M2.7 Does Differently&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;MiniMax's announcement describes M2.7 as their first model that "deeply participates in its own evolution" --- the model was involved in updating its own memory, building training skills, and improving its own learning process during development. They report it autonomously ran over 100 rounds of scaffold optimization, achieving a 30% performance improvement on internal evals.&lt;/p&gt;

&lt;p&gt;That's a genuinely novel training approach and worth reading about in&lt;a href="https://www.minimax.io/news/MiniMax-M2-7-Early-Echoes-of-Self-Evolution" rel="noopener noreferrer"&gt; MiniMax's full announcement&lt;/a&gt;. Whether the self-evolution process contributes to M2.7's exploration-heavy behavior in our benchmarks is an interesting question we can't answer from the outside.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When to Use M2.7&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Based on both benchmarks, here's how M2.7 fits into the model landscape available in Kilo:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;M2.7 is a strong pick when&lt;/strong&gt; you're working on tasks that reward deep context gathering --- complex refactors, codebase-wide changes, or anything where understanding surrounding code matters more than speed. Its PinchBench score puts it in the same tier as GPT-5.4 and GLM-5 for general agent tasks. Compared to frontier models like Opus 4.6 and GPT-5.4 that offer similar capabilities, it's much less expensive at $0.30/M input and $1.20/M output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consider a different model (such as M2.1 or M2.5) &lt;/strong&gt;when you need very fast iteration cycles or are working on well-scoped, time-sensitive tasks. M2.7's median task duration (355s) is notably longer than its predecessors'.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best approach&lt;/strong&gt; is what Kilo is built for: switch models based on the task. Use M2.7 for the work that benefits from thorough analysis. Use something lighter for the tasks that need quick turnaround. With 500+ models available, you're not locked into any single tradeoff.&lt;/p&gt;

&lt;p&gt;MiniMax-M2.7 is available now in Kilo. Try it in the IDE extension, CLI, or Cloud Agents.&lt;/p&gt;

</description>
      <category>minimax</category>
      <category>ai</category>
      <category>coding</category>
      <category>discuss</category>
    </item>
    <item>
      <title>7 Automations You Can Set and Forget Right Now</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:04:51 +0000</pubDate>
      <link>https://forem.com/kilocode/7-automations-you-can-set-and-forget-right-now-52e0</link>
      <guid>https://forem.com/kilocode/7-automations-you-can-set-and-forget-right-now-52e0</guid>
      <description>&lt;p&gt;&lt;a href="https://app.kilocode.ai/cloud" rel="noopener noreferrer"&gt;Cloud Agents&lt;/a&gt; with Webhook Triggers turn Kilo into an event-driven automation layer for your development workflow.&lt;/p&gt;

&lt;p&gt;You push a tag. A deploy finishes. Someone labels an issue. Kilo picks it up, spins up a Cloud Agent, and starts working. No manual trigger, no context-switching, and workflows that happen while you sleep.&lt;/p&gt;

&lt;p&gt;We've already covered some big ones in previous posts (see &lt;a href="https://blog.kilo.ai/p/cloud-agents-the-missing-layer-in" rel="noopener noreferrer"&gt;here&lt;/a&gt; and &lt;a href="https://blog.kilo.ai/p/cloud-agents-webhooks" rel="noopener noreferrer"&gt;here&lt;/a&gt;): incident triage, security patching, dependency upgrades, documentation sync, policy enforcement.&lt;/p&gt;

&lt;p&gt;But webhooks can handle a lot more than the obvious plays. Here are seven more automations you should steal.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Nightly Code Quality Sweeps
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fermbplucybnmgu7vcfjd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fermbplucybnmgu7vcfjd.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every codebase accumulates lint violations, inconsistent formatting, unused imports, and dead code paths that nobody prioritizes because they're not the most urgent. They just make everything slightly worse over time.&lt;/p&gt;

&lt;p&gt;Set up a cron job (GitHub Actions scheduled workflow, a simple cron server, or any scheduler that can fire an HTTP POST) to trigger a webhook on a nightly or weekly cadence. The payload can specify which checks to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"code-quality-sweep"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"checks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lint-fix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unused-imports"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dead-code"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"target_dirs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"src/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lib/"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"base_branch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"main"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A scheduled code quality sweep has been triggered:

{{bodyJson}}

Run the following cleanup tasks on the directories specified:

1. Run the project's configured linter with auto-fix enabled
2. Run the project's formatter (Prettier, Black, gofmt, etc.)
3. Identify and remove unused imports
4. Search for dead code: unexported functions with zero call sites,
   unreachable branches, commented-out blocks older than 30 days
5. Run the test suite to confirm nothing breaks
6. Commit each category of fix separately:
   - "style: auto-fix lint violations"
   - "style: format code"
   - "refactor: remove unused imports"
   - "refactor: remove dead code"

If any fix causes test failures, revert that specific change and
document it in `quality-sweep-notes.md`.

Open a pull request summarizing all changes with counts per category.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of thing that never warrants sprint priority but compounds over months. The agent handles it overnight, and you review a clean PR in the morning. If your team runs a monorepo, scope &lt;code&gt;target_dirs&lt;/code&gt; to specific packages and rotate through them on different nights.&lt;/p&gt;
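&lt;p&gt;If you don't already have a scheduler that can fire an HTTP POST, a few lines of Python are enough. A minimal sketch, assuming a hypothetical webhook URL --- substitute the trigger URL and auth header from your own Cloud Agent settings:&lt;/p&gt;

```python
# Fire the nightly sweep payload at a Cloud Agent webhook.
# WEBHOOK_URL is a placeholder; use your own trigger URL and auth header.
import json
import urllib.request

WEBHOOK_URL = "https://example.com/your-webhook-endpoint"  # placeholder

def build_payload(checks, target_dirs, base_branch="main"):
    """Assemble the code-quality-sweep payload shown above."""
    return {
        "task": "code-quality-sweep",
        "checks": checks,
        "target_dirs": target_dirs,
        "base_branch": base_branch,
    }

def fire(payload):
    """POST the payload as JSON; returns the HTTP response object."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

&lt;p&gt;Point a cron entry at the script (e.g. &lt;code&gt;0 3 * * *&lt;/code&gt;) and the sweep fires every night at 3am.&lt;/p&gt;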

&lt;h2&gt;
  
  
  2. Feature Request to Prototype Branch
&lt;/h2&gt;

&lt;p&gt;When someone files a well-scoped feature request, the path from "that's a good idea" to "someone started working on it" can take days of backlog grooming and sprint planning. For straightforward requests, an agent can at least get an MVP in place automatically.&lt;/p&gt;

&lt;p&gt;Wire up a GitHub webhook that fires when an issue is labeled (e.g., &lt;code&gt;auto-prototype&lt;/code&gt; or &lt;code&gt;agent-implement&lt;/code&gt;). The payload includes the full issue body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"labeled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto-prototype"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"issue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;342&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add CSV export to the analytics dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Users should be able to export the current dashboard view as a CSV file. The export button should appear in the top-right toolbar. Should include all visible columns with current filter state applied."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"login"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"contributor-username"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"full_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org/analytics-dashboard"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A feature request has been labeled for automatic prototyping:

{{bodyJson}}

Analyze this feature request and build a prototype implementation:

1. Read the existing codebase to understand architecture, patterns,
   and conventions
2. Create `prototype-plan.md` outlining your approach, files to modify,
   and any assumptions you're making
3. Implement the feature following existing project patterns
4. Add basic tests covering the happy path
5. If the request is ambiguous or requires design decisions, document
   your choices in `prototype-plan.md` and pick the simplest option
6. Commit incrementally as you work:
   - "feat: scaffold [feature] structure"
   - "feat: implement [feature] core logic"
   - "test: add tests for [feature]"

Do not modify CI configuration, deployment configs, or unrelated code.
Keep scope tight to what the issue describes.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't about shipping features without review. It's about eliminating the gap between "approved idea" and "first draft." Your team reviews the prototype PR, iterates on it, or uses it as a reference for a manual implementation. Either way, the starting line moved forward without anyone context-switching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Be selective about which issues get the label. This works best for well-specified, moderate-complexity requests. Vague issues like "make the app faster" won't produce useful output.&lt;/p&gt;
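&lt;p&gt;If you want that selectivity enforced in code rather than by convention, a small gate in front of the agent helps. A sketch (the label names and length threshold are illustrative, not part of any Kilo API):&lt;/p&gt;

```python
# Gate "labeled" webhook events: only well-specified issues reach the agent.
AUTO_LABELS = {"auto-prototype", "agent-implement"}
MIN_BODY_CHARS = 120  # "make the app faster" won't clear this bar

def should_prototype(event: dict) -> bool:
    """Return True if a labeled-issue event is worth prototyping."""
    if event.get("action") != "labeled":
        return False
    if event.get("label", {}).get("name") not in AUTO_LABELS:
        return False
    body = event.get("issue", {}).get("body") or ""
    return len(body.strip()) >= MIN_BODY_CHARS
```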

&lt;h2&gt;
  
  
  3. Stale Branch Audit and Cleanup
&lt;/h2&gt;

&lt;p&gt;Many repos accumulate dozens of branches that nobody remembers. Feature branches from three months ago, experiment branches that went nowhere, hotfix branches that were merged but never deleted. They clutter your branch list and occasionally cause confusion about what's active.&lt;/p&gt;

&lt;p&gt;Trigger this on a weekly or biweekly cron:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"branch-audit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stale_threshold_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"protected_branches"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"main"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"develop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"staging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"release/*"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dry_run"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A scheduled branch audit has been triggered:

{{bodyJson}}

Audit the repository's remote branches:

1. Use `git branch -r` and `git log` to list all remote branches
   with their last commit date and author
2. Identify branches with no commits in the last {{body.stale_threshold_days}} days
3. Skip any branches matching the protected patterns
4. For each stale branch, check if it was merged into main
   (use `git branch -r --merged origin/main`)
5. Generate `branch-audit-report.md` containing:
   - Total branch count
   - Stale branches (merged vs unmerged), with last commit date and author
   - Recommended actions for each
6. If dry_run is false AND the branch was already merged:
   delete the remote branch using `git push origin --delete &amp;lt;branch&amp;gt;`
7. Commit the audit report:
   "chore: branch audit report - [date]"

Never delete unmerged branches automatically. Flag them in the report
for human review.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is pure housekeeping that nobody wants to do manually. The audit report gives you visibility into what's lingering, and the auto-deletion of merged branches keeps things clean without risk.&lt;/p&gt;
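&lt;p&gt;The core of step 2 is just date arithmetic over &lt;code&gt;git for-each-ref&lt;/code&gt; output. A sketch of that filter, assuming refs are listed with &lt;code&gt;--format='%(refname:short) %(committerdate:iso8601-strict)'&lt;/code&gt;:&lt;/p&gt;

```python
# Filter remote branches by last-commit age, skipping protected patterns.
from datetime import datetime, timezone
from fnmatch import fnmatch

def stale_branches(ref_lines, threshold_days, protected, now=None):
    """ref_lines: 'branch-name iso8601-strict-date' pairs from git for-each-ref."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for line in ref_lines:
        name, date_str = line.split(" ", 1)
        name = name.removeprefix("origin/")
        if any(fnmatch(name, pat) for pat in protected):
            continue
        last_commit = datetime.fromisoformat(date_str)
        if (now - last_commit).days >= threshold_days:
            stale.append(name)
    return stale
```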

&lt;h2&gt;
  
  
  4. Release Prep Automation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dkfxjz63ei78dvme6gu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dkfxjz63ei78dvme6gu.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Release days involve a predictable checklist: bump the version, update the changelog, check that migration guides are current, tag the commit, maybe update some environment configs. It's mechanical work that's easy to mess up when you're rushing.&lt;/p&gt;

&lt;p&gt;Trigger this when you create a GitHub release or push a tag matching your release pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"published"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"release"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tag_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v2.4.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v2.4.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"## What's New&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- CSV export for analytics dashboard&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Improved error handling in payment flow&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Bug fix: session timeout on mobile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prerelease"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"target_commitish"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"main"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"full_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org/product-api"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A new release has been published:

{{bodyJson}}

Prepare the repository for this release:

1. Update version numbers in all relevant files (package.json,
   pyproject.toml, version.go, etc.) to match the tag
2. Generate a CHANGELOG entry for this version:
   - Use `git log` to collect all commits since the previous tag
   - Group by type (feat, fix, refactor, docs, chore)
   - Include PR numbers where available (parse from commit messages
     or use `gh pr list --state merged`)
3. Check that README version badges reference the new version
4. If a MIGRATION.md or UPGRADING.md exists, verify it covers any
   breaking changes found in the commit log
5. If breaking changes exist but aren't documented, create a section
   in the migration guide with the relevant commit details
6. Commit: "chore(release): prepare v{{body.release.tag_name}}"

Do not modify application logic or tests.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent handles the tedious release bookkeeping so your release process is consistent every time. No more forgetting to update the changelog or missing a version reference buried in a config file.&lt;/p&gt;
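&lt;p&gt;Step 2's grouping logic is straightforward if your team uses conventional commits. A sketch over subjects from &lt;code&gt;git log --format=%s prev_tag..HEAD&lt;/code&gt; (the type list mirrors the prompt above):&lt;/p&gt;

```python
# Group commit subjects by conventional-commit type for a changelog entry.
import re
from collections import defaultdict

TYPES = ("feat", "fix", "refactor", "docs", "chore")
CONVENTIONAL = re.compile(r"^(\w+)(\([^)]*\))?!?:\s*(.+)$")

def group_commits(subjects):
    """Map each commit subject to its type bucket ('other' if unrecognized)."""
    groups = defaultdict(list)
    for subject in subjects:
        m = CONVENTIONAL.match(subject)
        if m and m.group(1) in TYPES:
            groups[m.group(1)].append(m.group(3))
        else:
            groups["other"].append(subject)
    return dict(groups)
```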

&lt;h2&gt;
  
  
  5. Scheduled Test Coverage Gap Analysis
&lt;/h2&gt;

&lt;p&gt;Test coverage reports tell you a number. They don't tell you which gaps actually matter or write the tests to fill them. An agent can do both.&lt;/p&gt;

&lt;p&gt;Run this on a weekly cron, or trigger it after a milestone is closed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"coverage-gap-analysis"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"coverage_threshold"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"focus_dirs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"src/api/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/services/"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skip_patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*.test.*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*.spec.*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"__mocks__/"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A test coverage analysis has been triggered:

{{bodyJson}}

Analyze and improve test coverage:

1. Run the project's test suite with coverage reporting enabled
2. Parse the coverage report to identify files below
   {{body.coverage_threshold}}% coverage
3. Filter to files in the focus directories, excluding skip patterns
4. For the 5 files with the lowest coverage:
   - Analyze what's untested (uncovered branches, functions, edge cases)
   - Write tests that cover the most critical untested paths
   - Prioritize: error handling &amp;gt; core business logic &amp;gt; utility functions
5. Run the test suite again to verify new tests pass
   and coverage improved
6. Generate `coverage-report.md` with:
   - Before/after coverage percentages per file
   - Summary of what was tested and why those paths were prioritized
7. Commit: "test: improve coverage for [module/area]"

Write tests that follow existing test patterns and conventions
in the project. Do not refactor source code to improve testability.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scoping to the 5 worst files per run keeps PRs reviewable. Over a few weeks of scheduled runs, coverage steadily improves without anyone grinding through it manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on runtime:&lt;/strong&gt; If your full test suite takes more than 10-12 minutes, configure the agent to run a targeted subset (e.g., only tests in the focus directories). Each Cloud Agent message has a 15-minute execution window.&lt;/p&gt;
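&lt;p&gt;If your coverage tool can emit JSON (e.g. &lt;code&gt;coverage json&lt;/code&gt; for coverage.py), steps 2-4 of the prompt reduce to a small filter. A sketch, assuming the coverage.py report shape with per-file &lt;code&gt;percent_covered&lt;/code&gt; summaries:&lt;/p&gt;

```python
# Pick the N worst-covered files below the threshold, scoped and filtered
# the same way as the webhook payload above.
from fnmatch import fnmatch

def worst_files(report, threshold, focus_dirs, skip_patterns, n=5):
    """report: parsed coverage.json ({'files': {path: {'summary': ...}}})."""
    candidates = []
    for path, data in report["files"].items():
        pct = data["summary"]["percent_covered"]
        if pct >= threshold:
            continue
        if not any(path.startswith(d) for d in focus_dirs):
            continue
        if any(fnmatch(path, pat) for pat in skip_patterns):
            continue
        candidates.append((pct, path))
    return [path for _, path in sorted(candidates)[:n]]
```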

&lt;h2&gt;
  
  
  6. First-Time Contributor Support
&lt;/h2&gt;

&lt;p&gt;Open source projects lose contributors at the first PR. The experience is often: submit a PR, wait days for review, get a list of style violations and missing tests, feel overwhelmed, disappear. An agent can smooth that onboarding curve significantly.&lt;/p&gt;

&lt;p&gt;Wire up a GitHub webhook that fires on &lt;code&gt;pull_request.opened&lt;/code&gt;. Filter in your GitHub webhook settings (or in the prompt) for first-time contributors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"opened"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pull_request"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;187&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add dark mode toggle to settings page"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implements dark mode toggle as described in #142."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"login"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new-contributor"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"feature/dark-mode-toggle"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"author_association"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FIRST_TIME_CONTRIBUTOR"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repository"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"full_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org/open-source-project"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A pull request has been opened by a first-time contributor:

{{bodyJson}}

If author_association is not "FIRST_TIME_CONTRIBUTOR", stop here.
No action needed for returning contributors.

Help this contributor get their PR ready for review:

1. Check out their branch and review the changes
2. Check if tests exist for the new/modified code. If not, write
   tests following the project's existing test patterns and push
   them to the contributor's branch
3. Run the linter and formatter. If there are violations, fix them
   and push a commit: "style: fix lint/format issues"
4. Check if the PR description references an issue. If the linked
   issue has acceptance criteria, verify the implementation covers them
5. If CONTRIBUTING.md exists, check the PR against its requirements
   (commit message format, branch naming, etc.) and fix what you can
6. Create a welcoming comment on the PR (using `gh pr comment`)
   summarizing what you did:
   - Tests added or adjusted
   - Style fixes applied
   - Any remaining items the contributor should address

Be encouraging. This may be their first open source contribution.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn't replace human code review. It handles the mechanical stuff (linting, test scaffolding, format fixes) so that when a maintainer does review, the conversation is about the actual implementation rather than style violations. For the contributor, the experience goes from "wall of automated check failures" to "an agent cleaned up the small stuff, here's what's left."&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Post-Deploy Smoke Test + Rollback Alert
&lt;/h2&gt;

&lt;p&gt;You've deployed. CI passed. But does the thing actually work in production? Smoke tests catch the gap between "tests pass in CI" and "the app works for real users."&lt;/p&gt;

&lt;p&gt;Trigger a webhook from your deployment pipeline (GitHub Actions, ArgoCD, a deploy script) after a successful deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deploy_completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"environment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api-gateway"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deploy_commit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a1b2c3d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deploy_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.yourproduct.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"health_endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/health"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"critical_endpoints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/v1/status"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/v1/config"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"previous_commit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e4f5g6h"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rollback_branch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"release/2.3.9"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
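&lt;p&gt;As one possible wiring, a GitHub Actions deploy job can build and POST that payload with a short shell step after the deploy succeeds. This is a sketch: &lt;code&gt;WEBHOOK_URL&lt;/code&gt; and &lt;code&gt;PREVIOUS_SHA&lt;/code&gt; are placeholders you supply yourself (keep the webhook URL in a secret); &lt;code&gt;GITHUB_SHA&lt;/code&gt; is set by Actions.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: POST the deploy_completed payload after a successful deploy.
# WEBHOOK_URL and PREVIOUS_SHA are placeholders; GITHUB_SHA is provided
# by GitHub Actions. Fallbacks let the script dry-run outside CI.
payload=$(cat <<EOF
{
  "event": "deploy_completed",
  "environment": "production",
  "service": "api-gateway",
  "deploy_commit": "${GITHUB_SHA:-a1b2c3d}",
  "deploy_url": "https://api.yourproduct.com",
  "health_endpoint": "/health",
  "critical_endpoints": [
    { "method": "GET", "path": "/api/v1/status" },
    { "method": "GET", "path": "/api/v1/config" }
  ],
  "previous_commit": "${PREVIOUS_SHA:-e4f5g6h}",
  "rollback_branch": "release/2.3.9"
}
EOF
)

# Only POST when a webhook URL is configured, so the sketch is safe to dry-run.
if [ -n "${WEBHOOK_URL:-}" ]; then
  curl -sS -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "$payload"
else
  echo "WEBHOOK_URL not set; payload built but not sent"
fi
```

&lt;p&gt;The same snippet works from any pipeline that can run a shell step; an ArgoCD post-sync hook or a plain deploy script is equivalent.&lt;/p&gt;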



&lt;p&gt;&lt;strong&gt;Cloud Agent Prompt Template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A deployment has completed:

{{bodyJson}}

Run post-deploy verification:

1. Use curl to hit the health endpoint and verify a 200 response:
   `curl -s -o /dev/null -w "%{http_code}" {{body.deploy_url}}{{body.health_endpoint}}`
2. For each critical endpoint, make a request and verify:
   - Response status is 2xx
   - Response time is under 2 seconds
   - Response body is valid JSON (if applicable)
3. Check the deploy commit's diff against the previous commit to
   identify which files changed
4. If any endpoint fails:
   - Document the failure details in `deploy-smoke-report.md`
   - Include the failing endpoint, status code, response body, and
     which files in the deploy diff are most likely related
   - Use `gh issue create` to open a P1 issue with the failure details
5. If all endpoints pass, create `deploy-smoke-report.md` confirming
   the deploy is healthy with response times for each endpoint
6. Commit: "chore: post-deploy smoke test report for {{body.deploy_commit}}"

This is verification only. Do not modify application code.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent runs basic health checks immediately after deploy and, if something's wrong, opens an issue with the failure context and the relevant diff. It's not a replacement for a full monitoring stack, but it catches the "deploy broke something obvious" cases within minutes instead of waiting for user reports.&lt;/p&gt;
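&lt;p&gt;The pass/fail rule from step 2 (a 2xx status, response time under 2 seconds) is simple enough to sketch locally. A rough shell equivalent, using curl's write-out variables; the function names here are illustrative, not part of any product:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch of the per-endpoint smoke check: 2xx status, total time < 2s.
evaluate_check() {
  # $1 = HTTP status code, $2 = total time in seconds.
  case "$1" in 2*) ;; *) return 1 ;; esac
  # awk exits 0 (success) when the time is under the 2-second budget.
  awk -v t="$2" 'BEGIN { exit !(t < 2.0) }'
}

# Probe one URL and report; pass any endpoint URL you want to check.
probe() {
  url="$1"
  result=$(curl -s -o /dev/null -w "%{http_code} %{time_total}" "$url")
  status=${result% *}
  time_total=${result#* }
  if evaluate_check "$status" "$time_total"; then
    echo "PASS $url ($status in ${time_total}s)"
  else
    echo "FAIL $url ($status in ${time_total}s)"
  fi
}
```

&lt;p&gt;Running something like this yourself is a quick way to confirm the endpoints are reachable at all before handing the job to the agent.&lt;/p&gt;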

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Your Agent Environment Profile needs network access to hit those endpoints. If your production environment is behind a VPN or firewall, the Cloud Agent container won't be able to reach it. This works best for publicly accessible APIs or services with external health endpoints.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wiring It All Up
&lt;/h2&gt;

&lt;p&gt;
  &lt;iframe src="https://www.youtube.com/embed/SI7ybU5so20"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Every automation above follows the same setup flow in the Kilo Dashboard at &lt;a href="https://app.kilo.ai/cloud/webhooks" rel="noopener noreferrer"&gt;app.kilo.ai/cloud/webhooks&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create an Agent Environment Profile&lt;/strong&gt; with the env vars, secrets, and startup commands your automation needs. Install any tools not in the base image via startup commands. Profiles are reusable across triggers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configure a Webhook Trigger&lt;/strong&gt; with your prompt template and target repo. The trigger resolves the profile at runtime, so profile updates automatically apply to future executions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copy the webhook URL&lt;/strong&gt; and configure your external system to POST to it. GitHub webhook settings for repo events, a cron job for scheduled tasks, your deploy pipeline for post-deploy flows.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
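&lt;p&gt;For the cron case in step 3, the "external system" can be as small as one script plus a crontab entry. A sketch, with &lt;code&gt;WEBHOOK_URL&lt;/code&gt; and the script path standing in for your own values:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: POST to a Cloud Agent webhook trigger from cron.
# WEBHOOK_URL is a placeholder for the URL copied from the dashboard.
fire_trigger() {
  url="${1:-${WEBHOOK_URL:-}}"
  if [ -z "$url" ]; then
    echo "no webhook URL configured; nothing sent"
    return 1
  fi
  curl -sS -X POST "$url" \
    -H "Content-Type: application/json" \
    -d '{"event": "scheduled_run", "source": "cron"}'
}

# Example crontab entry (weekdays at 09:00); path is illustrative:
#   0 9 * * 1-5 WEBHOOK_URL=https://... /usr/local/bin/fire-trigger.sh
fire_trigger "$@" || true  # tolerate a missing URL on a manual dry run
```

&lt;p&gt;Whatever JSON you POST becomes &lt;code&gt;{{bodyJson}}&lt;/code&gt; in the prompt template, so the cron payload can carry any parameters the scheduled task needs.&lt;/p&gt;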

&lt;p&gt;For personal accounts, webhook sessions run in your Cloud Agent container and you can watch them execute live. Organization webhooks run in dedicated compute as a bot user, with completed sessions available to share or fork.&lt;/p&gt;

&lt;p&gt;If you're building automations with Cloud Agents, share what you're running in the &lt;code&gt;#cloud-agents&lt;/code&gt; channel on &lt;a href="https://kilo.ai/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>cloud</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
