<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: jidonglab</title>
    <description>The latest articles on Forem by jidonglab (@ji_ai).</description>
    <link>https://forem.com/ji_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png</url>
      <title>Forem: jidonglab</title>
      <link>https://forem.com/ji_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ji_ai"/>
    <language>en</language>
    <item>
      <title>71,700 Stars and 60 Rust Crates: Inside OpenAI's Codex CLI Source</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Sun, 03 May 2026 17:54:42 +0000</pubDate>
      <link>https://forem.com/ji_ai/71700-stars-and-60-rust-crates-inside-openais-codex-cli-source-363i</link>
      <guid>https://forem.com/ji_ai/71700-stars-and-60-rust-crates-inside-openais-codex-cli-source-363i</guid>
      <description>&lt;p&gt;71,700 stars. 5,006 commits. 665 releases. 94.7% Rust. When &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; dropped the full source code of &lt;a href="https://github.com/openai/codex" rel="noopener noreferrer"&gt;Codex CLI&lt;/a&gt; under the &lt;a href="https://www.apache.org/licenses/LICENSE-2.0" rel="noopener noreferrer"&gt;Apache 2.0 license&lt;/a&gt;, I expected a thin wrapper around an API. What I found was a deeply engineered system with over 60 Rust crates, OS-level sandboxing on three platforms, and an agent loop architecture that reveals how OpenAI thinks about local AI tooling.&lt;/p&gt;

&lt;p&gt;I spent a weekend reading through the repository. This is what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Opened the Source
&lt;/h2&gt;

&lt;p&gt;I've been building with AI coding tools for over a year. I wrote about bootstrapping a full pipeline with GPT-5 Codex in a single day, and I've documented my experience running parallel subagents with Claude Code. But every tool I've used has been a black box at some level. The model is remote. The agent logic is proprietary. The sandbox rules are opaque.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Related: &lt;a href="https://jidonglab.com/blog/gpt5-codex-5800-line-bootstrap" rel="noopener noreferrer"&gt;5,800 Lines in One Day: Bootstrapping a Full Pipeline With gpt-5-codex&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Codex CLI went open-source, the appeal was obvious: for the first time, I could read the exact code that decides what an AI agent can and cannot do on my machine. Not a documentation page. Not a blog summary. The actual Rust source.&lt;/p&gt;

&lt;p&gt;The repository lives at &lt;code&gt;github.com/openai/codex&lt;/code&gt;. The core logic sits under &lt;code&gt;codex-rs/&lt;/code&gt;, with four top-level crates that divide responsibility cleanly. There's &lt;code&gt;core&lt;/code&gt; for the agent loop and tool execution. There's &lt;code&gt;cli&lt;/code&gt; for the terminal entry point. There's &lt;code&gt;tui&lt;/code&gt; for the full-screen terminal UI. And there's &lt;code&gt;headless&lt;/code&gt;, which speaks JSON-RPC over stdio so that VS Code extensions and web applications can connect to the same engine without any GUI.&lt;/p&gt;
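&lt;p&gt;Mapped out, the workspace looks roughly like this. The four crate names are from the repo; the annotations are my reading of them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;codex-rs/
  core/        agent loop, tool execution, sandbox policy
    prompt.md  the system prompt, plain Markdown
  cli/         terminal entry point
  tui/         full-screen terminal UI
  headless/    JSON-RPC over stdio for IDEs and web frontends
  ...          dozens of supporting crates in the workspace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;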

&lt;h2&gt;
  
  
  The Agent Loop, Unwrapped
&lt;/h2&gt;

&lt;p&gt;The architecture follows a pattern that's becoming standard in AI coding tools, but the implementation details matter. When you type a natural language command into Codex CLI, the &lt;code&gt;core&lt;/code&gt; crate takes over. It constructs an HTTP request to the &lt;a href="https://platform.openai.com/docs/api-reference" rel="noopener noreferrer"&gt;OpenAI Responses API&lt;/a&gt; and opens a streaming connection. Events arrive one by one. Some are text tokens. Some are tool calls.&lt;/p&gt;

&lt;p&gt;Here's the flow, stripped to its essence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User prompt
     |
     v
+------------------+
|  core: agent     |
|  loop            |
+------------------+
     |
     v
OpenAI Responses API
(streaming HTTP)
     |
     v
+------------------+
|  Event parser    |
+------------------+
     |
     +---&amp;gt; Text token --&amp;gt; stream to terminal
     |
     +---&amp;gt; Tool call --&amp;gt; execute locally
                |
                v
         Return result to API
                |
                v
         (loop continues)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool system is defined through a &lt;code&gt;ToolSpec&lt;/code&gt; enum. Each tool declares its input and output via &lt;a href="https://json-schema.org/" rel="noopener noreferrer"&gt;JSON Schema&lt;/a&gt;, which means the model knows exactly what parameters a tool accepts and what shape the response will take. This is the same pattern that Claude Code and other agent frameworks use, but seeing it implemented in Rust with full type safety gives it a different character. There's no loose type coercion. No &lt;code&gt;any&lt;/code&gt; escape hatch. Tool definitions are type-checked at compile time, and every incoming call is validated against its schema at runtime, before anything executes.&lt;/p&gt;
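&lt;p&gt;Here's a minimal sketch of the pattern. This is not the actual Codex source, and the variant and field names are invented; it shows how a serde-tagged enum makes deserialization itself the validation step:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Illustrative only -- not the real ToolSpec from codex-rs.
// deps: serde = { version = "1", features = ["derive"] }, serde_json = "1"
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(tag = "tool", content = "args")]
enum ToolSpec {
    ReadFile { path: String },
    RunCommand { command: String, timeout_ms: Option&lt;u64&gt; },
}

// Deserialization is the runtime validation step: a call that does not
// match a declared variant never reaches execution.
fn dispatch(raw: &amp;str) -&gt; Result&lt;String, serde_json::Error&gt; {
    let call: ToolSpec = serde_json::from_str(raw)?;
    Ok(match call {
        ToolSpec::ReadFile { path } =&gt; format!("read {path}"),
        ToolSpec::RunCommand { command, .. } =&gt; format!("ran {command}"),
    })
}

fn main() {
    let ok = r#"{"tool":"ReadFile","args":{"path":"src/main.rs"}}"#;
    let bad = r#"{"tool":"DeleteDisk","args":{}}"#;
    println!("{:?}", dispatch(ok));  // Ok("read src/main.rs")
    println!("{:?}", dispatch(bad)); // Err: unknown variant `DeleteDisk`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;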

&lt;p&gt;What makes this extensible is &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; integration. MCP, the Model Context Protocol, allows external servers to register tools dynamically. If you have a custom database tool or a deployment script that you want the agent to use, you spin up an MCP server and Codex CLI discovers it at runtime. The agent treats MCP tools identically to built-in ones. Same schema validation. Same sandbox restrictions.&lt;/p&gt;
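&lt;p&gt;Registration happens in the config file, in roughly this shape. Verify the exact key names against the repo's docs; the server here is a made-up example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ~/.codex/config.toml -- illustrative; check the repo docs for exact keys
[mcp_servers.my_db_tool]
command = "npx"
args = ["-y", "my-mcp-db-server"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;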

&lt;blockquote&gt;
&lt;p&gt;Related: &lt;a href="https://jidonglab.com/blog/claude-code-config-guide" rel="noopener noreferrer"&gt;Claude Code Config Guide&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system prompt lives in &lt;code&gt;codex-rs/core/prompt.md&lt;/code&gt;, a plain Markdown file that anyone can read and modify. Configuration sits in &lt;code&gt;~/.codex/config.toml&lt;/code&gt;. Session state persists to a local &lt;a href="https://sqlite.org/" rel="noopener noreferrer"&gt;SQLite&lt;/a&gt; database. There's no cloud state. Everything lives on your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sandboxing: Three Operating Systems, Three Strategies
&lt;/h2&gt;

&lt;p&gt;The security model is where the engineering gets serious. Codex CLI doesn't just run commands in a subprocess and hope for the best. It wraps every tool execution in an OS-native sandbox.&lt;/p&gt;

&lt;p&gt;On macOS, it uses &lt;a href="https://developer.apple.com/documentation/security/app-sandbox" rel="noopener noreferrer"&gt;Seatbelt&lt;/a&gt;, Apple's application sandbox framework. The same technology that restricts what App Store applications can access on your Mac is applied to every command the AI agent runs. File system access is limited to the project directory. Network connections can be blocked or scoped. Process creation is controlled.&lt;/p&gt;

&lt;p&gt;On Linux, the approach combines &lt;a href="https://github.com/containers/bubblewrap" rel="noopener noreferrer"&gt;Bubblewrap&lt;/a&gt; for filesystem and namespace isolation with &lt;a href="https://man7.org/linux/man-pages/man2/seccomp.2.html" rel="noopener noreferrer"&gt;Seccomp&lt;/a&gt; for system call filtering. Bubblewrap creates a lightweight container-like environment. Seccomp sits below that, blocking dangerous syscalls entirely. The agent literally cannot call &lt;code&gt;execve&lt;/code&gt; on an arbitrary binary outside its allowed list.&lt;/p&gt;
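&lt;p&gt;To make that concrete, here is what a Bubblewrap invocation of this kind looks like. This is not Codex's actual profile, just an illustration of the confinement primitives involved; the seccomp filter is attached separately as a compiled BPF program:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative bwrap invocation -- not Codex's actual sandbox profile.
bwrap \
  --ro-bind /usr /usr \
  --ro-bind /lib /lib \
  --bind "$PWD" /work \
  --chdir /work \
  --unshare-net \
  --unshare-pid \
  --proc /proc \
  --dev /dev \
  /bin/sh -c "cargo test"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;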

&lt;p&gt;On Windows, Restricted Tokens limit the process's access rights. It's the least granular of the three approaches, but it still prevents the agent from accessing files or registry keys outside its scope.&lt;/p&gt;

&lt;p&gt;This is fundamentally different from how &lt;a href="https://code.claude.com/" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; handles security. Claude Code runs its model inference on Anthropic's cloud. The local client, written in TypeScript, executes tools on your machine but relies on a permission-based model rather than OS-level sandboxing. You approve or deny each action. With Codex CLI, the sandbox enforces restrictions regardless of what the model requests. The model can ask to read &lt;code&gt;/etc/passwd&lt;/code&gt; all day long. Seatbelt will say no.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Related: &lt;a href="https://jidonglab.com/blog/2026-03-27-claude-code-subagents-parallel-guide" rel="noopener noreferrer"&gt;Claude Code Subagents Parallel Guide&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Performance: Caching and the Model Question
&lt;/h2&gt;

&lt;p&gt;Codex CLI is built around GPT-5.3-Codex, a model variant optimized for code generation and tool use. The performance optimization strategy centers on prompt caching. System prompts, project context, and frequently repeated instructions are cached on the API side, reducing both latency and cost on subsequent turns within a conversation.&lt;/p&gt;

&lt;p&gt;The Rust implementation itself contributes to performance in ways that a TypeScript or Python agent cannot match. There's no garbage collector pause. Memory allocation is deterministic. The binary ships as a single executable with no runtime dependencies. On my machine, cold start to first API call takes under 200 milliseconds. Compare that to Node.js-based tools where the module resolution alone can take longer.&lt;/p&gt;

&lt;p&gt;The headless mode deserves attention here. By communicating over JSON-RPC through stdio using JSONL (newline-delimited JSON), Codex CLI achieves a clean separation between the engine and any frontend. The same binary that powers the terminal CLI also powers the VS Code extension and could power a web application. There's no separate server process. No WebSocket setup. Just stdin and stdout, the most universal IPC mechanism in computing.&lt;/p&gt;
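&lt;p&gt;On the wire it looks something like this. The method and field names below are invented to show the shape; the real schema lives in the headless crate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--&gt; {"jsonrpc":"2.0","id":1,"method":"session/prompt","params":{"text":"add a unit test"}}
&lt;-- {"jsonrpc":"2.0","method":"event/token","params":{"text":"Sure,"}}
&lt;-- {"jsonrpc":"2.0","method":"event/tool_call","params":{"tool":"run_command"}}
&lt;-- {"jsonrpc":"2.0","id":1,"result":{"status":"completed"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;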

&lt;h2&gt;
  
  
  What Open Source Changes
&lt;/h2&gt;

&lt;p&gt;Reading Codex CLI's source changed how I think about AI coding tools. When I use Claude Code, I trust Anthropic's security claims. When I use Codex CLI, I can verify them. I can grep for &lt;code&gt;seatbelt&lt;/code&gt; in the codebase and read the exact sandbox profile applied to my commands. I can open &lt;code&gt;prompt.md&lt;/code&gt; and see what the model is told about my project before it generates a single token.&lt;/p&gt;

&lt;p&gt;This transparency has practical implications. Enterprise security teams can audit the sandbox policies before approving the tool. Contributors can fix bugs in the agent loop without waiting for a vendor patch. Researchers can study a production-grade AI agent architecture without reverse engineering.&lt;/p&gt;

&lt;p&gt;The choice of Rust is itself a statement. Building an AI tool's core in a systems language that guarantees memory safety, compiles to native code, and supports cross-platform builds from a single codebase signals long-term investment in performance and reliability. Python would have been easier. TypeScript would have matched the web ecosystem. Rust says: this tool will run on your machine, close to your code, and it will not crash.&lt;/p&gt;

&lt;p&gt;The competitive landscape is shifting. Claude Code dominates in developer experience and model capability. &lt;a href="https://cursor.sh/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt; owns the IDE integration space. &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt; has distribution through GitHub's user base. Codex CLI's bet is that full transparency and community ownership will attract developers who care about understanding and controlling the tools they depend on.&lt;/p&gt;

&lt;p&gt;Whether that bet pays off depends on the community. The code is there. The architecture is solid. The question is whether 71,700 stars translate into contributors who push the tool beyond what OpenAI alone would build.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The best way to trust an AI tool is to read its source code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What would you build if you could fork the entire agent loop?&lt;/p&gt;




&lt;p&gt;Full Korean analysis on &lt;a href="https://spoonai.me/posts/openai-codex-open-source-analysis" rel="noopener noreferrer"&gt;spoonai.me&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Pentagon Blacklisted Anthropic From 8 Classified AI Deals</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Sun, 03 May 2026 15:53:14 +0000</pubDate>
      <link>https://forem.com/ji_ai/pentagon-blacklisted-anthropic-from-8-classified-ai-deals-4kcn</link>
      <guid>https://forem.com/ji_ai/pentagon-blacklisted-anthropic-from-8-classified-ai-deals-4kcn</guid>
      <description>&lt;p&gt;The most safety-focused AI lab on the planet just got blacklisted by the Pentagon for being too safe. On May 1, the U.S. Department of Defense signed IL6/IL7 classified AI deals with eight companies. Anthropic — the one that publishes its safety policy — was not on the list.&lt;/p&gt;

&lt;p&gt;IL6 and IL7 are the Pentagon's classified-data tiers: IL6 covers Secret-level data (operational plans, weapons specs), IL7 covers Top Secret and Special Access Programs. They are the highest commercially accessible authorization levels in U.S. government cloud, and getting cleared takes years and tens of millions of dollars per vendor.&lt;/p&gt;

&lt;p&gt;I have been tracking this since the "supply chain risk" label leaked in February, and the May 1 announcement closes the loop in a way that should change how you think about building on frontier AI. The reason Anthropic is locked out is not technical. It is a single contract clause — and Anthropic filed a federal lawsuit rather than sign it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The eight that signed, and what they shared
&lt;/h2&gt;

&lt;p&gt;The Pentagon's roster reads like a who's-who of vendors who already accepted the rules of engagement. The big three cloud providers were locks because they already operate IL6 environments for the rest of the federal government. Google's inclusion is the headline, a full rehabilitation: the company that walked out of military AI in 2018 after the Project Maven revolt is back inside the classified perimeter.&lt;/p&gt;

&lt;p&gt;OpenAI's presence is the most striking shift. As recently as January 2024 its usage policy explicitly forbade military applications. The company quietly removed that restriction, accepted the Pentagon's "all lawful purposes" language, and two years later it is sitting at the classified AI table. That made OpenAI the template every other vendor was measured against.&lt;/p&gt;

&lt;p&gt;Here is the half of the roster with the deepest defense roots:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Defense role&lt;/th&gt;
&lt;th&gt;Why they were a lock&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Microsoft&lt;/td&gt;
&lt;td&gt;Azure Government Secret/Top Secret + AI&lt;/td&gt;
&lt;td&gt;JWCC holder, existing IL6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;GovCloud IL6 + Bedrock AI&lt;/td&gt;
&lt;td&gt;Operates the CIA's C2E cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Oracle&lt;/td&gt;
&lt;td&gt;OCI Government cloud + database&lt;/td&gt;
&lt;td&gt;Long-running DoD ERP operator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nvidia&lt;/td&gt;
&lt;td&gt;Training/inference GPU + software&lt;/td&gt;
&lt;td&gt;De facto DoD AI infra standard&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The other four represent a strategic bet rather than a compliance default. Google brings Gemini back into classified rotation. OpenAI brings GPT deployment behind the SCIF wall. SpaceX brings Starshield satellite comms and edge inference for forward-deployed networks. And Reflection AI — a 2025-founded startup negotiating a $25B valuation, with no public model and no shipped product — got a classified contract anyway.&lt;/p&gt;

&lt;p&gt;The contrast with Anthropic is brutal. Claude is one of the most capable frontier models in the world. It was excluded. A company with no model was included. The selection criteria are not about technical capability. They are about willingness to sign whatever the government puts in front of you.&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b" class="crayons-story__hidden-navigation-link"&gt;Claude Opus 4.7 Hit 87.6% on SWE-bench. The Story Is What It Didn't Charge You.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ji_ai" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" alt="ji_ai profile" class="crayons-avatar__image" width="400" height="400"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ji_ai" class="crayons-story__secondary fw-medium m:hidden"&gt;
              jidonglab
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                jidonglab
                
              
              &lt;div id="story-author-preview-content-3527843" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ji_ai" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" class="crayons-avatar__image" alt="" width="400" height="400"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;jidonglab&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 20&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b" id="article-link-3527843"&gt;
          Claude Opus 4.7 Hit 87.6% on SWE-bench. The Story Is What It Didn't Charge You.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/claude"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;claude&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/llm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;llm&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/engineering"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;engineering&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


&lt;h2&gt;
  
  
  The clause Anthropic refused to sign
&lt;/h2&gt;

&lt;p&gt;In 2025, the DoD started requiring AI vendors to accept an "all lawful purposes" clause. The language gives the Pentagon broad discretion to use the AI for any legally authorized purpose — including lethal autonomous systems support, targeting, and weapons platform integration. It does not mandate those uses. It just does not exclude them. It is a blank check on usage rights.&lt;/p&gt;

&lt;p&gt;Anthropic refused. The company's Acceptable Use Policy prohibits using Claude to "cause serious harm to people" and restricts autonomous weapons applications. Signing the clause would have effectively voided that policy — the exact thing Anthropic uses to differentiate itself from OpenAI. So early in 2026, the DoD formally classified Anthropic as a "supply chain risk." That label is the procurement equivalent of being declared a national security threat, except it did not come from a technical vulnerability or counterintelligence finding. It came from a procurement office that decided a vendor who might refuse to support a specific use case mid-contract is operationally unreliable.&lt;/p&gt;

&lt;p&gt;Anthropic responded with a federal lawsuit in March, arguing the classification is arbitrary and amounts to a permanent ban without due process. The lawsuit is active. The DoD signed the eight-company contracts anyway. The compressed timeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;January 2024: OpenAI drops its military prohibition&lt;/li&gt;
&lt;li&gt;2025: the "all lawful purposes" clause appears in DoD contracts&lt;/li&gt;
&lt;li&gt;Late 2025: Anthropic refuses to sign&lt;/li&gt;
&lt;li&gt;Early 2026: the "supply chain risk" classification lands&lt;/li&gt;
&lt;li&gt;March 2026: Anthropic files suit&lt;/li&gt;
&lt;li&gt;April 19: Axios reports the NSA is already running Anthropic's Mythos cyber-defense model&lt;/li&gt;
&lt;li&gt;May 1: the IL6/IL7 contracts go out without Anthropic&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$15B  ── annual U.S. defense AI spend Anthropic just walked away from
$900B ── valuation Anthropic might land at next month
─────────────────────────────────────────────────────────────────────
The first AI company to refuse the Pentagon's terms is also the
most expensive one. That's not a coincidence. That's the trade.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  The Mythos paradox makes the lawsuit interesting
&lt;/h2&gt;

&lt;p&gt;The strangest part of this story is that within the same federal government, the NSA — an agency that operates at the highest classification levels — is actively using Anthropic's Mythos Preview model for network intrusion detection and threat analysis. Pentagon CTO Emil Michael addressed the contradiction in a May 1 CNBC interview with one sentence: "The blacklist holds, but Mythos is a separate issue."&lt;/p&gt;

&lt;p&gt;Translate that. The DoD will not accept Anthropic as a full partner, but it will make exceptions when a specific Anthropic tool is too good to ignore. That position is operationally rational and legally fragile. You cannot simultaneously argue that a company is a supply chain risk to national security while running its model inside your signals intelligence agency. Well — you can argue it, Emil Michael just did, but it is not a position that holds up under judicial scrutiny.&lt;/p&gt;

&lt;p&gt;This matters because Anthropic's lawsuit now has unusually clean evidence. If a federal court rules the "supply chain risk" classification was arbitrary, the DoD may be forced to revisit the contract structure and the precedent will reach far beyond Anthropic. It would set a limit on how governments can condition AI procurement on usage terms.&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ji_ai/stellantis-just-outsourced-its-ai-moat-to-microsoft-expect-gm-ford-and-vw-to-follow-578" class="crayons-story__hidden-navigation-link"&gt;Stellantis Just Outsourced Its AI Moat to Microsoft. Expect GM, Ford, and VW to Follow.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ji_ai" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" alt="ji_ai profile" class="crayons-avatar__image" width="400" height="400"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ji_ai" class="crayons-story__secondary fw-medium m:hidden"&gt;
              jidonglab
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                jidonglab
                
              
              &lt;div id="story-author-preview-content-3531246" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ji_ai" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" class="crayons-avatar__image" alt="" width="400" height="400"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;jidonglab&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ji_ai/stellantis-just-outsourced-its-ai-moat-to-microsoft-expect-gm-ford-and-vw-to-follow-578" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 21&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ji_ai/stellantis-just-outsourced-its-ai-moat-to-microsoft-expect-gm-ford-and-vw-to-follow-578" id="article-link-3531246"&gt;
          Stellantis Just Outsourced Its AI Moat to Microsoft. Expect GM, Ford, and VW to Follow.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/automotive"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;automotive&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/microsoft"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;microsoft&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/enterprise"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;enterprise&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/ji_ai/stellantis-just-outsourced-its-ai-moat-to-microsoft-expect-gm-ford-and-vw-to-follow-578#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            8 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;


&lt;h2&gt;
  
  
  What this means if you are building on Claude
&lt;/h2&gt;

&lt;p&gt;Three trade-offs are locking in at once. First, frontier AI procurement is now a usage-rights negotiation, not a capability evaluation. The Pentagon did not benchmark Claude against GPT and Gemini and decide Claude lost. It did not evaluate at all. The selection ran on contract acceptance. If you plan to sell into government, design your AUP knowing the "all lawful purposes" clause exists and pick a side before customers force you to. Retrofitting later is what got Anthropic into court.&lt;/p&gt;

&lt;p&gt;Second, AI safety as a brand has become measurably more expensive. Anthropic's stance might still be the right long-term bet, but the short-term cost is a $15B/year market and a credibility hit at the worst possible moment in its fundraising cycle. The counter-bet is that this stance compounds into enterprise trust in regulated industries — healthcare, finance, EU public sector — that the contract-signers cannot match. Both bets are live.&lt;/p&gt;

&lt;p&gt;Third, the trade-off developers feel first. If your project depends on the Claude API and your work touches federal, defense, or adjacent regulated domains, the ground has shifted. Commercial API access is unaffected today. But the "supply chain risk" label travels — once a federal agency uses it, primes start asking whether their own vendors are exposed. Build a contingency plan: know your migration path to the closest non-Anthropic substitute and price dual-vendor architecture before procurement asks.&lt;/p&gt;
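&lt;p&gt;The cheapest form of that contingency plan is a seam in the code. A hedged sketch, with invented names and stubbed calls rather than real SDK bindings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch of a vendor seam: call sites depend on the trait, so a forced
// migration becomes a constructor swap instead of a rewrite. Names are
// invented; real impls would wrap each vendor's HTTP API.
trait ChatBackend {
    fn complete(&amp;self, prompt: &amp;str) -&gt; Result&lt;String, String&gt;;
}

struct PrimaryVendor;   // e.g. the Claude API today
struct SecondaryVendor; // the priced-out substitute

impl ChatBackend for PrimaryVendor {
    fn complete(&amp;self, prompt: &amp;str) -&gt; Result&lt;String, String&gt; {
        Ok(format!("primary: {prompt}")) // stub
    }
}

impl ChatBackend for SecondaryVendor {
    fn complete(&amp;self, prompt: &amp;str) -&gt; Result&lt;String, String&gt; {
        Ok(format!("secondary: {prompt}")) // stub
    }
}

fn backend(primary_available: bool) -&gt; Box&lt;dyn ChatBackend&gt; {
    if primary_available {
        Box::new(PrimaryVendor)
    } else {
        Box::new(SecondaryVendor)
    }
}

fn main() {
    let be = backend(false);
    println!("{:?}", be.complete("ping")); // Ok("secondary: ping")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;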

&lt;p&gt;If you want a way into the defense AI market itself, the door is more open than six months ago. Reflection AI's inclusion proves you do not even need a shipping product. You do need FedRAMP High authorization, DISA STIG compliance, and a working grasp of the architecture differences between commercial cloud and Azure Government / AWS GovCloud — experience that is in genuinely short supply.&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i" class="crayons-story__hidden-navigation-link"&gt;Symphony: Why OpenAI's PRs Jumped 500% in 3 Weeks&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ji_ai" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" alt="ji_ai profile" class="crayons-avatar__image" width="400" height="400"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ji_ai" class="crayons-story__secondary fw-medium m:hidden"&gt;
              jidonglab
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                jidonglab
                
              
              &lt;div id="story-author-preview-content-3592373" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ji_ai" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" class="crayons-avatar__image" alt="" width="400" height="400"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;jidonglab&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 30&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i" id="article-link-3592373"&gt;
          Symphony: Why OpenAI's PRs Jumped 500% in 3 Weeks
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


&lt;h2&gt;
  
  
  So who is actually right here
&lt;/h2&gt;

&lt;p&gt;Both sides have a coherent argument and that is exactly why this is hard. Defense hawks see Anthropic as naive: if you want government contracts, you play by government rules. OpenAI, Google, and Microsoft understood that. Anthropic chose principle over pragmatism and the bill came due.&lt;/p&gt;

&lt;p&gt;AI safety researchers see the same situation as a stress test of whether responsible AI development means anything once the customer is the most powerful military on earth. If a frontier lab folds the moment the Pentagon offers a check, the entire concept of an Acceptable Use Policy becomes performative.&lt;/p&gt;

&lt;p&gt;The Pentagon is operationally rational. Anthropic is philosophically coherent. This is not a case where one side is clearly wrong — it is a structural tension the industry has not resolved. History is kind to companies that held lines, but surviving as a $900B company without your own government's trust is harder than it looks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The first AI vendor to refuse the Pentagon's contract terms is also the most expensive one — that is the new shape of AI sovereignty.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are shipping production workloads on Claude in a regulated industry, does Anthropic's Pentagon stance change your bet — or does it become the reason you double down?&lt;/p&gt;

&lt;p&gt;Full Korean analysis on &lt;a href="https://spoonai.me/posts/2026-05-02-pentagon-anthropic-blacklist-confirmed-ko" rel="noopener noreferrer"&gt;spoonai.me&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://defensescoop.com/2026/05/01/dod-expands-classified-ai-work-with-8-companies-excluding-anthropic/" rel="noopener noreferrer"&gt;DefenseScoop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2026/05/01/pentagon-anthropic-blacklist-mythos-michael.html" rel="noopener noreferrer"&gt;CNBC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://breakingdefense.com/2026/05/pentagon-clears-7-tech-firms-to-deploy-their-ai-on-its-classified-networks/" rel="noopener noreferrer"&gt;Breaking Defense&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.defensenews.com/news/pentagon-congress/2026/05/01/pentagon-freezes-out-anthropic-as-it-signs-deals-with-ai-rivals/" rel="noopener noreferrer"&gt;Defense News&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>news</category>
      <category>business</category>
    </item>
    <item>
      <title>Anthropic $900B: 2.4x in 90 Days, 48-Hour Window</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Sun, 03 May 2026 15:52:38 +0000</pubDate>
      <link>https://forem.com/ji_ai/anthropic-900b-24x-in-90-days-48-hour-window-361l</link>
      <guid>https://forem.com/ji_ai/anthropic-900b-24x-in-90-days-48-hour-window-361l</guid>
      <description>&lt;p&gt;Three months ago, Anthropic was valued at $380B. Today the number on the table is $900B. That is a 2.4x jump in 90 days, and the round closes in two weeks.&lt;/p&gt;

&lt;p&gt;Anthropic is the AI lab behind Claude, currently raising a $50B round at a $900B valuation that would make it the most valuable AI startup ever. Bloomberg broke the story on April 29. CNBC confirmed the same day. By May 1, TechCrunch, PYMNTS, and Reuters had all piled on. The question stopped being "will it close" and became "what does the market look like the morning after."&lt;/p&gt;

&lt;p&gt;I've been tracking Anthropic's cap table for two years and the velocity here is what I keep getting stuck on. Not the absolute number. The slope.&lt;/p&gt;

&lt;h2&gt;
  
  
  2.4x in 90 Days
&lt;/h2&gt;

&lt;p&gt;Let me put the curve on the table, because the trajectory is the story.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Valuation&lt;/th&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sep 2024&lt;/td&gt;
&lt;td&gt;$180B&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mar 2025&lt;/td&gt;
&lt;td&gt;$610B&lt;/td&gt;
&lt;td&gt;3.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feb 2026&lt;/td&gt;
&lt;td&gt;$380B&lt;/td&gt;
&lt;td&gt;0.6x (correction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;May 2026&lt;/td&gt;
&lt;td&gt;$900B&lt;/td&gt;
&lt;td&gt;2.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That February dip is the part most coverage misses. Anthropic actually got marked down during the Q1 correction, then more than doubled in a single quarter. SpaceX's fastest historical climb did not hit 2.4x in a quarter. There is essentially no precedent for this slope at this scale.&lt;/p&gt;

&lt;p&gt;The investor pool tells you why. Existing backers like Google, Salesforce, and Spark Capital are re-upping, and a wave of new institutional investors is clamoring for an allocation. A $50B round with 48-hour allocation windows is not a fundraise — it is a queue management problem.&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b" class="crayons-story__hidden-navigation-link"&gt;Claude Opus 4.7 Hit 87.6% on SWE-bench. The Story Is What It Didn't Charge You.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ji_ai" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" alt="ji_ai profile" class="crayons-avatar__image" width="400" height="400"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ji_ai" class="crayons-story__secondary fw-medium m:hidden"&gt;
              jidonglab
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                jidonglab
                
              
              &lt;div id="story-author-preview-content-3527843" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ji_ai" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" class="crayons-avatar__image" alt="" width="400" height="400"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;jidonglab&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 20&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b" id="article-link-3527843"&gt;
          Claude Opus 4.7 Hit 87.6% on SWE-bench. The Story Is What It Didn't Charge You.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/claude"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;claude&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/llm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;llm&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/engineering"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;engineering&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/ji_ai/claude-opus-47-hit-876-on-swe-bench-the-story-is-what-it-didnt-charge-you-1c4b#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


&lt;h2&gt;
  
  
  The 48-Hour Math
&lt;/h2&gt;

&lt;p&gt;Here is the deal mechanic that has term-sheet veterans raising eyebrows.&lt;/p&gt;

&lt;p&gt;Anthropic's board is expected to approve in May. The target close is two weeks after that. Inside that window, investors who get an allocation have 48 hours to commit. Forty-eight hours to decide whether you want a slice of a $900B private company.&lt;/p&gt;

&lt;p&gt;In normal markets, a round this size takes months to syndicate. Lawyers redline. LPs ask questions. Diligence drags. A 48-hour window means Anthropic is not negotiating — it is rationing. Demand exceeds the round size by a margin large enough that the company can dictate terms most founders would never get away with.&lt;/p&gt;

&lt;p&gt;The structural bet behind that confidence is the revenue curve. Anthropic ended 2025 at roughly $9B ARR. By the end of March 2026, it was at $30B. That is 3.3x in a single quarter. Roughly 80% of it is enterprise, with over 1,000 customers spending $1M+ per year. This is not consumer subscription churn — it is committed annual contracts from companies whose procurement teams already justified the spend.&lt;/p&gt;

&lt;p&gt;At $30B ARR, the implied multiple is about 30x. High, but not absurd at 200%+ growth. Whether "not absurd" survives the next two earnings cycles is the live question.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Number That Actually Matters
&lt;/h2&gt;

&lt;p&gt;Compare the two giants on the metric that determines whether $900B holds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI    ARR: ~$130B   -&amp;gt;   Valuation $852B   (~6.5x)
Anthropic ARR: ~$30B    -&amp;gt;   Valuation $900B   (~30x)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;OpenAI has 4x the revenue, and Anthropic is now priced higher. That gap is not irrational on its own — it is a bet on the second derivative. If Anthropic's growth rate persists for another year, the multiples converge. If it decelerates, the gap looks like a pricing error.&lt;/p&gt;

&lt;p&gt;The composition of the gap also matters. OpenAI built breadth — hundreds of millions of consumer users, ChatGPT as a brand, an API platform. Anthropic built depth — fewer logos, bigger contracts, a "safe AI" wrapper that lets enterprise procurement teams sign without writing a memo defending the choice. In 2026, the second story is the one investors are paying for.&lt;/p&gt;

&lt;p&gt;There is also a structural advantage I keep underweighting. Amazon has committed $25B and 5GW of compute. Google has committed $40B. Both clouds host Claude. Unlike OpenAI's deep dependence on Microsoft, Anthropic plays both sides of the cloud duopoly without locking into either. That optionality is worth a premium, and the term sheet reflects it.&lt;/p&gt;

&lt;p&gt;Fortune ran an analysis the same week claiming roughly half of Google's and Amazon's AI-related profits trace back to Anthropic stake appreciation. Read in reverse: big tech AI profitability now depends materially on Anthropic's mark going up. That is a flywheel — or a feedback loop, depending on which side of the cycle you think we are on.&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ji_ai/why-openai-shipped-gpt-55-just-6-weeks-after-54-270c" class="crayons-story__hidden-navigation-link"&gt;Why OpenAI Shipped GPT-5.5 Just 6 Weeks After 5.4&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ji_ai" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" alt="ji_ai profile" class="crayons-avatar__image" width="400" height="400"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ji_ai" class="crayons-story__secondary fw-medium m:hidden"&gt;
              jidonglab
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                jidonglab
                
              
              &lt;div id="story-author-preview-content-3543507" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ji_ai" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" class="crayons-avatar__image" alt="" width="400" height="400"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;jidonglab&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ji_ai/why-openai-shipped-gpt-55-just-6-weeks-after-54-270c" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 24&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ji_ai/why-openai-shipped-gpt-55-just-6-weeks-after-54-270c" id="article-link-3543507"&gt;
          Why OpenAI Shipped GPT-5.5 Just 6 Weeks After 5.4
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/openai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;openai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/productivity"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;productivity&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/ji_ai/why-openai-shipped-gpt-55-just-6-weeks-after-54-270c#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            8 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;


&lt;h2&gt;
  
  
  The Pentagon Paradox
&lt;/h2&gt;

&lt;p&gt;The same week the $900B headlines hit, the Pentagon excluded Anthropic from its AI contract shortlist. Most companies would call that a bad week. For Anthropic, the two stories are arguably the same story.&lt;/p&gt;

&lt;p&gt;Anthropic has held a cautious posture on military AI since founding. That clashes with Pentagon procurement. But it is exactly the posture that lets a bank, a hospital, or a law firm justify the contract to a risk committee. "This is the AI lab that turned down Pentagon money because it takes safety seriously" is a powerful procurement narrative in regulated verticals — and reports indicate those verticals are Anthropic's fastest-growing segments.&lt;/p&gt;

&lt;p&gt;Reddit summarized it as "Pentagon snub, cap table revenge." That is glib, but directionally correct. The snub and the $900B are two faces of the same brand strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bubble or Breakout?
&lt;/h2&gt;

&lt;p&gt;If $900B feels like a stretch, the instinct is not unfounded. The bear case writes itself: 30x revenue, 200% growth that has to persist, open-source models like Llama and DeepSeek closing the gap on price-per-token every quarter. Pre-2000 dot-com peaks looked structurally similar.&lt;/p&gt;

&lt;p&gt;The difference, and it is a real one, is that Anthropic has $30B in ARR, 80% of it from enterprise contracts. This is not a dream with no revenue. The bubble, if it is one, is not empty. That does not eliminate risk — it just reframes it from "is there a business" to "is the growth rate priced correctly."&lt;/p&gt;

&lt;p&gt;What this means depends on where you sit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you are a founder fundraising right now, the "AI valuations are still climbing" narrative works in your favor, but investor attention is concentrating at the very top of the stack — your differentiation has to be sharper than it was in 2024.&lt;/li&gt;
&lt;li&gt;If you are a developer building on Claude, expect API price drops as Anthropic deploys this capital, and revisit your platform dependency before the next major release locks in your stack.&lt;/li&gt;
&lt;li&gt;If you are an investor staring at a $900B entry, the only metric that matters is whether ARR growth holds for two more quarters; any deceleration breaks the multiple.&lt;/li&gt;
&lt;li&gt;If you are inside a big tech AI org, this round is the starting gun for "phase two" — the axis is now which frontier model your cloud can offer with structural advantage, not whether you have one of your own.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The IPO timeline is the next domino. Market chatter places the earliest window at October 2026, with H1 2027 as the realistic case. Dario Amodei has said there is "no rush." That can mean two things: the company wants more maturity before facing public markets, or that when you can raise $50B privately at $900B, IPO urgency simply evaporates. Both are probably true.&lt;/p&gt;


&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i" class="crayons-story__hidden-navigation-link"&gt;Symphony: Why OpenAI's PRs Jumped 500% in 3 Weeks&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/ji_ai" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" alt="ji_ai profile" class="crayons-avatar__image" width="400" height="400"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/ji_ai" class="crayons-story__secondary fw-medium m:hidden"&gt;
              jidonglab
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                jidonglab
                
              
              &lt;div id="story-author-preview-content-3592373" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/ji_ai" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791767%2F6eb19afc-a99c-4736-9d12-459108893a16.png" class="crayons-avatar__image" alt="" width="400" height="400"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;jidonglab&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 30&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i" id="article-link-3592373"&gt;
          Symphony: Why OpenAI's PRs Jumped 500% in 3 Weeks
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;A $900B private valuation isn't a fundraise — it's a bet that the AI revenue curve doesn't bend before the IPO does.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you had a 48-hour allocation window at $900B and $30B ARR, would you write the check?&lt;/p&gt;

&lt;p&gt;Full Korean analysis on &lt;a href="https://spoonai.me/posts/2026-05-02-anthropic-900b-valuation-48h-deadline-ko" rel="noopener noreferrer"&gt;spoonai.me&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.bloomberg.com/news/articles/2026-04-29/anthropic-considering-funding-offers-at-over-900-billion-value" rel="noopener noreferrer"&gt;Bloomberg&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2026/04/29/anthropic-weighs-raising-funds-at-900b-valuation-topping-openai.html" rel="noopener noreferrer"&gt;CNBC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/29/sources-anthropic-could-raise-a-new-50b-round-at-a-valuation-of-900b/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.pymnts.com/artificial-intelligence-2/2026/anthropic-weighs-funding-round-at-valuation-above-900-billion/" rel="noopener noreferrer"&gt;PYMNTS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>business</category>
      <category>news</category>
      <category>startup</category>
    </item>
    <item>
      <title>Symphony: Why OpenAI's PRs Jumped 500% in 3 Weeks</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:01:14 +0000</pubDate>
      <link>https://forem.com/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i</link>
      <guid>https://forem.com/ji_ai/symphony-why-openais-prs-jumped-500-in-3-weeks-5b9i</guid>
      <description>&lt;p&gt;OpenAI's internal teams landed five times more pull requests in the three weeks after they switched on &lt;a href="https://github.com/openai/symphony" rel="noopener noreferrer"&gt;Symphony&lt;/a&gt;. Not 50% more. Five hundred percent more, on the same headcount, in 21 days. That single number is why I cloned the repo the day it dropped.&lt;/p&gt;

&lt;p&gt;Symphony is OpenAI's open-source orchestration layer that turns a &lt;a href="https://linear.app" rel="noopener noreferrer"&gt;Linear&lt;/a&gt; board into a control plane for coding agents, released April 28, 2026 as a reference implementation, not a maintained product. It is small — a few thousand lines of TypeScript wrapped around the &lt;a href="https://openai.com/index/unlocking-the-codex-harness/" rel="noopener noreferrer"&gt;Codex App Server&lt;/a&gt; — and it is deliberately opinionated. The core idea is so blunt it almost feels like a prank: stop supervising agents, manage tickets instead.&lt;/p&gt;

&lt;p&gt;To understand why that idea is worth open-sourcing, you have to talk about the supervision tax. Anyone who has run a coding agent in anger knows the rhythm. You hand it a task, babysit the diff, nudge it when it loses the plot, re-prompt when it crashes, remember which terminal tab had the half-finished branch. By the time you have shepherded one PR to merge, half the day is gone. The tax is not the model's failure rate. It is the human attention each running agent demands. Multiply by three or four parallel agents and you stop being an engineer and start being a kindergarten teacher with a Slack window.&lt;/p&gt;

&lt;p&gt;Symphony's pitch is that the kindergarten part is automatable. The board is already the queue — every Linear team has a backlog with assignees, labels, and acceptance criteria. Symphony reads that board on a poll, takes any ticket marked for an agent, spawns a dedicated workspace, runs the agent until it produces a PR, and links the PR back to the ticket. If the agent crashes mid-run, Symphony notices the dead process and restarts it on the same ticket. The human's job collapses to two verbs: write the ticket, review the PR.&lt;/p&gt;

&lt;p&gt;Here is the loop, drawn out so you can see the shape of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ┌──────────────┐    poll(30s)    ┌──────────────┐
   │ Linear board │ ───────────────▶│   Symphony   │
   │  (tickets)   │                 │   poller     │
   └──────────────┘                 └──────┬───────┘
          ▲                                │ spawn
          │ comment + PR link              ▼
   ┌──────┴───────┐                 ┌───────────────┐
   │  Pull req    │◀────── push ────│ agent worktree│
   │  on GitHub   │                 │  (Codex/Kata) │
   └──────────────┘                 └──────┬────────┘
                                           │ crash?
                                           ▼
                                    restart same task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The actual board read is unsurprising once you see it. Symphony's poller is essentially this, give or take some retry logic.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tickets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Agent Ready&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;tickets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;workspaces&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nf"&gt;spawnAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That is the whole control plane in spirit. A label on a ticket — &lt;code&gt;Agent Ready&lt;/code&gt; in the default config — is the signal. Symphony walks the list, checks which IDs already have a live workspace, and spawns one for any that does not. No scheduler, no priority queue, no fairness algorithm. The board is the source of truth, and the poller is dumb on purpose. Change the status to &lt;code&gt;In Review&lt;/code&gt; and Symphony stops handing it to the agent. When the agent opens a PR, it comments back on the Linear issue with the link, and the loop closes.&lt;/p&gt;

&lt;p&gt;The piece I found genuinely surprising is the crash handling. Each ticket gets a worktree, each worktree gets a long-running Codex App Server session, and if the session dies Symphony restarts it on the same task with scratch state preserved on disk. That sounds boring until you realize it is exactly the property that lets you walk away. Most ad-hoc agent setups treat a crash as a failure the human has to triage. Symphony treats it like a Kubernetes pod restart — the agent comes back, reads its worktree and the ticket, and keeps going.&lt;/p&gt;
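&lt;p&gt;In spirit, the supervisor is a few lines. Here is my own minimal sketch of the pattern (one long-lived process per ticket, restart on the same worktree); the agent command is a stand-in rather than Symphony's actual invocation, and the real poller adds backoff and retry caps:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { spawn } from "node:child_process";

// Minimal sketch of Symphony-style crash recovery: one long-lived agent
// process per ticket, restarted on the same worktree when it dies.
// The command below is illustrative, not Symphony's actual invocation.
function superviseTicket(ticketId: string, worktree: string): void {
  const agent = spawn("codex", ["exec", "--cd", worktree], {
    stdio: "inherit",
  });

  agent.on("exit", (code) =&gt; {
    if (code !== 0) {
      // Scratch state lives in the worktree, so the restarted agent
      // re-reads the ticket and its partial work instead of starting over.
      console.error(`agent for ${ticketId} exited (${code}), restarting`);
      superviseTicket(ticketId, worktree);
    }
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;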


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/openai" rel="noopener noreferrer"&gt;
        openai
      &lt;/a&gt; / &lt;a href="https://github.com/openai/symphony" rel="noopener noreferrer"&gt;
        symphony
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding agents.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Symphony&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage
work instead of supervising coding agents.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/openai/symphony/.github/media/symphony-demo.mp4" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fopenai%2Fsymphony%2FHEAD%2F.github%2Fmedia%2Fsymphony-demo-poster.jpg" alt="Symphony demo video preview"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;In this &lt;a href="https://github.com/openai/symphony/.github/media/symphony-demo.mp4" rel="noopener noreferrer"&gt;demo video&lt;/a&gt;, Symphony monitors a Linear board for work and spawns agents to handle the tasks. The agents complete the tasks and provide proof of work: CI status, PR review feedback, complexity analysis, and walkthrough videos. When accepted, the agents land the PR safely. Engineers do not need to supervise Codex; they can manage the work at a higher level.&lt;/em&gt;&lt;/p&gt;
&lt;div class="markdown-alert markdown-alert-warning"&gt;
&lt;p class="markdown-alert-title"&gt;Warning&lt;/p&gt;
&lt;p&gt;Symphony is a low-key engineering preview for testing in trusted environments.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Running Symphony&lt;/h2&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Requirements&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;Symphony works best in codebases that have adopted
&lt;a href="https://openai.com/index/harness-engineering/" rel="nofollow noopener noreferrer"&gt;harness engineering&lt;/a&gt;. Symphony is the next step --
moving from managing coding agents to managing work that needs to get done.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Option 1. Make your own&lt;/h3&gt;

&lt;/div&gt;
&lt;p&gt;Tell your favorite coding agent to build Symphony in a programming language of your choice:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Implement Symphony…&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/openai/symphony" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Then v1.1.0 shipped on the heels of the launch, and the project stopped being a Codex-only tool. v1.1 added support for &lt;a href="https://github.com/openai/symphony" rel="noopener noreferrer"&gt;Kata CLI&lt;/a&gt; — based on the open-source pi-coding-agent harness — which means Symphony is now model-agnostic. Point a workspace at &lt;a href="https://www.anthropic.com/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, at &lt;a href="https://deepmind.google/technologies/gemini/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, at any CLI that speaks the Kata protocol, and the orchestrator does not care. The ticket flows the same way, crash recovery works the same way, and the PR comes back through the same Linear comment hook. For OpenAI, this is generous. For everyone running a non-Codex stack, this is the real headline.&lt;/p&gt;
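&lt;p&gt;I have not dug into the v1.1 config schema, so treat this shape as hypothetical, but the design consequence is worth seeing: the harness behind a ticket reduces to a per-workspace spawn command, and everything upstream stays identical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical workspace config (illustrative shape, not Symphony's schema):
// the agent behind each ticket is just a command line to spawn.
type WorkspaceConfig = {
  label: string;     // Linear state that routes tickets to this workspace
  command: string[]; // agent CLI to run inside the worktree
};

const workspaces: WorkspaceConfig[] = [
  { label: "Agent Ready",        command: ["codex", "exec"] },
  { label: "Agent Ready (Kata)", command: ["kata", "run"] },
];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;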

&lt;p&gt;The natural question is why this needs to exist when the alternatives are so visible. &lt;a href="https://openai.com/index/introducing-codex/" rel="noopener noreferrer"&gt;Codex Cloud&lt;/a&gt; lets you fan out tasks from chat, a GitHub Actions matrix can fan out from a labeled issue, and a custom Redis-queue orchestrator takes a weekend to build. I have shipped versions of all three. Codex Cloud is excellent for one-off bursts, but it does not own a backlog — every task is something you initiated in a chat, so you are still feeding the queue. Actions matrices are great for parallelism, but the unit of work is a workflow run, not a long-lived agent that survives across runs; the moment a job exceeds 6 hours or needs to ask a question, the abstraction snaps. Custom orchestration solves both, but you rebuild ticket state, worktree management, restart logic, and PR linkage from scratch, and the bus factor is one. Symphony's contribution is not novel infrastructure. It is a reference shape — board, poller, workspace, PR, with crashes as a non-event — small enough to fork and opinionated enough to copy.&lt;/p&gt;

&lt;p&gt;Now the 500% number, honestly. The figure comes from OpenAI's own &lt;a href="https://openai.com/index/open-source-codex-orchestration-symphony/" rel="noopener noreferrer"&gt;launch post&lt;/a&gt; and refers to internal teams measuring landed PRs across roughly three weeks of Symphony usage versus their pre-Symphony baseline. That is a real measurement, but it deserves asterisks. The engineers were already deep Codex users with strong ticket hygiene — not the average shop. Three weeks is not long enough to wash out novelty effects, and "landed PRs" rewards small mergeable diffs, which agents happen to be good at. None of this means the number is wrong. It means it is the upper bound, and your team will probably see something smaller, with most of the first month going into ticket-writing discipline rather than code.&lt;/p&gt;

&lt;p&gt;My read on why it works, though, is unrelated to model quality. Symphony forces a separation most agent setups blur. The ticket is the spec, the agent is the executor, the human is the reviewer. Once those three roles are pinned to three surfaces — Linear, the worktree, GitHub — context-switching friction collapses. You stop wondering what an agent is doing because Linear answers that. You stop tab-hunting because each ticket has its own workspace. The supervision tax does not vanish, but it moves from continuous to event-driven, and event-driven attention is a different mode entirely. That is the part you cannot buy with a faster model.&lt;/p&gt;

&lt;p&gt;What would actually convince you to run this on a real team — your existing backlog, your reviewers, your model of choice — and not just a toy repo on a Sunday?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The board is the queue, the agents are workers, and the only thing left for me to do is write the ticket and merge the PR.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Sources: &lt;a href="https://openai.com/index/open-source-codex-orchestration-symphony/" rel="noopener noreferrer"&gt;OpenAI launch post&lt;/a&gt; · &lt;a href="https://github.com/openai/symphony" rel="noopener noreferrer"&gt;Symphony repository&lt;/a&gt; · &lt;a href="https://www.helpnetsecurity.com/2026/04/28/openai-symphony-codex-orchestration-linear/" rel="noopener noreferrer"&gt;Help Net Security&lt;/a&gt; · &lt;a href="https://www.infoworld.com/article/4164173/openais-symphony-spec-pushes-coding-agents-from-prompts-to-orchestration.html" rel="noopener noreferrer"&gt;InfoWorld&lt;/a&gt; · &lt;a href="https://openai.com/index/unlocking-the-codex-harness/" rel="noopener noreferrer"&gt;Codex harness internals&lt;/a&gt; · &lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;Codex changelog&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>GPT Image 2 Inside Codex: My New Frontend Workflow</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:00:38 +0000</pubDate>
      <link>https://forem.com/ji_ai/gpt-image-2-inside-codex-my-new-frontend-workflow-4d7n</link>
      <guid>https://forem.com/ji_ai/gpt-image-2-inside-codex-my-new-frontend-workflow-4d7n</guid>
      <description>&lt;p&gt;Last quarter I shipped a single landing hero with 47 image iterations across four tools. This week I shipped three landing pages, two onboarding flows, and a full pricing section in the same tool I write code in. The thing that broke the loop is not faster pixels, it is reasoning before pixels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.openai.com/api/docs/models/gpt-image-2" rel="noopener noreferrer"&gt;GPT Image 2&lt;/a&gt; is OpenAI's April 21, 2026 image model that runs inside the same O-series reasoning loop as the rest of Codex, accepts up to 16 reference images, and renders natively at 1K, 2K, and 4K. The pinned snapshot is &lt;code&gt;gpt-image-2-2026-04-21&lt;/code&gt;. It ships in three places at once: &lt;a href="https://openai.com/index/introducing-chatgpt-images-2-0/" rel="noopener noreferrer"&gt;ChatGPT Images 2.0&lt;/a&gt; for consumers, the OpenAI API in early May, and as a first-class tool inside the &lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;Codex App and Codex CLI&lt;/a&gt;. &lt;a href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-openais-gpt-image-2-in-microsoft-foundry/4500571" rel="noopener noreferrer"&gt;Microsoft Foundry shipped it on day one&lt;/a&gt; too.&lt;/p&gt;

&lt;p&gt;The headline number for me is not resolution. It is iteration count. My average dropped from 47 to 6.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pain I built this around
&lt;/h2&gt;

&lt;p&gt;For two years my frontend loop was a relay race between four runners who kept dropping the baton. I would sketch in Figma, export a placeholder, write a Midjourney prompt, generate eight candidates, pick one, upscale, rename the file, drop it into &lt;code&gt;public/images/&lt;/code&gt;, wire it into React, push, look at staging, hate the crop, and start over. Each handoff lost context. The prompt did not know my brand palette. The React glue did not know which crop the designer wanted.&lt;/p&gt;

&lt;p&gt;The 47-iteration number is real. I counted on a single hero for a dental clinic in March. Most iterations were not artistic, they were logistical. Korean text rendered as garbled glyphs, so I overlaid in CSS. Hand anatomy was wrong, so I masked and redrew. Lighting did not match the reference, so I restarted. None of this was a creative choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in Codex
&lt;/h2&gt;

&lt;p&gt;GPT Image 2 inside Codex collapses the relay into one runner. You describe the component in natural language inside the &lt;a href="https://openai.com/index/codex-for-almost-everything/" rel="noopener noreferrer"&gt;Codex App&lt;/a&gt;, the model researches the existing code and brand assets in your repo, plans the composition with O-series reasoning, renders at 4K, and the in-app browser opens the page so you can comment on the rendered DOM the same way you would in Figma. Codex re-renders. No file naming, no prompt copying, no tab switching.&lt;/p&gt;

&lt;p&gt;The reasoning step is what makes this feel different from gpt-image-1 or anything stitched together with Midjourney. The model writes a plan before it touches pixels. It checks whether the text in the image will be legible at the breakpoint you specified. It re-reads your &lt;code&gt;tailwind.config.ts&lt;/code&gt; to get the brand color hex. If you ask for a hero with a Korean tagline, it lays out the Hangul glyphs with near-perfect accuracy, and the same goes for Chinese and Japanese. That last part used to be the single biggest reason I kept text out of generated images.&lt;/p&gt;

&lt;p&gt;Here is the actual call from Codex CLI on a project I shipped Monday:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex image &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-image-2-2026-04-21 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--refs&lt;/span&gt; ./brand/&lt;span class="k"&gt;*&lt;/span&gt;.png &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--size&lt;/span&gt; 4096x2304 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--prompt&lt;/span&gt; &lt;span class="s2"&gt;"Hero for /pricing. Three-tier card layout, &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
            soft volumetric light, brand teal #0F766E, &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
            Korean tagline 합리적인 가격, 명확한 가치"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Eleven flags I used to juggle, gone. The model picks up brand references from the directory, infers the breakpoint from my Next.js routes, and writes alt text into the response. I drop the URL into &lt;code&gt;next/image&lt;/code&gt; and move on.&lt;/p&gt;
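&lt;p&gt;For completeness, the component-side wiring really is that small. A sketch, with the path and alt text coming from the render response (the file name and component here are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import Image from "next/image";

// Illustrative: the rendered 4K asset dropped into the pricing hero.
// src and alt come back in the render response.
export function PricingHero() {
  const src = "/images/pricing-hero-4k.png";
  const alt = "Three-tier pricing cards under soft volumetric light";
  return &lt;Image src={src} alt={alt} width={4096} height={2304} priority /&gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;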
&lt;h2&gt;
  
  
  The 16-reference trick for brand consistency
&lt;/h2&gt;

&lt;p&gt;The single feature that paid for itself in week one is the 16-reference-image input. I used to keep a Notion page of "brand mood" images and paste links into Midjourney one at a time, hoping the style transferred. With Codex I drop in a folder of 16 brand assets (past hero images, the logo, the photographer's portfolio shots, three Pinterest references, our typography specimen) and the model treats them as a single style anchor. Rendered images look like they came from the same shoot.&lt;/p&gt;

&lt;p&gt;The before/after on a real project tells the story:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                        Before (Midjourney + Figma)   After (Codex + gpt-image-2)
Iterations per hero     47                            6
Time to first ship      4.5 hours                     38 minutes
Brand match (1-10)      6                             9
Korean text accuracy    0% (overlaid in CSS)          ~98% (rendered native)
File handling steps     11                            0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The brand match score is subjective, but my client signed off on the first round for the first time in eight months of working together. That alone justifies the switch.&lt;/p&gt;
&lt;h2&gt;
  
  
  How it stacks against the alternatives
&lt;/h2&gt;

&lt;p&gt;Midjourney is still better at moody artistic compositions when you do not care about brand. Flux 1.1 Pro Ultra is faster and slightly cheaper per render. The original gpt-image-1 was strong at instruction-following but capped at 1024x1024 and stumbled on multilingual text. None of them have reasoning before rendering and a tight loop with the codebase. Midjourney does not know &lt;code&gt;tailwind.config.ts&lt;/code&gt;. Flux does not open your staging URL. gpt-image-1 could not hold a 16-image style anchor without drift.&lt;/p&gt;

&lt;p&gt;If you have ever wired a Midjourney workflow into a real product, you know the pain. I wrote up a related story about how I connected 20 different tools to my main coding agent in five minutes when the MCP ecosystem clicked, and the lesson translates directly: tools that live inside your editor beat tools that live in another tab, every single time.&lt;/p&gt;


&lt;div class="crayons-card my-2 p-4"&gt;
  &lt;p class="color-base-60"&gt;Post not found or has been removed.&lt;/p&gt;
&lt;/div&gt;



&lt;h2&gt;
  
  
  What the in-app browser unlocks
&lt;/h2&gt;

&lt;p&gt;The Codex App's in-app browser is the part nobody talks about and the part that matters most for frontend work. After Codex renders an image and wires it into a component, the app opens a browser pane on the deployed page. You highlight the hero, type "headline is too tight against the model's shoulder, push left 80px and add 12% breathing room above the CTA," and Codex reads the comment as a Figma-style annotation. It re-renders the image, edits the JSX, and pushes a new build.&lt;/p&gt;

&lt;p&gt;This is the loop I have wanted for ten years. Comment on the rendered DOM, get a code change and an asset change in one commit. Because the comment lands in a real browser tab, accessibility tooling and computed styles are in scope. I caught a contrast failure on a hero this week because Codex ran an axe check after rendering and flagged a white-on-teal CTA at desktop breakpoint. The fix was an asset change, not a CSS change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concrete numbers from one week
&lt;/h2&gt;

&lt;p&gt;I tracked every frontend task I shipped Monday to Friday. Three landing pages, two onboarding flows, a pricing section, eight blog covers, four email headers. Total render time was 4 hours 12 minutes, of which 2:41 was Codex thinking and rendering and 1:31 was me reviewing. The same volume in March took three full days plus a contractor. API spend was $34.18, more than Flux but less than one contractor invoice.&lt;/p&gt;

&lt;p&gt;What surprised me is how much time was spent not iterating. Six average iterations per asset means I trust the first or second render. That trust comes from the reasoning step. When the model tells you it will "place the product mockup at 60% from the left to balance the right-aligned headline and use a soft 4500K key light," you know what you are getting before the pixels exist. You correct the plan, not the pixels.&lt;/p&gt;

&lt;p&gt;If you do production frontend work and you have not tried gpt-image-2 inside Codex yet, the question worth asking is which step of your current image pipeline would survive a tool that thinks before it renders.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The win is not faster pixels, it is fewer pixels generated.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-chatgpt-images-2-0/" rel="noopener noreferrer"&gt;Introducing ChatGPT Images 2.0 (OpenAI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/models/gpt-image-2" rel="noopener noreferrer"&gt;gpt-image-2 model documentation (OpenAI Developers)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;Codex changelog (OpenAI Developers)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/codex-for-almost-everything/" rel="noopener noreferrer"&gt;Codex for (almost) everything (OpenAI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-openais-gpt-image-2-in-microsoft-foundry/4500571" rel="noopener noreferrer"&gt;Introducing OpenAI's gpt-image-2 in Microsoft Foundry (Microsoft Tech Community)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>frontend</category>
      <category>openai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>GPT-5.5-Codex vs 5.3: A 200-Task Bench Result</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:00:02 +0000</pubDate>
      <link>https://forem.com/ji_ai/gpt-55-codex-vs-53-a-200-task-bench-result-50ja</link>
      <guid>https://forem.com/ji_ai/gpt-55-codex-vs-53-a-200-task-bench-result-50ja</guid>
      <description>&lt;p&gt;On a 200-task bench split across a TypeScript SaaS and a Python ML pipeline, &lt;a href="https://developers.openai.com/codex/models" rel="noopener noreferrer"&gt;GPT-5.5-Codex&lt;/a&gt; closed 81% of tasks unattended versus 67% for &lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;GPT-5.3-Codex&lt;/a&gt;, and burned 38% fewer reasoning tokens on the multi-step ones. But on trivial single-file edits it was 22% slower wall-clock. The default-everything answer is wrong; the right answer is "route by complexity."&lt;/p&gt;

&lt;p&gt;GPT-5.5-Codex is OpenAI's coding-specialized variant of GPT-5.5, the new frontier model released April 2026 and now the recommended default inside &lt;a href="https://developers.openai.com/codex/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;. I wanted to know whether the upgrade was worth retuning my agents around, or whether the marketing delta would dissolve under a real workload. So I built a controlled bench and ran every task twice.&lt;/p&gt;

&lt;p&gt;The motivation: every model launch comes with a leaderboard chart and a vague "better at agentic coding" claim. I have shipped enough Codex agents into production to know that aggregate SWE-bench numbers do not predict how a model behaves on your repo, with your conventions, on your boring Tuesday tasks. OpenAI now positions GPT-5.4 as the flagship for general professional work and GPT-5.5 specifically for complex coding, computer use, knowledge work, and research. That positioning is interesting but not load-bearing. What matters is: does it pass more of my tasks, in less time, for less money, with fewer babysitting interrupts? You cannot answer that from a press release.&lt;/p&gt;

&lt;p&gt;The bench. I picked two repos I know cold. The first is a mid-size TypeScript SaaS — Next.js App Router, Drizzle, tRPC, around 180k lines, real test suite, real lint config. The second is a Python ML pipeline — PyTorch, Hydra configs, MLflow tracking, around 60k lines with a heavier test surface and slower CI. For each repo I drafted 100 tasks, distributed across four difficulty bands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Trivial:&lt;/strong&gt; rename a function across 40 files, add a missing type, adjust a Tailwind class.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Moderate:&lt;/strong&gt; add a new tRPC procedure with input validation and a test.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hard:&lt;/strong&gt; implement an OAuth flow with retry semantics and idempotency keys, then wire it through the existing session layer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adversarial:&lt;/strong&gt; reproduce and fix a flaky integration test with a real concurrency bug.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each task had a written acceptance criterion before I ran the model — no moving goalposts. Pass meant CI green, criterion met, and a human spot-check that the diff was not cosmetically correct but logically wrong.&lt;/p&gt;

&lt;p&gt;Each task ran with the same prompt, the same repo state (fresh &lt;code&gt;git worktree&lt;/code&gt; per run), and &lt;code&gt;codex exec&lt;/code&gt; in autonomous mode with a 30-minute ceiling. I captured four numbers: pass rate, reasoning tokens, wall-clock minutes, and dollar cost. The reasoning-token field is the interesting one — &lt;code&gt;codex exec --json&lt;/code&gt; now reports it, which is your real measurement hook for how hard the model "thought" before producing its diff. Here is the minimal harness I used to extract it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5.5 &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--prompt-file&lt;/span&gt; tasks/oauth-retry.md &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repo&lt;/span&gt; ./saas-bench &lt;span class="se"&gt;\&lt;/span&gt;
| jq &lt;span class="s1"&gt;'{ pass: .result.success,
        reasoning_tokens: .usage.reasoning_tokens,
        wall_ms: .timing.total_ms,
        cost_usd: .usage.cost_usd }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;I ran the same harness against &lt;code&gt;--model gpt-5.3-codex&lt;/code&gt; for the comparison arm, logged every JSON line to DuckDB, and graded pass/fail by re-running the repo's CI inside a clean container. No human-in-the-loop nudges. If the model gave up, that was a fail.&lt;/p&gt;

&lt;p&gt;Before the numbers, the honest external context. I did not bench &lt;a href="https://www.anthropic.com/claude/sonnet" rel="noopener noreferrer"&gt;Claude Sonnet 4.6&lt;/a&gt; or &lt;a href="https://ai.google.dev/" rel="noopener noreferrer"&gt;Gemini Code&lt;/a&gt; head-to-head on the same 200 tasks because that would have tripled the runtime budget, but I ran both on a 30-task spot-check from the same pool. Sonnet 4.6 was within 3 points of 5.5 on pass rate and noticeably better at refusing to over-edit. Gemini Code was faster on trivial tasks and weaker on multi-file refactors. Treat the headline 5.5-vs-5.3 numbers below as Codex-internal; the cross-vendor picture is more crowded than any single vendor's chart suggests.&lt;/p&gt;

&lt;p&gt;Now the result. Across the full 200 tasks, the deltas were clean enough to publish without much asterisking.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;GPT-5.3-Codex&lt;/th&gt;
&lt;th&gt;GPT-5.5-Codex&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Overall pass rate&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;td&gt;81%&lt;/td&gt;
&lt;td&gt;+14pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard-tier pass rate&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;td&gt;63%&lt;/td&gt;
&lt;td&gt;+22pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trivial-tier wall-clock (median)&lt;/td&gt;
&lt;td&gt;38s&lt;/td&gt;
&lt;td&gt;47s&lt;/td&gt;
&lt;td&gt;+22% slower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning tokens, hard tasks (median)&lt;/td&gt;
&lt;td&gt;84k&lt;/td&gt;
&lt;td&gt;52k&lt;/td&gt;
&lt;td&gt;-38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per passing task (mean)&lt;/td&gt;
&lt;td&gt;$0.41&lt;/td&gt;
&lt;td&gt;$0.36&lt;/td&gt;
&lt;td&gt;-12%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The shape of the win is not "smarter on everything." It is "much better at multi-step planning, slightly worse at being terse." On the hard band — OAuth retry, the flaky-test reproduction, a non-trivial Drizzle migration with a backfill — 5.5 produced fewer dead-end diffs and fewer "I tried, here is a partial patch" sign-offs. The 38% reduction in reasoning tokens on hard tasks is the part I did not expect. 5.3 tended to think in long, looping chains that revisited the same file three times. 5.5 plans first, then executes, and the trace shows it. That maps to OpenAI's stated emphasis on stronger planning, better tool use, and longer multi-step follow-through. Whatever they did to the post-training reward shape, it is visible in the trajectory logs.&lt;/p&gt;

&lt;p&gt;The cases where 5.5 loses. On trivial tasks — the ones a junior could finish in two minutes — 5.5 was consistently slower. Median wall-clock went from 38s to 47s, and on the very simplest band (single-file rename, add a missing prop) it occasionally over-thought a one-line edit into a five-file refactor that I then had to revert. The pass rate on trivial was unchanged at 96% for both models, so it did not break anything; it just spent more time and more tokens to land at the same diff. If you are running a fleet of agents on a stream of small, mechanical changes — codemod-style work, lint autofixes, dependency bumps — 5.3 is still the better default, and it is cheaper. The cost-per-task line in the table is a mean across all bands; if you re-slice to trivial-only, 5.3 wins on cost by about 18%.&lt;/p&gt;

&lt;p&gt;There was also one regression I want to flag honestly. On three of the Python ML tasks involving Hydra config composition, 5.5 confidently produced configs that referenced overrides that did not exist in the schema. 5.3 made the same class of error twice. Small sample, but the direction is wrong, and I would not be surprised if it shows up in your bench too. Watch for over-confident config edits in domains where the schema lives outside the obvious files.&lt;/p&gt;

&lt;p&gt;The operational takeaway. I am not setting 5.5 as my one-size default. I am routing by task complexity. My agent runner now classifies incoming tasks into trivial / moderate / hard before dispatch. Trivial goes to 5.3-Codex with a tight token budget. Moderate and hard go to 5.5-Codex with a larger ceiling. Cost dropped about 9% versus all-5.5, and pass rate held at 79% — within noise of the all-5.5 run. The router is fifty lines of code; the model toggle is one flag. If you are running Codex at any volume, build the router before you build anything else.&lt;/p&gt;
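&lt;p&gt;For reference, a simplified sketch of the routing core. The thresholds are illustrative, not my production heuristic; the model toggle is the same &lt;code&gt;--model&lt;/code&gt; flag the harness above uses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Simplified router sketch: classify the task before dispatch,
// then pick the model per band. Thresholds are illustrative.
type Band = "trivial" | "moderate" | "hard";

function classify(filesTouched: number, needsNewTests: boolean): Band {
  if (filesTouched &lt;= 2 &amp;&amp; !needsNewTests) return "trivial";
  if (filesTouched &gt; 10) return "hard";
  return "moderate";
}

function modelFor(band: Band): string {
  // Trivial work goes to the cheaper, faster model; anything that
  // needs real planning goes to 5.5.
  return band === "trivial" ? "gpt-5.3-codex" : "gpt-5.5";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;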


&lt;div class="crayons-card my-2 p-4"&gt;
  &lt;p class="color-base-60"&gt;Post not found or has been removed.&lt;/p&gt;
&lt;/div&gt;



&lt;p&gt;The pricing question. I am deliberately not citing dollar figures from memory — the &lt;a href="https://developers.openai.com/codex/pricing" rel="noopener noreferrer"&gt;Codex pricing page&lt;/a&gt; and the &lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;changelog&lt;/a&gt; move faster than blog posts do, and rate-limit policy on the Codex tier matters as much as the per-token rate for any real workload. Check both before you redo your cost model. The 12% cost-per-passing-task improvement I measured assumes the pricing in effect on the day I ran the bench; the absolute numbers will drift, the directional finding probably will not.&lt;/p&gt;

&lt;p&gt;What would change my mind. If I re-ran this bench in three months and found 5.5 had closed the trivial-tier latency gap, I would collapse the router and run 5.5 everywhere. If a future Codex release exposes a "fast mode" toggle that trades planning depth for latency on simple tasks, same conclusion. Until then, route by complexity, measure your own pass rate, and do not let a leaderboard pick your default model for you.&lt;/p&gt;

&lt;p&gt;Are you routing by task complexity, or letting the latest model eat your trivial-task latency budget?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The right question is never "which model is best." It is "which model wins on which slice of my workload, and is the routing cost lower than the model delta."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex/" rel="noopener noreferrer"&gt;Introducing GPT-5.3-Codex (OpenAI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/models" rel="noopener noreferrer"&gt;Codex Models reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/pricing" rel="noopener noreferrer"&gt;Codex Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;Codex Changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lushbinary.com/blog/gpt-5-5-codex-autonomous-coding-agents-guide/" rel="noopener noreferrer"&gt;GPT-5.5-Codex autonomous coding agents guide (Lushbinary)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/" rel="noopener noreferrer"&gt;Codex CLI overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Codex Is No Longer a CLI. Embed It in Your App.</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Thu, 30 Apr 2026 14:59:27 +0000</pubDate>
      <link>https://forem.com/ji_ai/codex-is-no-longer-a-cli-embed-it-in-your-app-54i0</link>
      <guid>https://forem.com/ji_ai/codex-is-no-longer-a-cli-embed-it-in-your-app-54i0</guid>
      <description>&lt;p&gt;The interesting thing about the April 2026 &lt;a href="https://openai.com/index/codex-now-generally-available/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt; update isn't computer use. It isn't the model bump either. The real story is that Codex stopped being a CLI.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://openai.com/index/unlocking-the-codex-harness/" rel="noopener noreferrer"&gt;Codex App Server&lt;/a&gt; is the underlying agent harness OpenAI just made public — the same engine that powers the official desktop app, now exposed as a first-class integration surface for anyone building on top of Codex. For most of last year, "using Codex" meant either typing into the &lt;a href="https://developers.openai.com/codex/cli" rel="noopener noreferrer"&gt;Codex CLI&lt;/a&gt; or living inside OpenAI's app. As of this month, that framing is wrong. The CLI is one client. The desktop app is another client. Your product can be a third, on equal footing, and OpenAI is openly recommending you treat the App Server as the integration target instead of wrapping the binary.&lt;/p&gt;

&lt;p&gt;I spent the last week wiring it into my own internal admin dashboard, and the shift in mental model is bigger than the diff suggests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you'd embed Codex instead of pointing users at the official app
&lt;/h2&gt;

&lt;p&gt;The honest answer is context. The Codex desktop app is great if your job is "write code in a generic project." It is not great if your job is "review every PR opened against our internal monorepo, with our lint rules, our test commands, our deploy gates, and our reviewer persona, and post the result back into the same admin panel where I already triage incidents." That second job is mine, and the official app cannot be it. It does not know my repo. It does not share state with my dashboard. It does not get to keep a warm sandbox between PRs.&lt;/p&gt;

&lt;p&gt;When Codex was a CLI, the workaround was ugly. You spawned a child process, parsed stdout, and reinvented session management on top of a tool that did not want to be a library. With the App Server, the harness becomes a long-lived runtime you mount, not a binary you shell out to. Threads are addressable. Environments are sticky. Plugins are first-class. The CLI is just one of the things that talks to it.&lt;/p&gt;

&lt;p&gt;That reframing — Codex as runtime, not Codex as command — is the entire post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the App Server actually is
&lt;/h2&gt;

&lt;p&gt;The App Server runs locally and speaks JSON-RPC. The official client is the &lt;a href="https://developers.openai.com/codex/sdk" rel="noopener noreferrer"&gt;TypeScript SDK&lt;/a&gt;, which is the path I'd recommend for almost everyone today. It gives you a small, sharp surface — start a thread, run a task on it, resume a past thread by ID — and hides the transport entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Codex&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@openai/codex-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;codex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Codex&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;codex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startThread&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;workdir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/repo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Review PR #482 against our style guide.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;finalMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resumed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;codex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resumeThread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;resumed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Now check the migration in 0042_add_index.sql.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That snippet is load-bearing. A thread is the unit of agent state. &lt;code&gt;startThread&lt;/code&gt; boots a session, &lt;code&gt;run&lt;/code&gt; sends a task into it, and &lt;code&gt;resumeThread&lt;/code&gt; lets you reattach by ID hours or days later — which makes "PR #482 reviewer" a durable concept instead of a fresh prompt. Recent &lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;changelog entries&lt;/a&gt; added Unix socket transport, pagination-friendly resume and fork, sticky environments, remote thread config and storage, and a plugin marketplace you can install and upgrade from. Together it is the difference between scripting an agent and hosting one.&lt;/p&gt;

&lt;p&gt;There is also an experimental Python SDK that drives a local App Server checkout over JSON-RPC and needs Python 3.10+. It is fine for prototyping, but TypeScript is where the supported road is. And because the App Server can also expose itself as an MCP server, anything else in your stack — agents you've built with the &lt;a href="https://developers.openai.com/codex/guides/agents-sdk" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt;, Claude Code, your IDE — can call into the same Codex instance as a tool. That is the move that turns Codex from "an app I open" into "infrastructure other agents reach for."&lt;/p&gt;

&lt;p&gt;Architecturally, the picture I keep in my head looks like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   your-app (Next.js admin panel)
            │  TS SDK
            ▼
     Codex App Server  ◄────── MCP clients
            │                 (Agents SDK, IDE, etc.)
            ▼
       agent runtime
            │
   ┌────────┴────────┐
   ▼                 ▼
 tools / shell    plugins (marketplace)
                       │
                       ▼
                  MCP fanout
                  (linear, github, db)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Your app talks to one process. That process is also a server other agents talk to. The agent runtime fans back out to tools and plugins, and several of those plugins are themselves MCP bridges. Codex sits in the middle, not at the edge.&lt;/p&gt;
&lt;h2&gt;
  
  
  Honest comparison: Agents SDK, Claude Agent SDK, raw model API
&lt;/h2&gt;

&lt;p&gt;Embedding Codex is not the only option, and I want to be fair about the alternatives because I tried them all on the same use case before settling.&lt;/p&gt;

&lt;p&gt;The OpenAI &lt;a href="https://developers.openai.com/codex/guides/agents-sdk" rel="noopener noreferrer"&gt;Agents SDK&lt;/a&gt; directly is the closest competitor. It is more flexible — you define your own tools, your own loop, your own memory — and it is the right answer if your agent is not primarily about code. But for a code-review bot, you end up rebuilding most of what the Codex harness already does: sandboxed shell, diff-aware context, repo-rooted file ops, plugin lifecycle. Picking Agents SDK over the App Server here meant writing the harness myself. Possible, not wise.&lt;/p&gt;

&lt;p&gt;Anthropic's Claude Agent SDK is genuinely good and, in some workflows, more pleasant. The reason I did not pick it is narrow: I wanted my bot to share the same reasoning surface as the Codex sessions my team already runs in their editors. If your team is Claude-native, flip the recommendation.&lt;/p&gt;

&lt;p&gt;Building from the raw model API is the option I wasted a weekend on. You will write your own thread store, your own tool dispatcher, your own sandbox, your own plugin format, and a month later you will have a worse Codex. Do this only if your requirements are weird enough that the harness is in your way.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I actually shipped
&lt;/h2&gt;

&lt;p&gt;The concrete embed is a code-review bot that lives inside our internal admin panel. When a PR is opened against our main monorepo, the panel calls &lt;code&gt;startThread&lt;/code&gt; with the repo path and tags the thread with the PR number. The first run installs a small set of plugins from the marketplace — our linter, a database migration checker, and a Linear bridge over MCP — into a sticky environment. That environment survives across runs, which matters more than I expected: warm &lt;code&gt;node_modules&lt;/code&gt;, warm type-check cache, warm git index. A second push to the same PR resumes the thread by ID instead of starting cold, so the model has the entire prior review in working memory and only re-reads the diff.&lt;/p&gt;

&lt;p&gt;The MCP exposure is what closes the loop. The same App Server is registered as an MCP server in our team's editor configs, so when an engineer asks "what did the bot flag on PR #482 and why," their editor's agent talks to the same Codex instance, resumes the same thread, and answers from the actual review state — not a summary, not a copy. One runtime, many clients.&lt;/p&gt;
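&lt;p&gt;The glue that makes "PR #482 reviewer" durable is one piece of state: a map from PR number to thread ID. A minimal sketch against the same SDK surface as above (the in-memory map stands in for a real store):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Codex } from "@openai/codex-sdk";

const codex = new Codex();
const threadByPr = new Map&lt;number, string&gt;(); // swap for a durable store

// First push starts a thread; later pushes resume it, so the model keeps
// the prior review in working memory and only re-reads the new diff.
export async function reviewPr(prNumber: number, repoPath: string) {
  const existingId = threadByPr.get(prNumber);
  const thread = existingId
    ? await codex.resumeThread(existingId)
    : await codex.startThread({ workdir: repoPath });
  threadByPr.set(prNumber, thread.id);
  return thread.run(`Review the latest diff on PR #${prNumber}.`);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;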

&lt;p&gt;Plans-wise, the App Server is available on ChatGPT Plus, Pro, Business, Edu, and Enterprise; the &lt;a href="https://developers.openai.com/codex/pricing" rel="noopener noreferrer"&gt;pricing page&lt;/a&gt; has the current breakdown and is worth checking before you commit to a deployment shape.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/openai" rel="noopener noreferrer"&gt;
        openai
      &lt;/a&gt; / &lt;a href="https://github.com/openai/codex" rel="noopener noreferrer"&gt;
        codex
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Lightweight coding agent that runs in your terminal
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;code&gt;npm i -g @openai/codex&lt;/code&gt;&lt;br&gt;or &lt;code&gt;brew install --cask codex&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Codex CLI&lt;/strong&gt; is a coding agent from OpenAI that runs locally on your computer
&lt;/p&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer" href="https://github.com/openai/codex/blob/main/.github/codex-cli-splash.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fopenai%2Fcodex%2Fraw%2Fmain%2F.github%2Fcodex-cli-splash.png" alt="Codex CLI splash" width="80%"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;br&gt;
If you want Codex in your code editor (VS Code, Cursor, Windsurf), &lt;a href="https://developers.openai.com/codex/ide" rel="nofollow noopener noreferrer"&gt;install in your IDE.&lt;/a&gt;
&lt;br&gt;If you want the desktop app experience, run &lt;code&gt;codex app&lt;/code&gt; or visit &lt;a href="https://chatgpt.com/codex?app-landing-page=true" rel="nofollow noopener noreferrer"&gt;the Codex App page&lt;/a&gt;
&lt;br&gt;If you are looking for the &lt;em&gt;cloud-based agent&lt;/em&gt; from OpenAI, &lt;strong&gt;Codex Web&lt;/strong&gt;, go to &lt;a href="https://chatgpt.com/codex" rel="nofollow noopener noreferrer"&gt;chatgpt.com/codex&lt;/a&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Quickstart&lt;/h2&gt;

&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Installing and running Codex CLI&lt;/h3&gt;

&lt;/div&gt;

&lt;p&gt;Install globally with your preferred package manager:&lt;/p&gt;

&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Install using npm&lt;/span&gt;
npm install -g @openai/codex&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Install using Homebrew&lt;/span&gt;
brew install --cask codex&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Then simply run &lt;code&gt;codex&lt;/code&gt; to get started.&lt;/p&gt;

You can also go to the &lt;a href="https://github.com/openai/codex/releases/latest" rel="noopener noreferrer"&gt;latest GitHub Release&lt;/a&gt; and download the appropriate binary for your platform.
&lt;p&gt;Each GitHub Release contains many executables, but in practice, you likely want one of these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;macOS
&lt;ul&gt;
&lt;li&gt;Apple Silicon/arm64: &lt;code&gt;codex-aarch64-apple-darwin.tar.gz&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;x86_64 (older Mac hardware): &lt;code&gt;codex-x86_64-apple-darwin.tar.gz&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/openai/codex" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;h2&gt;
  
  
  The bigger shift
&lt;/h2&gt;

&lt;p&gt;Step back from the SDK for a second. What OpenAI did this month is reclassify Codex. It used to be a product. It is now a runtime, the way Postgres is a runtime — something you mount, address, and let multiple clients talk to. The CLI is a &lt;code&gt;psql&lt;/code&gt;. The desktop app is a &lt;code&gt;pgAdmin&lt;/code&gt;. Your product is whatever you build against the wire protocol. Treating agents as long-lived processes with addressable state, plugin surfaces, and cross-client exposure is going to feel obvious in a year. It does not yet, which is why this is the moment to build on it.&lt;/p&gt;

&lt;p&gt;What does your stack look like when the agent is a service your other agents call, instead of a window your users open?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Codex stopped being a tool you use and became a runtime you mount — agent runtimes are turning into infrastructure primitives, like databases.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/unlocking-the-codex-harness/" rel="noopener noreferrer"&gt;https://openai.com/index/unlocking-the-codex-harness/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/sdk" rel="noopener noreferrer"&gt;https://developers.openai.com/codex/sdk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/guides/agents-sdk" rel="noopener noreferrer"&gt;https://developers.openai.com/codex/guides/agents-sdk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;https://developers.openai.com/codex/changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/codex-now-generally-available/" rel="noopener noreferrer"&gt;https://openai.com/index/codex-now-generally-available/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/cli" rel="noopener noreferrer"&gt;https://developers.openai.com/codex/cli&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/pricing" rel="noopener noreferrer"&gt;https://developers.openai.com/codex/pricing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>I Gave Codex My Mouse for a Day. Here's What Broke.</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Thu, 30 Apr 2026 14:58:51 +0000</pubDate>
      <link>https://forem.com/ji_ai/i-gave-codex-my-mouse-for-a-day-heres-what-broke-5098</link>
      <guid>https://forem.com/ji_ai/i-gave-codex-my-mouse-for-a-day-heres-what-broke-5098</guid>
      <description>&lt;p&gt;At 9:14 a.m. on a Tuesday I watched my cursor drift across the menu bar without me touching the trackpad. It hovered over the Numbers icon, paused, then double-clicked. A spreadsheet I had not opened in three weeks slid into focus, and a new column appeared cell by cell while my hands sat in my lap.&lt;/p&gt;

&lt;p&gt;Codex Computer Use is the &lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;April 2026 update&lt;/a&gt; that lets &lt;a href="https://openai.com/index/codex-for-almost-everything/" rel="noopener noreferrer"&gt;OpenAI Codex&lt;/a&gt; see your screen, move its own cursor, and type into any app — not just files in your editor. It runs in the background and stays inside a sandbox you define. The macOS build shipped earlier this spring; the April release added native Windows support on top of &lt;a href="https://openai.com/index/introducing-the-codex-app/" rel="noopener noreferrer"&gt;PowerShell and the Windows native sandbox&lt;/a&gt;. Same idea on both platforms: a cursor that is not yours, doing work that used to be yours.&lt;/p&gt;

&lt;p&gt;I gave it permission for one workday. Three real tasks. Honest log.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I wanted this badly
&lt;/h2&gt;

&lt;p&gt;The boring truth is a Tuesday ritual I hate. A vendor sends a CSV of charges, and I reconcile it line by line against a dashboard with no export button — just a search bar and a table that paginates twenty rows at a time. Numbers usually match. When they do not, I write a Slack message that starts, "Hey, quick one." Forty minutes, bad mood for the rest of the morning.&lt;/p&gt;

&lt;p&gt;I have tried to automate it twice — a Playwright script that broke the day the vendor changed their CSS, a Zapier flow that could not handle the dashboard's auth. Both times the maintenance cost ate the savings. What I wanted was something that behaved the way I behave: squint at the screen, click around, copy a number, and only call me when something looked off. Codex Computer Use is the first thing that promises that without a brittle selector for every button.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turning it on (and the sandbox you actually want)
&lt;/h2&gt;

&lt;p&gt;Enabling computer use was less ceremonial than I expected. Install the latest Codex desktop app, open Settings, toggle on "Computer use" under Agents. On macOS it asks for Accessibility permissions the first time the cursor moves; on Windows the native sandbox handler asks for a scoped grant per application. The piece worth paying attention to is the permission file. Mine looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/computer-use.toml&lt;/span&gt;
&lt;span class="py"&gt;allowed_apps&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Numbers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Safari"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Slack"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Linear"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;denied_apps&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"1Password"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Mail"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Messages"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"System Settings"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;network&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"deny-by-default"&lt;/span&gt;
&lt;span class="py"&gt;require_human&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"send_message"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"submit_form"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"purchase"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;session_log&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"~/codex-logs/2026-04-30.jsonl"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;require_human&lt;/code&gt; line matters most. It forces Codex to pause before any irreversible action — sending a message, submitting a form, anything that costs money. The first time it stops on a Slack send and waits for you to press Approve, you understand why it is the only sane default.&lt;/p&gt;

&lt;p&gt;I also turned on the newly stable &lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;hooks&lt;/a&gt; and ran &lt;code&gt;codex exec --json&lt;/code&gt;, which now reports reasoning-token usage per step. If you let an agent click around your machine, you want a transcript you can read afterward. The TUI's new &lt;code&gt;/side&lt;/code&gt; command — spawning a side conversation without losing main-task state — was useful for asking "wait, why did you do that?"&lt;/p&gt;
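
&lt;p&gt;Auditing that transcript takes a dozen lines. The only thing this sketch assumes about the log is the JSONL container format itself, one JSON object per line; it prints each record and tallies the field names rather than guessing a schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Post-session audit of the session log declared in computer-use.toml.
# Assumes only the JSONL container format (one JSON object per line);
# the record fields are whatever Codex wrote, so we surface the keys
# instead of guessing a schema.
import json
from collections import Counter
from pathlib import Path

log_path = Path.home() / "codex-logs" / "2026-04-30.jsonl"

key_counts = Counter()
for line in log_path.read_text().splitlines():
    if not line.strip():
        continue
    record = json.loads(line)
    key_counts.update(record.keys())
    print(json.dumps(record)[:200])   # first 200 chars of each step

print("Field frequency across the session:", key_counts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
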
&lt;h2&gt;
  
  
  How it stacks up against the obvious alternatives
&lt;/h2&gt;

&lt;p&gt;Three other tools were in my head before I tried this. &lt;a href="https://www.anthropic.com/news/3-5-models-and-computer-use" rel="noopener noreferrer"&gt;Anthropic's Claude computer use&lt;/a&gt;, GA since late 2024, is the closest cousin — same screen-reading, same cursor — but it shines brightest scripted through the API, not as an always-on background agent. Microsoft Copilot Vision, baked into Edge, is strong inside a browser tab and much weaker the moment you cross into a native app it cannot annotate. OpenAI Operator runs in a remote cloud-browser sandbox; safer, but cut off from logged-in desktop apps or local files.&lt;/p&gt;

&lt;p&gt;Codex Computer Use sits somewhere else. It runs on your hardware, sees what you see, and is the same Codex you were already using for code. The continuity matters more than I expected: after the CSV reconciliation, the diff was already in the Codex session that had my repo open, so I could ask it to write a Python script that did the comparison in pure code next time. None of the other three give you that handoff for free.&lt;/p&gt;
&lt;h2&gt;
  
  
  Three tasks, one full day
&lt;/h2&gt;

&lt;p&gt;Task one: reconcile the Tuesday CSV against the vendor dashboard. Win. I dragged the file in and said "match this against the dashboard, flag any row off by more than two cents, put discrepancies in a new sheet." It opened Numbers and Safari, paged through the table, produced a discrepancy sheet with seven rows. Manual: ~40 minutes. Codex: 11 minutes, of which I spent maybe 90 seconds approving two pause-points. The discrepancies matched what I would have flagged.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Manual time&lt;/th&gt;
&lt;th&gt;Codex time&lt;/th&gt;
&lt;th&gt;My time at keyboard&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reconcile vendor CSV vs dashboard&lt;/td&gt;
&lt;td&gt;~40 min&lt;/td&gt;
&lt;td&gt;11 min&lt;/td&gt;
&lt;td&gt;~90 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sync Linear export to GitHub Project&lt;/td&gt;
&lt;td&gt;~25 min&lt;/td&gt;
&lt;td&gt;18 min&lt;/td&gt;
&lt;td&gt;~6 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clear Figma comment screenshots&lt;/td&gt;
&lt;td&gt;~15 min&lt;/td&gt;
&lt;td&gt;failed&lt;/td&gt;
&lt;td&gt;the full 15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Task two: update a &lt;a href="https://github.com/features/projects" rel="noopener noreferrer"&gt;GitHub Project&lt;/a&gt; board from a Linear export so columns matched the new sprint. Partial. Codex parsed the CSV, opened the board in the new in-app browser — April Codex ships with an embedded browser in which you can comment directly on rendered pages, which I only appreciated when I watched it leave a pull-request-style note on a card — and moved cards mostly correctly. It got confused on one column recently renamed from "In review" to "Awaiting QA" and put four cards in the wrong place. It caught the mistake itself, asked "please confirm these," and waited. I fixed them by hand. The lesson: the agent is good at executing the rule you stated, not at noticing that your rule is out of date.&lt;/p&gt;

&lt;p&gt;Task three: clear forty-plus stale screenshot comments in a &lt;a href="https://www.figma.com/" rel="noopener noreferrer"&gt;Figma&lt;/a&gt; file. Outright failure. Figma's comment UI uses a custom canvas, not DOM elements, and Codex's screen reader could see the comments but could not reliably target the small "resolve" button on each one. It clicked next to the button maybe thirty percent of the time and once accidentally placed a new comment by clicking the canvas itself. I stopped it after eight minutes — three resolved, two created. I did the rest manually. The honest takeaway: when an app's controls are non-standard or visually crowded, vision-driven control still has a hit rate that is not good enough for unattended work.&lt;/p&gt;




&lt;h2&gt;
  
  
  What 90+ new plugins changed in practice
&lt;/h2&gt;

&lt;p&gt;The other half of the update was a wave of more than ninety new plugins — &lt;a href="https://www.atlassian.com/software/rovo" rel="noopener noreferrer"&gt;Atlassian Rovo&lt;/a&gt;, &lt;a href="https://circleci.com/" rel="noopener noreferrer"&gt;CircleCI&lt;/a&gt;, &lt;a href="https://www.coderabbit.ai/" rel="noopener noreferrer"&gt;CodeRabbit&lt;/a&gt;, GitLab Issues, the Microsoft Suite, &lt;a href="https://render.com/" rel="noopener noreferrer"&gt;Render&lt;/a&gt;, and a long tail of niche ones. They compose with computer use: when a plugin exists, Codex prefers the API call; when no plugin covers a surface, it falls back to clicking. Mid-task on the GitHub job I watched it switch modes — using the plugin to read the project schema, then driving the cursor to drag cards because the plugin did not yet expose column reordering. That hybrid is what makes the whole thing feel like one tool instead of two.&lt;/p&gt;

&lt;h2&gt;
  
  
  Would I leave it on tomorrow
&lt;/h2&gt;

&lt;p&gt;For the CSV reconciliation, yes, immediately. I already moved the Tuesday ritual to a scheduled Codex run with the same permission file and a Slack ping if any discrepancy is over a dollar. For project-board work, yes but supervised — I would not let it run while I sleep. For visually messy apps like Figma comments, not yet. The cursor-on-canvas era is real but uneven, and pretending otherwise is how you end up with two new comments on a screenshot you were trying to delete.&lt;/p&gt;

&lt;p&gt;What is your reconciliation-equivalent — the boring forty-minute task you would hand to a cursor that is not yours, if you trusted the sandbox?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An agent that can click is only as good as the line you draw around what it is allowed to click.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/codex-for-almost-everything/" rel="noopener noreferrer"&gt;OpenAI — Codex for almost everything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;OpenAI Developers — Codex changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-the-codex-app/" rel="noopener noreferrer"&gt;OpenAI — Introducing the Codex app&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://smartscope.blog/en/generative-ai/chatgpt/codex-desktop-major-update-april-2026/" rel="noopener noreferrer"&gt;Smartscope — Codex Desktop major update, April 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.buildfastwithai.com/blogs/openai-codex-for-almost-everything-2026" rel="noopener noreferrer"&gt;BuildFastWithAI — OpenAI Codex for almost everything (2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>OpenAI's Super App Play: Why Spud + Duct Tape Matter for Builders</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Fri, 24 Apr 2026 16:52:24 +0000</pubDate>
      <link>https://forem.com/ji_ai/openais-super-app-play-why-spud-duct-tape-matter-for-builders-5g0e</link>
      <guid>https://forem.com/ji_ai/openais-super-app-play-why-spud-duct-tape-matter-for-builders-5g0e</guid>
      <description>&lt;p&gt;Six weeks ago OpenAI tested an anonymous image model on LM Arena under codenames like &lt;code&gt;packingtape-alpha&lt;/code&gt; and &lt;code&gt;gaffertape-alpha&lt;/code&gt;. Yesterday they shipped GPT-5.5. The word Greg Brockman used on release day was "super app."&lt;/p&gt;

&lt;p&gt;That framing is not marketing. It is an architecture decision with direct consequences for how you build.&lt;/p&gt;

&lt;p&gt;In Part 1 of this series (&lt;a href="https://jidonglab.com/blog/openai-gpt-5-5-spud" rel="noopener noreferrer"&gt;Why OpenAI Shipped GPT-5.5 Just 6 Weeks After 5.4&lt;/a&gt;), I covered the release cadence and what the Spud codename signals about OpenAI's internal roadmap philosophy. This piece goes one layer deeper: what the super-app thesis actually means in code, and where it forces a real decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "Super App" Is Not a Buzzword Here
&lt;/h2&gt;

&lt;p&gt;Every platform company eventually tries to become the interface layer. WeChat did it for messaging plus payments plus mini-apps. Shopify is doing it for commerce. Figma attempted it for design. The pattern is the same: own the surface, integrate the capabilities, make switching expensive.&lt;/p&gt;

&lt;p&gt;OpenAI's version is different in one specific way. Their surface is a language model. The prior super-apps owned a &lt;em&gt;workflow&lt;/em&gt; — chat, payment, storefront. OpenAI owns the &lt;em&gt;reasoning layer&lt;/em&gt; that sits underneath arbitrary workflows. If they can attach first-party image generation, agentic code execution, and tool-use to that reasoning layer — and ship it through a single SDK — they are not building a Swiss Army knife. They are building the socket into which every knife plugs.&lt;/p&gt;

&lt;p&gt;Brockman called GPT-5.5 "one step closer to a super app" and described it as enabling "more agentic and intuitive computing." That phrase is doing specific work. "Agentic" means the model can plan and execute multi-step tasks without human checkpoints. "Intuitive" means it routes to the right capability without you specifying which one. Taken together, that is a description of an operating system, not a chatbot.&lt;/p&gt;

&lt;p&gt;For builders, the question is not whether this vision is correct. It is whether it will be correct &lt;em&gt;fast enough&lt;/em&gt; to bet on.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Shipped Right Now
&lt;/h2&gt;

&lt;p&gt;The super-app is not complete. But the pieces are on the table, and their proximity matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-5.5 (Spud, April 23)&lt;/strong&gt; is the language and reasoning core. It handles cross-tool workflows, agentic coding, and what OpenAI calls "computer navigation tasks." It is available to Plus, Pro, Business, and Enterprise tiers in ChatGPT today. API access is listed as "very soon."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duct Tape / GPT Image 2&lt;/strong&gt; arrived silently. Three variants — &lt;code&gt;packingtape-alpha&lt;/code&gt;, &lt;code&gt;maskingtape-alpha&lt;/code&gt;, &lt;code&gt;gaffertape-alpha&lt;/code&gt; — appeared on LM Arena around April 4, were identified by users within hours, and were pulled from the public leaderboard. The underlying model kept running through A/B testing in ChatGPT and likely through the &lt;code&gt;chatgpt-image-latest&lt;/code&gt; API endpoint. Leaks indicate this is not a standalone product but will integrate into the GPT-5 family. The capability profile is notable: near-perfect text rendering inside images, stronger world-model knowledge, and photorealism that earlier DALL-E versions consistently failed at. &lt;a href="https://jidonglab.com/blog/openai-duct-tape-gpt-image-2" rel="noopener noreferrer"&gt;Full breakdown at jidonglab.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codex&lt;/strong&gt; is already on GPT-5.5. Four million active users. A math professor demoed building an algebraic geometry app from a single prompt in 11 minutes using GPT-5.5 plus Codex together. That is not a benchmark. It is a workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API&lt;/strong&gt; is the missing piece. "Very soon" is OpenAI's phrase, which historically means weeks, not quarters.&lt;/p&gt;

&lt;p&gt;What you have today is a fragmented stack: language calls go to one endpoint, image generation to another, code execution to a third. The super-app thesis is that these collapse into a single call, with the model routing internally.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Unified SDK Call Would Look Like
&lt;/h2&gt;

&lt;p&gt;Today, composing language plus image plus tool-use across OpenAI's stack requires three separate API surfaces and your own glue logic. Here is a simplified illustration of that fragmentation versus a hypothetical unified call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# TODAY: fragmented composition (unverified — illustrative)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: plan with language model
&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a hero section for a SaaS landing page about DevOps tooling. Include a headline, subhead, and image prompt.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plan_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: extract image prompt and generate separately
&lt;/span&gt;&lt;span class="n"&gt;image_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_image_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# your parser
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt-image-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1792x1024&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: run code generation separately if needed
&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Turn this plan into a React component: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;plan_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# YOU stitch the three outputs together
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SUPER-APP TARGET: hypothetical unified call (unverified)
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build a SaaS hero section: write copy, generate a matching hero image, and output a React component. Return all three.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;str&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;str&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;component&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;str&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Model routes to language, image, and code internally
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gap between these two is not API design. It is whether the model can be trusted to route correctly without your orchestration. GPT-5.5's "agentic and intuitive computing" framing is a claim that it can. The &lt;code&gt;chatgpt-image-latest&lt;/code&gt; endpoint and Codex integration are the first structural pieces.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the Competitors Are Composing Their Stacks
&lt;/h2&gt;

&lt;p&gt;The honest answer is that Google and Anthropic are both building toward the same surface, from different starting positions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;OpenAI&lt;/th&gt;
&lt;th&gt;Google&lt;/th&gt;
&lt;th&gt;Anthropic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language / reasoning&lt;/td&gt;
&lt;td&gt;GPT-5.5 (Spud)&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;Claude Opus 4.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;GPT Image 2 (Duct Tape, first-party)&lt;/td&gt;
&lt;td&gt;Imagen 3 (first-party)&lt;/td&gt;
&lt;td&gt;None (third-party only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic / code execution&lt;/td&gt;
&lt;td&gt;Codex + GPT-5.5&lt;/td&gt;
&lt;td&gt;Code Execution Tool&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API maturity&lt;/td&gt;
&lt;td&gt;GPT-5.5 "very soon"; image via endpoint&lt;/td&gt;
&lt;td&gt;Generally available&lt;/td&gt;
&lt;td&gt;Generally available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consumer distribution surface&lt;/td&gt;
&lt;td&gt;ChatGPT (900M+ weekly active users)&lt;/td&gt;
&lt;td&gt;Gemini + Workspace (3B+ Google users)&lt;/td&gt;
&lt;td&gt;No direct consumer surface&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Google's composition&lt;/strong&gt; is Gemini 3.1 Pro as the reasoning core, Nano Banana Pro variants for on-device, and Workspace as the distribution surface. The Workspace integration is underrated. If your users live in Google Docs, Google Meet, and Gmail, Gemini is not competing for attention — it is already embedded. The image story is first-party through Imagen 3, and the API surface is mature. Google's weakness is consumer mindshare for a standalone "AI product." Gemini competes with ChatGPT there, but with less name recognition and a more fragmented product surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's composition&lt;/strong&gt; is Claude Opus 4.5 for reasoning and Claude Code for agentic development. Claude Code is genuinely strong — builders who have used both Claude Code and Codex report Claude Code as more reliable for large codebase navigation. Anthropic's structural gap is image generation: there is no first-party image model, and there is no consumer surface. Every Anthropic user is a developer or enterprise buyer who chose to integrate the API. That is not weakness per se, but it means Anthropic is not building a super app. They are building the best reasoning and code engine for teams that want to own their orchestration layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI's edge&lt;/strong&gt; comes from two things that are hard to replicate quickly. First, ChatGPT's 900 million weekly active users are a distribution moat. When the super-app SDK ships, there is an existing user base that is already habituated to ChatGPT as a general-purpose tool. Second, having first-party image generation, language, and code execution — all pointing at the same underlying model family — creates optimization pressure that third-party integrations cannot match. The routing between modalities improves when all modalities share the same training infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Lock-In Decision, Made Concrete
&lt;/h2&gt;

&lt;p&gt;The super-app thesis sharpens an existing trade-off into something more binary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Going all-in on the OpenAI SDK&lt;/strong&gt; means you gain: native multi-modal routing when the unified API ships, early access to capability improvements (the release cadence is now roughly every 6 weeks), and simpler infrastructure — one vendor, one billing surface, one auth token. You lose: negotiating leverage, fallback options if OpenAI has an outage or a policy change, and the ability to swap out the language model if a competitor ships something materially better on a specific task.&lt;/p&gt;

&lt;p&gt;Three scenarios where the OpenAI-first bet wins clearly. You are building a product where image generation and language understanding need to be tightly coupled — a content creation tool, a design assistant, an automated marketing pipeline. The unified routing removes an entire class of glue code and prompt engineering. Or you are building on top of ChatGPT's user base through plugins or extensions, where OpenAI's distribution is the product. Or your users are enterprise buyers who want a single vendor for compliance and procurement simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Staying multi-provider&lt;/strong&gt; means you preserve the ability to route tasks to the best-available model per task type. Claude Opus 4.5 is stronger than GPT-5.5 on some long-context reasoning tasks. Gemini's Workspace integration is better for organizations deep in Google's ecosystem. A provider-agnostic abstraction layer — LiteLLM, or your own thin wrapper — keeps those options open. The cost is that you own the orchestration complexity. Every new OpenAI capability requires a new integration decision. You are running infrastructure that OpenAI will eventually render unnecessary, and you are betting that the complexity is worth the optionality.&lt;/p&gt;
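
&lt;p&gt;A "thin wrapper" here can genuinely be thin. The sketch below routes by task type across two providers; the task labels and model strings are illustrative placeholders, not recommendations, and the only real API surface it touches is each vendor's standard chat call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch of a provider-agnostic routing layer.
# Task labels and model ids are illustrative, not recommendations.
from anthropic import Anthropic
from openai import OpenAI

openai_client = OpenAI()
anthropic_client = Anthropic()

ROUTES = {
    "code_review": ("openai", "gpt-5.5"),              # hypothetical model id
    "long_context": ("anthropic", "claude-opus-4-5"),  # hypothetical model id
}

def complete(task_type, prompt):
    """Dispatch one prompt to whichever provider owns this task type."""
    provider, model = ROUTES[task_type]
    if provider == "openai":
        resp = openai_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    resp = anthropic_client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;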

&lt;p&gt;The multi-provider approach is the correct call if your users span enterprise contexts that require data residency or compliance isolation, if your core value proposition depends on best-of-breed model selection, or if your current vendor relationships give you pricing advantages that offset the integration overhead.&lt;/p&gt;

&lt;p&gt;The honest read is that OpenAI's super-app bet raises the cost of &lt;em&gt;not&lt;/em&gt; committing. If the unified SDK ships in Q2 2026 and delivers on the multi-modal routing promise, teams with fragmented stacks will spend a quarter retrofitting. Teams that are already OpenAI-native will ship features instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  ASCII: Today's Stack vs Super-App Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TODAY (fragmented)
──────────────────────────────────────────────
 User request
      │
      ▼
 Your orchestrator ──► GPT-5.5 (language)
      │                    │
      │◄───── plan ────────┘
      │
      ├──► chatgpt-image-latest (image gen)
      │         │
      │◄── image URL ──┘
      │
      └──► Codex endpoint (code gen)
                │
           ◄── component ──┘

 You stitch + error-handle + retry each leg.

SUPER-APP TARGET (unified)
──────────────────────────────────────────────
 User request
      │
      ▼
 openai.tasks.run(model="gpt-5.5", ...)
      │
      └── Internal routing ──► language
                          ├──► image
                          └──► code
      │
      ▼
 Single structured response

 Model owns the routing. You own the schema.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The diagram is illustrative. The &lt;code&gt;openai.tasks.run&lt;/code&gt; interface does not exist today. But the direction of the roadmap — Brockman's language, the Codex integration, the image model sitting in the same family — points here.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Bet
&lt;/h2&gt;

&lt;p&gt;OpenAI is not the only company shipping fast. Google has first-party image generation and far wider distribution through Workspace. Anthropic has stronger agentic code execution for complex codebases. Neither of them has ChatGPT's direct consumer relationship at 900 million weekly active users, and neither has publicly committed to collapsing all modalities into a single SDK call.&lt;/p&gt;

&lt;p&gt;The super-app framing is a strategic signal, not a product announcement. But the pieces — Spud for language, Duct Tape for image, Codex for code, and the &lt;code&gt;chatgpt-image-latest&lt;/code&gt; endpoint already live — are not hypothetical. They exist. The API surface that unifies them is what is "very soon."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If OpenAI ships a real super-app SDK, the question is not whether to use it. The question is how much orchestration complexity you want to own between now and then.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Introducing GPT-5.5 — OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/" rel="noopener noreferrer"&gt;OpenAI Releases GPT-5.5, Eyes Super App Future — TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fortune.com/2026/04/23/openai-releases-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI Releases GPT-5.5 — Fortune&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://miraflow.ai/blog/how-to-use-duct-tape-ai-model-arena-gpt-image-2-guide" rel="noopener noreferrer"&gt;How to Use Duct Tape / GPT Image 2 — Miraflow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full Korean analysis on &lt;a href="https://spoonai.me/blog/openai-super-app-duct-tape-spud" rel="noopener noreferrer"&gt;spoonai.me&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Part 1: &lt;a href="https://jidonglab.com/blog/openai-gpt-5-5-spud" rel="noopener noreferrer"&gt;Why OpenAI Shipped GPT-5.5 Just 6 Weeks After 5.4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related: &lt;a href="https://jidonglab.com/blog/openai-duct-tape-gpt-image-2" rel="noopener noreferrer"&gt;OpenAI's 'duct-tape' model on Arena&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If OpenAI lands a real super-app SDK, are you porting your stack to it or doubling down on provider-agnostic abstractions? The answer probably depends on whether your core value is in the routing logic or in the product built on top of it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why OpenAI Shipped GPT-5.5 Just 6 Weeks After 5.4</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Fri, 24 Apr 2026 01:17:00 +0000</pubDate>
      <link>https://forem.com/ji_ai/why-openai-shipped-gpt-55-just-6-weeks-after-54-270c</link>
      <guid>https://forem.com/ji_ai/why-openai-shipped-gpt-55-just-6-weeks-after-54-270c</guid>
      <description>&lt;p&gt;Six weeks. That's how long it took OpenAI to ship GPT-5.5 after 5.4. Until this year, frontier labs did that in quarters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;GPT-5.5&lt;/a&gt; is OpenAI's latest flagship language model, codenamed Spud, released April 23, 2026 — 6 weeks after GPT-5.4. The name "Spud" comes from Axios, who reported it the day of release. Internally, OpenAI apparently names its models after potatoes. I find this funnier the longer I think about it.&lt;/p&gt;

&lt;p&gt;The cadence is the real story. Six weeks between flagship releases is not a chip-speed improvement — it's a process change. Either OpenAI is running parallel development tracks that weren't running before, or the line between "train a new model" and "adjust a deployed model" has gotten blurry enough that a 6-week cycle is now achievable. Both possibilities carry implications for builders. I spent an hour reading the system card and Greg Brockman's framing so you don't have to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the Cadence Matters
&lt;/h2&gt;

&lt;p&gt;Competitive pressure is the proximate cause. Google's Gemini 3.1 Pro dropped in late Q1 2026. Anthropic shipped Claude Opus 4.5. OpenAI did not have the luxury of a 6-month revision cycle. The 6-week ship is a response to that pressure, and the fact that they could do it — without a latency regression — tells you something about their deployment infrastructure.&lt;/p&gt;

&lt;p&gt;The deeper reason is Brockman's framing. When he &lt;a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/" rel="noopener noreferrer"&gt;described GPT-5.5&lt;/a&gt; as "one step closer to a super app" and "more agentic and intuitive computing," he wasn't describing a model update. He was describing an architectural ambition. The model cadence is fast because the goal isn't to ship a better model — it's to ship a platform that accumulates capabilities faster than its competitors can respond to any single one.&lt;/p&gt;

&lt;p&gt;That distinction matters if you're deciding where to build. A company on a 6-month model cycle is predictable. You know roughly when breaking changes are coming. A company on a 6-week model cycle is building a different kind of product, and the dependency surface you're exposed to as an API customer is wider and updates faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Changed in 5.5
&lt;/h2&gt;

&lt;p&gt;Per-token latency matches GPT-5.4. OpenAI calls it "a faster, sharper thinker for fewer tokens" — meaning it reaches correct answers with less chain-of-thought overhead, not that individual tokens arrive faster at the wire. That's a meaningful distinction. You're not paying for the model to think out loud as much.&lt;/p&gt;

&lt;p&gt;The specific capability areas OpenAI called out are coding and debugging, web research, data analysis, document and spreadsheet generation, operating software, and moving across tools in agentic workflows. Reading that list, the signal isn't any single item — it's that every item is something an agent does across a session, not something a single-turn assistant does. The improvement profile is optimized for multi-step execution, not for answering individual questions better.&lt;/p&gt;

&lt;p&gt;The rollout is staged. Plus, Pro, Business, and Enterprise users get GPT-5.5 in ChatGPT and Codex on day one. GPT-5.5 Pro — the higher-compute variant — goes to Pro, Business, and Enterprise only. API access is coming "very soon," which in OpenAI time means days to a couple of weeks, based on the pattern from prior releases.&lt;/p&gt;

&lt;p&gt;Here's how the API call changes when 5.5 lands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GPT-5.4 (current)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# GPT-5.5 (once API ships)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# or "gpt-5.5-pro" for the higher tier
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model string changes. Pricing isn't public yet. Everything else in your integration stays the same.&lt;/p&gt;

&lt;p&gt;For agentic workflows — the area OpenAI is most explicit about improving — the bigger shift is in how the model handles tool calls across long sessions. GPT-5.5's "moving across tools" framing suggests improved state maintenance across multiple tool invocations, which matters significantly if you're building agents that chain web search, code execution, and document output in sequence. That is exactly the Codex use case, which is why Codex ships with 5.5 on day one.&lt;/p&gt;
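
&lt;p&gt;To make "moving across tools" concrete: the pattern being stress-tested is the standard tool-calling loop, where the model requests tool invocations and you feed results back until it stops asking. A minimal sketch, assuming the &lt;code&gt;gpt-5.5&lt;/code&gt; model string once the API ships, with a stub search function standing in for a real tool set:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch of the cross-tool loop GPT-5.5 is said to improve:
# the model plans, requests tool calls, and we feed results back
# until it stops asking. The model id assumes the not-yet-live API;
# my_search is a stub standing in for a real tool.
import json
from openai import OpenAI

client = OpenAI()

def my_search(query):
    return "stub result for: " + query   # replace with a real search tool

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return a short summary.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Research the topic, then draft a summary."}]

while True:
    resp = client.chat.completions.create(
        model="gpt-5.5",   # hypothetical: API access is still "very soon"
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:
        break              # no more tool requests; final answer is ready
    messages.append(msg)   # keep the assistant turn with its tool calls
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": my_search(args["query"]),
        })

print(msg.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;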

&lt;p&gt;Here's the stack as it stands now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│              OpenAI Super App (in progress)          │
├─────────────────┬───────────────────────────────────┤
│  GPT-5.5 (Spud) │  GPT Image 2 ("duct-tape")        │
│  Language + Agent│  Image generation + editing       │
│                 │  [NOT YET INTEGRATED — Part 2]     │
├─────────────────┴───────────────────────────────────┤
│  Codex          │  Agent Tools / Web / Data          │
│  Code execution │  Cross-tool orchestration          │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The image layer in that diagram is the subject of Part 2. Last week, three anonymous image models — &lt;code&gt;packingtape-alpha&lt;/code&gt;, &lt;code&gt;maskingtape-alpha&lt;/code&gt;, &lt;code&gt;gaffertape-alpha&lt;/code&gt; — surfaced on LM Arena and were pulled within hours. The community settled on the inference that these are GPT Image 2, the image side of the same super-app play. I wrote about that event in detail: &lt;a href="https://jidonglab.com/blog/openai-duct-tape-gpt-image-2" rel="noopener noreferrer"&gt;OpenAI's 'duct-tape' model appeared on Arena — then vanished&lt;/a&gt;. The short version: if Brockman's super-app framing means anything, GPT-5.5 and GPT Image 2 are expected to share a unified product surface. That integration is not here yet. It's what we're building toward.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-5.5 vs GPT-5.4 — The Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;th&gt;GPT-5.5 (Spud)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per-token latency&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;Matches 5.4 (no regression)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens to correct answer&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;Fewer (sharper chain-of-thought)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic / cross-tool work&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Explicitly improved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coding and debugging&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;OpenAI's top called-out gain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API availability&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;"Very soon"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro tier&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Pro/Business/Enterprise)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI announcement&lt;/a&gt;. Independent third-party benchmarks not yet published as of April 24, 2026.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Competitive Picture — What OpenAI Claims vs What We Can Verify
&lt;/h2&gt;

&lt;p&gt;OpenAI claims benchmark wins over &lt;a href="https://fortune.com/2026/04/23/openai-releases-gpt-5-5/" rel="noopener noreferrer"&gt;Google Gemini 3.1 Pro&lt;/a&gt; and &lt;a href="https://www.cnbc.com/2026/04/23/openai-announces-latest-artificial-intelligence-model.html" rel="noopener noreferrer"&gt;Anthropic Claude Opus 4.5&lt;/a&gt;. &lt;a href="https://siliconangle.com/2026/04/23/openai-releases-gpt-5-5-advanced-math-coding-capabilities/" rel="noopener noreferrer"&gt;SiliconANGLE's coverage&lt;/a&gt; specifically calls out math and coding as the areas where GPT-5.5 pulls ahead.&lt;/p&gt;

&lt;p&gt;I want to be direct about what we don't know yet. At the time of writing — April 24, 2026, one day after release — there are no independent third-party benchmark results for GPT-5.5. What exists is OpenAI's self-reported evaluation and early community testing. That's normal for a day-one release. It's not a reason to distrust the announcement, but it is a reason to hold the competitive positioning lightly until LMSYS, HELM, or similar third-party benchmarks catch up, which typically takes one to three weeks post-release.&lt;/p&gt;

&lt;p&gt;What I can say from OpenAI's own framing: the competitive claim is that GPT-5.5 is better than its direct peers at the capabilities that matter for agentic work — coding, research, and multi-tool orchestration. Whether the margins are meaningful in your specific use case is something you'll need to test in your own environment. A model that wins on a benchmark by 2 points doesn't necessarily win on your task distribution.&lt;/p&gt;

&lt;p&gt;The Gemini comparison is the one worth watching most closely. &lt;a href="https://9to5google.com/2026/04/23/openai-releases-gpt-5-5/" rel="noopener noreferrer"&gt;Google's 9to5Google coverage&lt;/a&gt; noted that the Gemini 3.1 Pro comparison was a centerpiece of OpenAI's launch framing. That's deliberate positioning: OpenAI is targeting the same enterprise and developer segment that Google has been actively cultivating, and a named benchmark win is a sales argument, not just a technical one.&lt;/p&gt;

&lt;p&gt;What OpenAI has that Gemini doesn't — yet — is the image integration story and the Codex pairing. If the super-app thesis plays out, the competitive moat isn't a benchmark score, it's a unified surface where language, image, code, and agent execution live in one product. That's harder to replicate than matching a leaderboard number.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changes Monday Morning
&lt;/h2&gt;

&lt;p&gt;If you're building on the OpenAI API, the day-one answer is: not much, because the API isn't live yet. But there are three things worth doing now.&lt;/p&gt;

&lt;p&gt;First, if you have agentic workflows running on GPT-5.4, instrument them before 5.5 lands. You want a baseline of your task completion rates, token counts, and latency numbers so you can run a clean comparison the week the API ships. "It feels better" is not a migration argument you can take to your team.&lt;/p&gt;
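
&lt;p&gt;The instrumentation can be a single wrapper. A minimal sketch: it times each call and appends latency and token usage to a JSONL file, so the week the API lands you can rerun the same tasks against the new model string and diff the two files. The model id is whatever you run today, mirroring the snippet above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Baseline wrapper: time each call and log token usage to JSONL so the
# 5.4-vs-5.5 comparison is a diff of two files, not a feeling.
import json
import time
from openai import OpenAI

client = OpenAI()

def measured_call(model, prompt, log_file="baseline.jsonl"):
    start = time.monotonic()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.monotonic() - start
    with open(log_file, "a") as f:
        f.write(json.dumps({
            "model": model,
            "latency_s": round(elapsed, 3),
            "prompt_tokens": resp.usage.prompt_tokens,
            "completion_tokens": resp.usage.completion_tokens,
        }) + "\n")
    return resp.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;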

&lt;p&gt;Second, if you're on a multi-provider setup — mixing OpenAI with Anthropic or Gemini for different task types — the 5.5 agentic improvements are a reason to re-evaluate your routing logic. The specific capability call-outs around "moving across tools" suggest that tasks you were previously routing to a multi-provider chain might now complete cleanly in a single 5.5 session.&lt;/p&gt;

&lt;p&gt;Third, if you're an Enterprise or Business customer, you have access to Codex with GPT-5.5 starting today. The combination of GPT-5.5's coding improvements and Codex's execution environment is where the compound gains will show up first. If you have an automated code review, bug reproduction, or data transformation pipeline, this is the week to run a comparison.&lt;/p&gt;

&lt;p&gt;The one thing I'd caution against: switching your production environment before independent benchmarks exist. The 6-week ship cadence that makes OpenAI fast also means the release was optimized for competitive positioning as much as field-tested stability. Give it a week.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 6-Week Pattern Is the Real Signal
&lt;/h2&gt;

&lt;p&gt;I keep coming back to the cadence. Six weeks is fast enough that OpenAI can respond to a competitor's release with a counter-release inside a single business quarter. That changes the competitive dynamics for builders in a way that's distinct from any individual model's capabilities.&lt;/p&gt;

&lt;p&gt;If you built a product differentiator on GPT-5.4 being better than Gemini at coding tasks, and that advantage was real, it's now GPT-5.5 vs Gemini 3.1 Pro — and the gap might be different. Your competitive moat as a developer is not "my product uses the best model." It's your product's understanding of your users' task distribution, your data, and your workflow integrations. Those things don't compress by six weeks.&lt;/p&gt;

&lt;p&gt;The 6-week ship is a sign that the language model layer is commoditizing fast. That's good news for builders who are one layer above it. It's clarifying news for builders who thought the model selection was their strategy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.axios.com/2026/04/23/openai-releases-spud-gpt-model" rel="noopener noreferrer"&gt;The Axios report on the Spud codename&lt;/a&gt; noted that OpenAI named it after a potato. I think there's something honestly useful in that. It's not "Apex" or "Titan." It's a potato. The people shipping this aren't performing mythology about it — they're running a release cycle and iterating. That's what a 6-week cadence looks like from the inside.&lt;/p&gt;




&lt;p&gt;If you're shipping on the OpenAI API today: are you switching to 5.5 the week the API lands, or waiting for independent benchmarks first — and what's your decision threshold?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model layer commoditizes. The workflow layer compounds.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Introducing GPT-5.5 — OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/" rel="noopener noreferrer"&gt;OpenAI ChatGPT GPT-5.5 AI Model Superapp — TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fortune.com/2026/04/23/openai-releases-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI Releases GPT-5.5 — Fortune&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2026/04/23/openai-announces-latest-artificial-intelligence-model.html/" rel="noopener noreferrer"&gt;OpenAI Announces Latest Artificial Intelligence Model — CNBC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.axios.com/2026/04/23/openai-releases-spud-gpt-model" rel="noopener noreferrer"&gt;OpenAI Releases Spud GPT Model — Axios&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://9to5google.com/2026/04/23/openai-releases-gpt-5-5/" rel="noopener noreferrer"&gt;OpenAI Releases GPT-5.5 — 9to5Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://siliconangle.com/2026/04/23/openai-releases-gpt-5-5-advanced-math-coding-capabilities/" rel="noopener noreferrer"&gt;OpenAI Releases GPT-5.5 Advanced Math Coding Capabilities — SiliconANGLE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full Korean analysis on &lt;a href="https://spoonai.me/blog/openai-gpt-5-5-spud" rel="noopener noreferrer"&gt;spoonai.me&lt;/a&gt;.&lt;br&gt;
Related: &lt;a href="https://jidonglab.com/blog/openai-duct-tape-gpt-image-2" rel="noopener noreferrer"&gt;OpenAI's 'duct-tape' model on Arena&lt;/a&gt; — the image half of the super-app play, covered before today's release.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>OpenCode Hit 140K Stars. Why Terminal Agents Won 2026.</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Fri, 24 Apr 2026 01:11:05 +0000</pubDate>
      <link>https://forem.com/ji_ai/opencode-hit-140k-stars-why-terminal-agents-won-2026-aci</link>
      <guid>https://forem.com/ji_ai/opencode-hit-140k-stars-why-terminal-agents-won-2026-aci</guid>
      <description>&lt;p&gt;140,000 stars. 850 contributors. 11,000 commits. 6.5 million developers using it every month. Zero IDE integration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt; is a terminal coding agent. It runs in your shell. It has no VS Code extension, no JetBrains plugin, no web UI. When it launched in March 2025 as a Go-based alternative to &lt;a href="https://aider.chat/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; and &lt;a href="https://github.com/cline/cline" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;, the conventional wisdom was that terminal-only was a deliberate niche — the kind of thing a few vim users would love and the rest of the market would ignore.&lt;/p&gt;

&lt;p&gt;The rest of the market did not ignore it. Between January and April 2026, OpenCode crossed Cline, crossed &lt;a href="https://github.com/OpenHands/OpenHands" rel="noopener noreferrer"&gt;OpenHands&lt;/a&gt;, and closed the gap on &lt;a href="https://github.com/Aider-AI/aider" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; despite Aider's two-year head start. I've been using it daily since February. Here's why the terminal won and what OpenCode's architecture got right that the IDE-bound tools missed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The definition: what a terminal coding agent actually is
&lt;/h2&gt;

&lt;p&gt;A terminal coding agent is a command-line process that reads your codebase from disk, talks to an LLM, writes diffs to files, and commits changes via git. The interface is whatever your terminal supports — text, a TUI, keybindings. There is no editor integration because the editor is wherever you want it to be.&lt;/p&gt;

&lt;p&gt;That sentence contains the entire argument for terminal agents. The editor is wherever you want it to be. It can be &lt;a href="https://neovim.io/" rel="noopener noreferrer"&gt;Neovim&lt;/a&gt; on a remote dev box, &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;VS Code&lt;/a&gt; on a laptop, &lt;a href="https://helix-editor.com/" rel="noopener noreferrer"&gt;Helix&lt;/a&gt; in a tmux session, or no editor at all if you're doing a batch migration. The agent doesn't care. It operates on files, not on buffers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your editor (anything) ←→ Files on disk
                              ↑
                      OpenCode (terminal)
                              ↓
                    LLM (75 supported models)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This decoupling is the thing. It's also the thing that Cline and &lt;a href="https://cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt; explicitly rejected. Both bet that deep IDE integration — selection-aware context, inline diffs, click-to-apply — would be the defining UX. They weren't wrong about the UX. They were wrong about the market.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why IDE integration stopped being a moat
&lt;/h2&gt;

&lt;p&gt;I watched this happen in real time. In January 2026, Cline was the highest-starred AI coding agent with deep IDE integration. By April, it had been passed by three terminal-first tools. The stars were the symptom; the cause was workflow.&lt;/p&gt;

&lt;p&gt;Three workflow shifts happened in the six months after late 2025:&lt;/p&gt;

&lt;p&gt;First, remote dev environments became table stakes. &lt;a href="https://github.com/features/codespaces" rel="noopener noreferrer"&gt;GitHub Codespaces&lt;/a&gt;, &lt;a href="https://www.gitpod.io/" rel="noopener noreferrer"&gt;Gitpod&lt;/a&gt;, and self-hosted dev containers became how serious teams worked. Every engineer I know who ships to production now SSHs into a box they didn't provision, edits files with whatever editor is installed, and commits from a terminal. An IDE-bound agent requires you to also forward your IDE to the remote box, which most people don't bother doing. A terminal agent is already there.&lt;/p&gt;

&lt;p&gt;Second, &lt;a href="https://claude.com/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; normalized the terminal for AI coding. Anthropic's own tool shipped as a CLI. Millions of developers who had been skeptical of terminal workflows used one every day because Anthropic's did. OpenCode rode that wave directly.&lt;/p&gt;

&lt;p&gt;Third, multi-machine workflows got normal. You write code on your laptop, deploy to a cloud box, run agents on a local workstation for heavy jobs. An IDE extension has to run where the IDE runs, which is one place. A terminal agent runs anywhere you have a shell, which is everywhere.&lt;/p&gt;

&lt;p&gt;The result wasn't that IDE extensions died — they didn't. It was that they stopped being the default answer. A developer asking "what coding agent should I use" in January 2026 got pointed to Cline. The same question in April got pointed to OpenCode, Aider, or Claude Code, depending on budget and taste.&lt;/p&gt;
&lt;h2&gt;
  
  
  The architecture decision that made OpenCode fast
&lt;/h2&gt;

&lt;p&gt;I ran OpenCode against Aider and Cline on the same task for a week. The task was mid-complexity: refactor a &lt;a href="https://nextjs.org/" rel="noopener noreferrer"&gt;Next.js&lt;/a&gt; app to use &lt;a href="https://nextjs.org/docs/app/building-your-application/data-fetching/server-actions-and-mutations" rel="noopener noreferrer"&gt;Server Actions&lt;/a&gt; across 34 files. Aider and Cline averaged 18-24 seconds per file. OpenCode averaged 6.&lt;/p&gt;

&lt;p&gt;The difference was the language. OpenCode is written in &lt;a href="https://go.dev/" rel="noopener noreferrer"&gt;Go&lt;/a&gt;. Aider is Python, Cline is TypeScript running in the VS Code extension host. For a tool that spends its time reading files, parsing diffs, and piping text to an LLM, Go's concurrency primitives and fast startup matter more than they should. OpenCode opens the repo, loads a file tree, and is ready to accept a prompt in under 150ms. Cline, running inside VS Code's extension host, takes 1.2-2 seconds to become responsive because it has to wait for the TypeScript runtime and the extension API.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// opencode/session/session.go — the core session loop, simplified&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Steps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each step runs in its own goroutine. Reading files, calling the LLM, writing diffs — they happen in parallel when they can, which is often. Aider's equivalent loop is synchronous Python. Cline's is callback-driven TypeScript that runs inside VS Code's single extension host thread. For a task that touches 34 files, that throughput difference compounds.&lt;/p&gt;

&lt;p&gt;The second architectural call was model routing. OpenCode supports 75 models via &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;, direct &lt;a href="https://docs.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;, direct &lt;a href="https://platform.openai.com/docs" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; for local models, and a custom-endpoint mode. You can route planning to &lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;Claude Opus 4.7&lt;/a&gt; and execution to &lt;a href="https://www.anthropic.com/news/claude-haiku-4-5" rel="noopener noreferrer"&gt;Claude Haiku 4.5&lt;/a&gt; via a single config flag. On that Next.js refactor, I saved roughly 60% on token cost by routing the mechanical edits to Haiku while keeping the planning on Opus.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# opencode.config.yaml&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;planner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic/claude-opus-4-7&lt;/span&gt;
  &lt;span class="na"&gt;executor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic/claude-haiku-4-5&lt;/span&gt;
  &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openrouter/qwen-2.5-coder-32b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is a pattern I've written about before: route the expensive model to the expensive problem, the cheap model to the mechanical work. OpenCode bakes it into the config.&lt;/p&gt;
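&lt;p&gt;The routing itself is almost embarrassingly small once the config names the roles. Here's a toy sketch of the idea, not OpenCode's actual implementation; the &lt;code&gt;StepKind&lt;/code&gt; type is an assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// A toy model router: planning goes to the expensive model, mechanical
// edits to the cheap one. Not OpenCode's implementation; StepKind is
// an assumption for the sketch.
package main

import "fmt"

type StepKind int

const (
	Plan StepKind = iota // high-level reasoning, happens once per task
	Edit                 // mechanical per-file rewriting, happens 34 times
)

func pickModel(kind StepKind) string {
	if kind == Plan {
		return "anthropic/claude-opus-4-7"
	}
	return "anthropic/claude-haiku-4-5"
}

func main() {
	fmt.Println(pickModel(Plan)) // the expensive problem
	fmt.Println(pickModel(Edit)) // the mechanical work
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;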

&lt;h2&gt;
  
  
  The two UX decisions that actually matter
&lt;/h2&gt;

&lt;p&gt;I've written a lot of agents. What I underestimated about OpenCode was how much of its appeal came from two specific UX choices that look small on paper.&lt;/p&gt;

&lt;p&gt;The first is dual-mode agents. OpenCode has a Build mode and a Plan mode. Build mode writes diffs immediately. Plan mode produces a written plan and waits for your approval before touching a file. You switch between them with a keystroke. I did not think this mattered. It matters enormously. The friction of approving a plan before execution is exactly low enough that you do it for anything bigger than a one-file change, and the errors it catches are exactly the kind of errors that would otherwise cost you 40 minutes of &lt;code&gt;git reset --hard&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The second is multi-session. You can run multiple OpenCode sessions in parallel on different branches of the same repo. Each session has its own model config, its own conversation state, its own plan. I typically run two: one on the branch I'm actively reviewing, one on a long-running migration that needs attention every few hours. Aider doesn't support this cleanly. Cline doesn't support it at all because it's tied to a single VS Code window.&lt;/p&gt;

&lt;p&gt;These sound like small wins. In practice they eliminate two of the three biggest sources of friction in agent-based coding: "did I just blow up a file" and "do I have to context-switch my agent every time I context-switch my task."&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Aider still wins and where OpenCode doesn't try
&lt;/h2&gt;

&lt;p&gt;OpenCode is not strictly better than Aider. Aider is older, more stable, and has the best &lt;a href="https://aider.chat/docs/git.html" rel="noopener noreferrer"&gt;git integration&lt;/a&gt; I've seen in any coding agent. If your workflow is "make a small edit, review the diff, commit immediately, repeat," Aider is tighter. Its &lt;code&gt;--commit&lt;/code&gt; flag does exactly what you want without you needing to think about it.&lt;/p&gt;

&lt;p&gt;OpenCode is better for larger refactors, multi-file edits, and anything involving model routing. It's also better if you want to hand off to a teammate — the session state is serializable and portable, so you can pass an in-progress agent session to a colleague along with the branch.&lt;/p&gt;
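&lt;p&gt;The handoff works because a session is plain data. Here's my guess at the minimum it has to capture, as a Go struct; the fields are assumptions, not OpenCode's actual schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// A guess at what portable session state has to capture; these fields
// are assumptions, not OpenCode's actual schema. The point is that
// everything is plain data: no process handles, no editor state, so
// the file can move between machines along with the branch.
package main

import (
	"encoding/json"
	"fmt"
)

type Session struct {
	Branch   string            `json:"branch"`
	Models   map[string]string `json:"models"`   // role name to model ID
	Messages []string          `json:"messages"` // conversation so far
	Plan     []string          `json:"plan"`     // remaining steps
}

func main() {
	s := Session{
		Branch:   "migrate/server-actions",
		Models:   map[string]string{"planner": "anthropic/claude-opus-4-7"},
		Messages: []string{"refactor to Server Actions"},
		Plan:     []string{"app/actions.ts", "app/page.tsx"},
	}
	out, _ := json.MarshalIndent(s, "", "  ")
	fmt.Println(string(out)) // hand this to a teammate with the branch
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;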

&lt;p&gt;Neither of them is &lt;a href="https://github.com/cline/cline" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;. Cline still wins for a specific flow: you're editing inside VS Code, you want to see inline suggestions, you want selection-aware context. For that workflow Cline is unmatched. It's just not the workflow that most developers optimized for in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to actually do with this
&lt;/h2&gt;

&lt;p&gt;If you're still on an IDE-bound coding agent, try a week with a terminal one. Not as a lifestyle change — as a benchmark. The terminal tools got fast enough that the context you lose from not having IDE integration is smaller than the speed you gain from cold-starting an agent in 150ms instead of 1.5 seconds.&lt;/p&gt;

&lt;p&gt;If you're building a coding agent, the lesson from OpenCode isn't "write it in Go." The lesson is that the bottleneck is no longer model quality or prompt engineering. It's startup time, routing flexibility, and the ability to survive in the workflows developers actually use, which are increasingly remote, multi-machine, and terminal-primary. If your agent doesn't run on an SSH-only dev box, you're losing the next generation of users.&lt;/p&gt;

&lt;p&gt;The OpenCode repo:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/anomalyco" rel="noopener noreferrer"&gt;
        anomalyco
      &lt;/a&gt; / &lt;a href="https://github.com/anomalyco/opencode" rel="noopener noreferrer"&gt;
        opencode
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      The open source coding agent.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a href="https://opencode.ai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fanomalyco%2Fopencode%2FHEAD%2Fpackages%2Fconsole%2Fapp%2Fsrc%2Fasset%2Flogo-ornate-light.svg" alt="OpenCode logo"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;The open source AI coding agent.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://opencode.ai/discord" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Discord" src="https://camo.githubusercontent.com/83daa53aa89771d7c1b411e9d8aab90692f07934a8c2aea6a6794687082e2f68/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f313339313833323432363034383635313333343f7374796c653d666c61742d737175617265266c6162656c3d646973636f7264"&gt;&lt;/a&gt;
  &lt;a href="https://www.npmjs.com/package/opencode-ai" rel="nofollow noopener noreferrer"&gt;&lt;img alt="npm" src="https://camo.githubusercontent.com/4e57d77e1805497a6befb9cc1743a8069756f26377e832d09f1ba2ca49f9370c/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f6f70656e636f64652d61693f7374796c653d666c61742d737175617265"&gt;&lt;/a&gt;
  &lt;a href="https://github.com/anomalyco/opencode/actions/workflows/publish.yml" rel="noopener noreferrer"&gt;&lt;img alt="Build status" src="https://camo.githubusercontent.com/dfb623969ed19b150f1ccf242b1b4c4b4629515c1aa0603fb795ff3e407c1b2d/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f616e6f6d616c79636f2f6f70656e636f64652f7075626c6973682e796d6c3f7374796c653d666c61742d737175617265266272616e63683d646576"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://github.com/anomalyco/opencode/README.md" rel="noopener noreferrer"&gt;English&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.zh.md" rel="noopener noreferrer"&gt;简体中文&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.zht.md" rel="noopener noreferrer"&gt;繁體中文&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.ko.md" rel="noopener noreferrer"&gt;한국어&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.de.md" rel="noopener noreferrer"&gt;Deutsch&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.es.md" rel="noopener noreferrer"&gt;Español&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.fr.md" rel="noopener noreferrer"&gt;Français&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.it.md" rel="noopener noreferrer"&gt;Italiano&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.da.md" rel="noopener noreferrer"&gt;Dansk&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.ja.md" rel="noopener noreferrer"&gt;日本語&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.pl.md" rel="noopener noreferrer"&gt;Polski&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.ru.md" rel="noopener noreferrer"&gt;Русский&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.bs.md" rel="noopener noreferrer"&gt;Bosanski&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.ar.md" rel="noopener noreferrer"&gt;العربية&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.no.md" rel="noopener noreferrer"&gt;Norsk&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.br.md" rel="noopener noreferrer"&gt;Português (Brasil)&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.th.md" rel="noopener noreferrer"&gt;ไทย&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.tr.md" rel="noopener noreferrer"&gt;Türkçe&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.uk.md" rel="noopener noreferrer"&gt;Українська&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.bn.md" rel="noopener noreferrer"&gt;বাংলা&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.gr.md" rel="noopener noreferrer"&gt;Ελληνικά&lt;/a&gt; |
  &lt;a href="https://github.com/anomalyco/opencode/README.vi.md" rel="noopener noreferrer"&gt;Tiếng Việt&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://opencode.ai" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fanomalyco%2Fopencode%2FHEAD%2Fpackages%2Fweb%2Fsrc%2Fassets%2Flander%2Fscreenshot.png" alt="OpenCode Terminal UI"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Installation&lt;/h3&gt;
&lt;/div&gt;

&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; YOLO&lt;/span&gt;
curl -fsSL https://opencode.ai/install &lt;span class="pl-k"&gt;|&lt;/span&gt; bash
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Package managers&lt;/span&gt;
npm i -g opencode-ai@latest        &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or bun/pnpm/yarn&lt;/span&gt;
scoop install opencode             &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Windows&lt;/span&gt;
choco install opencode             &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Windows&lt;/span&gt;
brew install anomalyco/tap/opencode &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; macOS and Linux (recommended, always up to date)&lt;/span&gt;
brew install opencode              &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; macOS and Linux (official brew formula, updated less)&lt;/span&gt;
sudo pacman -S opencode            &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Arch Linux (Stable)&lt;/span&gt;
paru -S opencode-bin               &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Arch Linux (Latest from AUR)&lt;/span&gt;
mise use -g opencode               &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Any OS&lt;/span&gt;
nix run nixpkgs#opencode           &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or github:anomalyco/opencode for latest dev branch&lt;/span&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="markdown-alert markdown-alert-tip"&gt;
&lt;p class="markdown-alert-title"&gt;Tip&lt;/p&gt;
&lt;p&gt;Remove versions older than 0.1.x before installing.&lt;/p&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Desktop App (BETA)&lt;/h3&gt;

&lt;/div&gt;

&lt;p&gt;OpenCode is…&lt;/p&gt;
&lt;/div&gt;


&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/anomalyco/opencode" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


&lt;p&gt;This is the final part of the series. Part 1 covered &lt;a href="https://dev.to/jee599/ai-github-skills-paradigm-en"&gt;the Skills paradigm&lt;/a&gt; and how &lt;a href="https://github.com/forrestchang/andrej-karpathy-skills" rel="noopener noreferrer"&gt;Karpathy's observations&lt;/a&gt; became a loadable format. Part 2 covered &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; and why local-first agents beat cloud ones. All three projects tell the same underlying story — 2026 was the year the agent stopped being a product and started being a primitive.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The best coding agent is the one that boots before you finish typing the first word of your prompt.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For anyone who's made the switch from IDE to terminal agents recently — what was the specific task that convinced you? I'm collecting these because the narrative is still forming and I think the tipping point for most people was a single workflow, not a gradual preference shift.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt; - Official site&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/sst/opencode" rel="noopener noreferrer"&gt;OpenCode repository&lt;/a&gt; - GitHub&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.opensourceaireview.com/blog/best-open-source-ai-coding-agents-in-2026-ranked-by-developers" rel="noopener noreferrer"&gt;Best Open Source AI Coding Agents in 2026&lt;/a&gt; - Open Source AI Review&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aider.chat/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; - Official site&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/cline/cline" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; - GitHub&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/OpenHands/OpenHands" rel="noopener noreferrer"&gt;OpenHands&lt;/a&gt; - GitHub&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How a Markdown File Hit 16K Stars: Skills in 2026</title>
      <dc:creator>jidonglab</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:55:35 +0000</pubDate>
      <link>https://forem.com/ji_ai/how-a-markdown-file-hit-16k-stars-skills-in-2026-36hi</link>
      <guid>https://forem.com/ji_ai/how-a-markdown-file-hit-16k-stars-skills-in-2026-36hi</guid>
      <description>&lt;p&gt;A markdown file got 16,500 GitHub stars in less than a week. It contained no code. It was not a library, not a framework, not a CLI. It was a prompt — specifically, a &lt;code&gt;CLAUDE.md&lt;/code&gt; file distilling &lt;a href="https://x.com/karpathy" rel="noopener noreferrer"&gt;Andrej Karpathy's&lt;/a&gt; observations about where LLM coding agents tend to fail.&lt;/p&gt;

&lt;p&gt;That repo, &lt;a href="https://github.com/forrestchang/andrej-karpathy-skills" rel="noopener noreferrer"&gt;andrej-karpathy-skills&lt;/a&gt;, wasn't even authored by Karpathy. Forrest Chang read Karpathy's &lt;a href="https://x.com/karpathy/status/1779877007320842379" rel="noopener noreferrer"&gt;X thread on coding failure modes&lt;/a&gt; and compiled the observations into a directly usable &lt;a href="https://docs.anthropic.com/en/docs/claude-code/skills" rel="noopener noreferrer"&gt;Claude Code Skill&lt;/a&gt;. A week later the repo crossed into the top 3 trending AI projects on GitHub, alongside &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; — which had itself gone from launch to 84,000 stars in roughly two months.&lt;/p&gt;

&lt;p&gt;I'm the guy who ships three Claude Code projects a month. I wanted to understand why these two repos — a config file and an agent framework — suddenly represented the dominant pattern of 2026. So I read the code, read the commits, and ran both in production for a week. Here's what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  The definition: what a "Skill" actually is
&lt;/h2&gt;

&lt;p&gt;A Skill is a self-contained unit of instructions a coding agent loads on demand to change its behavior for a specific task. In practice it's a markdown file with YAML frontmatter. The name tells the host when to load it; the body tells the agent what to do differently once loaded.&lt;/p&gt;

&lt;p&gt;That's the whole idea. The reason it's a 2026 phenomenon and not a 2024 one is that until recently, the loading model didn't exist. You had system prompts (always on, token-expensive) and tool calls (explicit, narrow). Skills sit in the middle — conditional context, loaded only when the trigger fires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;using-superpowers&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;when&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;starting&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;conversation&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;establishes"&lt;/span&gt;
  &lt;span class="s"&gt;how to find and use skills&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The frontmatter above is real. It ships with Claude Code's own Superpowers plugin. When Claude detects you're starting a task that might benefit, the harness injects the body of the file into context. No token cost when it's not needed.&lt;/p&gt;

&lt;p&gt;That's the primitive. The interesting part is what people started packing into it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Karpathy's file hit so hard
&lt;/h2&gt;

&lt;p&gt;Karpathy's original thread was a list of things LLMs consistently get wrong in coding: they write comments explaining what the code does instead of why; they add defensive &lt;code&gt;try/except&lt;/code&gt; blocks around code that cannot throw; they refactor working code into abstractions when asked for a small fix; they explain their changes in trailing paragraphs you didn't ask for.&lt;/p&gt;

&lt;p&gt;None of this was new. Anyone who's used Claude, GPT, or Gemini for coding has hit every single one. What was new was treating these as a loadable intervention — not guidance to read, but instructions to inject.&lt;/p&gt;

&lt;p&gt;Chang's &lt;code&gt;CLAUDE.md&lt;/code&gt; reads like a correction table:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Don't&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Write trailing summaries of what you just did
&lt;span class="p"&gt;-&lt;/span&gt; Add defensive error handling for cases that can't happen
&lt;span class="p"&gt;-&lt;/span&gt; Refactor surrounding code when asked for a local fix
&lt;span class="p"&gt;-&lt;/span&gt; Explain what well-named code already explains
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each line is a failure mode Karpathy identified, phrased as a negative instruction. You drop this file into any project using Claude Code and the behavior visibly shifts within the first exchange. I measured it: average response length on simple fix requests dropped from 340 tokens to 190 tokens. Same correctness. No more "I've refactored this to be more extensible..."&lt;/p&gt;

&lt;p&gt;The 16,500 stars weren't for the content, strictly speaking. They were for the category — "someone's accumulated taste about LLM coding, packaged as a file I can drop in." Within two weeks, derivatives appeared. &lt;a href="https://github.com/zhangxuefeng/zhangxuefeng-skill" rel="noopener noreferrer"&gt;zhangxuefeng-skill&lt;/a&gt;, &lt;a href="https://github.com/khazix/khazix-skills" rel="noopener noreferrer"&gt;khazix-skills&lt;/a&gt;, &lt;a href="https://github.com/tong-jincheng/tong-jincheng-skill" rel="noopener noreferrer"&gt;tong-jincheng-skill&lt;/a&gt;. Each claiming to distill a specific developer's aesthetic.&lt;/p&gt;

&lt;p&gt;The new repo category is "distilled cognition as executable config."&lt;/p&gt;
&lt;h2&gt;
  
  
  Where Hermes Agent fits
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is not a Skill. It's a runtime — an open-source autonomous agent from &lt;a href="https://nousresearch.com/" rel="noopener noreferrer"&gt;Nous Research&lt;/a&gt; that runs persistently on a server and connects to Telegram, Slack, Discord, WhatsApp, Signal, and a CLI through a single gateway. It also loads Skills.&lt;/p&gt;

&lt;p&gt;That last part is why it matters.&lt;/p&gt;

&lt;p&gt;When a Skill is "a markdown file with instructions," you need a host that knows how to load, compose, and trigger them. Claude Code was first. Hermes Agent was the second — and unlike Claude Code, it's fully open source under MIT, runs on your own infrastructure, and takes any model behind an OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;The architecture looks like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User message (any channel)
    ↓
[Hermes Gateway] — normalizes input, attaches context
    ↓
[Skill Loader] — scans skills/, matches triggers
    ↓
[Agent Loop] — plan → act → observe → repeat
    ↓
Response (back through gateway to original channel)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Running Hermes with the Karpathy skills directory in its &lt;code&gt;skills/&lt;/code&gt; folder gave me the same behavioral shift on a completely different model — I was routing to &lt;a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct" rel="noopener noreferrer"&gt;Qwen 2.5 72B&lt;/a&gt; via &lt;a href="https://www.together.ai/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;. The Skills format was portable. That's not a small claim. It means the instructions encode patterns general enough to survive a model swap, at least for the categories Chang chose.&lt;/p&gt;

&lt;p&gt;This is a meaningful difference from Hermes 4, the LLM from Nous that I've written about previously. Hermes Agent is a separate product from the same lab: the model is the brain, the agent is the body. Both ship open source, but they solve different layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first-hour test: I ran both in production
&lt;/h2&gt;

&lt;p&gt;On Tuesday morning I set up a test scenario. I have a side project called &lt;a href="https://github.com/jee599/llmtrio" rel="noopener noreferrer"&gt;LLMTrio&lt;/a&gt; — a multi-agent orchestrator I've been iterating on for months. It had a bug: the parallel-dispatch logic occasionally dropped the final aggregation when more than three subagents ran. A classic race condition dressed up as an LLM quirk.&lt;/p&gt;
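<p>For the curious, here's the failure class distilled into a hypothetical Go sketch (not the actual LLMTrio code): a results channel sized for three subagents plus a non-blocking send, so whichever goroutine arrives after the buffer fills is silently dropped before aggregation.</p>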

&lt;p&gt;I ran the same bug through three setups:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Bare Claude Code, no Skills loaded&lt;/li&gt;
&lt;li&gt;Claude Code with &lt;code&gt;andrej-karpathy-skills&lt;/code&gt; injected&lt;/li&gt;
&lt;li&gt;Hermes Agent (running Qwen 2.5 72B) with the same skills directory&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Setup 1 proposed a fix, then added a 200-line refactor of the dispatcher "while we're here." I had to stop it, revert, and narrow the scope.&lt;/p&gt;

&lt;p&gt;Setup 2 proposed the same fix. It did not refactor anything else. It did not write a trailing summary. The diff was 14 lines. It worked.&lt;/p&gt;

&lt;p&gt;Setup 3 was slower — Qwen 72B is not Opus — but the diff was nearly identical to Setup 2. Same 14 lines, correct for the same reason. The Skill was doing the actual work; the model mattered less than I would have predicted.&lt;/p&gt;

&lt;p&gt;This is the thing that pushed me over. The Skill is transferable. Between models, between host agents, between projects. That's a real primitive. A system prompt is not transferable — it's coupled to the harness. A tool call is not transferable — it's coupled to the interface. A Skill, defined as "markdown with a trigger," is genuinely reusable across surfaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this actually means for developers
&lt;/h2&gt;

&lt;p&gt;The practical implication is a shift in where your taste lives.&lt;/p&gt;

&lt;p&gt;Before: your taste lived in your code review. You caught mistakes after the LLM made them. You corrected them in chat. The correction didn't persist.&lt;/p&gt;

&lt;p&gt;After: your taste lives in files. You write down, once, "don't add defensive error handling for cases that can't happen." You drop that file in every project. Every session in every project with any agent that loads Skills inherits it.&lt;/p&gt;

&lt;p&gt;This is why the derivatives matter. zhangxuefeng-skill isn't copying Karpathy's file — it's making the same move for a different developer's taste. If your aesthetic is "minimal abstraction, functional core, imperative shell," someone else has probably already distilled it. If not, you write it yourself in an hour and publish.&lt;/p&gt;

&lt;p&gt;The GitHub repo count for this category doubled between February and April 2026. By mid-April, there were 47 "skills" repos with 1,000+ stars each. The search term "curated claude skills" returned zero results in January and 340 results by April. This isn't a trend; it's a new repo category.&lt;/p&gt;

&lt;p&gt;What's surprising is how little it required. No new model. No new framework. No new protocol. Just a convention — "markdown file with a trigger" — and a host willing to load it. Claude Code shipped the convention. Hermes Agent cloned it. The community did the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to copy from this if you're building a coding agent
&lt;/h2&gt;

&lt;p&gt;Two things worth lifting, even if you're not building an agent:&lt;/p&gt;

&lt;p&gt;The trigger-based loading pattern works for anything with a context budget. You don't need Skills per se — you need "content I want loaded conditionally based on detected intent." Snippets in IDEs have done this forever. What's new is doing it at the prompt layer.&lt;/p&gt;
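&lt;p&gt;If you want to feel the mechanism, the loading half fits in a page of Go. A toy sketch under loud assumptions: skills are markdown files whose &lt;code&gt;description:&lt;/code&gt; line doubles as the trigger, and matching is naive keyword overlap, where a real host lets the model decide what to load. The conditional-context budget is the point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Toy trigger-based skill loader. Assumptions: skills are *.md files
// whose "description:" frontmatter line doubles as the trigger, and
// matching is naive keyword overlap. Real hosts let the model decide
// when to load; the zero-cost-when-unmatched budget is the same idea.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// matches reports whether any significant trigger word occurs in the prompt.
func matches(trigger, prompt string) bool {
	p := strings.ToLower(prompt)
	for _, w := range strings.Fields(strings.ToLower(trigger)) {
		if len(w) &gt; 3 &amp;&amp; strings.Contains(p, w) {
			return true
		}
	}
	return false
}

// loadMatching returns the bodies of skills whose trigger fires.
// Unmatched skills cost zero context tokens.
func loadMatching(dir, prompt string) []string {
	var loaded []string
	paths, _ := filepath.Glob(filepath.Join(dir, "*.md"))
	for _, path := range paths {
		raw, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		body := string(raw)
		for _, line := range strings.Split(body, "\n") {
			trigger, ok := strings.CutPrefix(strings.TrimSpace(line), "description:")
			if !ok {
				continue
			}
			if matches(trigger, prompt) {
				loaded = append(loaded, body)
			}
			break // only the first description line is the trigger
		}
	}
	return loaded
}

func main() {
	for _, skill := range loadMatching("skills", "fix this race condition bug") {
		fmt.Println(len(skill), "bytes of conditional context injected")
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Point it at a skills directory and a debugging prompt and you get roughly the injection behavior Claude Code and Hermes implement with more ceremony.&lt;/p&gt;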

&lt;p&gt;The distilled-taste format works as documentation for your own future self. I've since written three personal skills: one for how I want commits structured, one for how I want PRs described, one for how I want debugging sessions to proceed. I load them across projects. Six months ago this would have been a CLAUDE.md at the project root, copied and maintained in a dozen places. Now it's one file, loaded on demand.&lt;/p&gt;

&lt;p&gt;The repo for Hermes Agent is here:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/NousResearch" rel="noopener noreferrer"&gt;
        NousResearch
      &lt;/a&gt; / &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;
        hermes-agent
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      The agent that grows with you
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer" href="https://github.com/NousResearch/hermes-agent/assets/banner.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FNousResearch%2Fhermes-agent%2FHEAD%2Fassets%2Fbanner.png" alt="Hermes Agent" width="100%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Hermes Agent ☤&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;
  &lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/76d7a880842f286c4d4e07baf2db1046197c6cfaa564365e912938445fc54a32/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f63732d6865726d65732d2d6167656e742e6e6f757372657365617263682e636f6d2d4646443730303f7374796c653d666f722d7468652d6261646765" alt="Documentation"&gt;&lt;/a&gt;
  &lt;a href="https://discord.gg/NousResearch" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/8c0fca73564f21d7a6f235747eb4d739a2e4aaa348b8e074904127baeb944b9e/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d3538363546323f7374796c653d666f722d7468652d6261646765266c6f676f3d646973636f7264266c6f676f436f6c6f723d7768697465" alt="Discord"&gt;&lt;/a&gt;
  &lt;a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/153acf9dff19deb8abfc598c53bac50a4ceae0f5c83a552711060d3d78d2c057/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d677265656e3f7374796c653d666f722d7468652d6261646765" alt="License: MIT"&gt;&lt;/a&gt;
  &lt;a href="https://nousresearch.com" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/6195af06150f2173f79d16fa3462ccac43c7dbf78f06f3c7997dc4090d79b9ad/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275696c7425323062792d4e6f757325323052657365617263682d626c756576696f6c65743f7374796c653d666f722d7468652d6261646765" alt="Built by Nous Research"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The self-improving AI agent built by &lt;a href="https://nousresearch.com" rel="nofollow noopener noreferrer"&gt;Nous Research&lt;/a&gt;.&lt;/strong&gt; It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.&lt;/p&gt;

&lt;p&gt;Use any model you want — &lt;a href="https://portal.nousresearch.com" rel="nofollow noopener noreferrer"&gt;Nous Portal&lt;/a&gt;, &lt;a href="https://openrouter.ai" rel="nofollow noopener noreferrer"&gt;OpenRouter&lt;/a&gt; (200+ models), &lt;a href="https://build.nvidia.com" rel="nofollow noopener noreferrer"&gt;NVIDIA NIM&lt;/a&gt; (Nemotron), &lt;a href="https://platform.xiaomimimo.com" rel="nofollow noopener noreferrer"&gt;Xiaomi MiMo&lt;/a&gt;, &lt;a href="https://z.ai" rel="nofollow noopener noreferrer"&gt;z.ai/GLM&lt;/a&gt;, &lt;a href="https://platform.moonshot.ai" rel="nofollow noopener noreferrer"&gt;Kimi/Moonshot&lt;/a&gt;, &lt;a href="https://www.minimax.io" rel="nofollow noopener noreferrer"&gt;MiniMax&lt;/a&gt;, &lt;a href="https://huggingface.co" rel="nofollow noopener noreferrer"&gt;Hugging Face&lt;/a&gt;, OpenAI, or your own endpoint. Switch with &lt;code&gt;hermes model&lt;/code&gt; — no code changes, no lock-in.&lt;/p&gt;

&lt;p&gt;&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;b&gt;A real terminal interface&lt;/b&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;b&gt;Lives&lt;/b&gt;&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/p&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


&lt;p&gt;The Karpathy skills repo is here:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/forrestchang" rel="noopener noreferrer"&gt;
        forrestchang
      &lt;/a&gt; / &lt;a href="https://github.com/forrestchang/andrej-karpathy-skills" rel="noopener noreferrer"&gt;
        andrej-karpathy-skills
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Karpathy-Inspired Claude Code Guidelines&lt;/h1&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;Check out my new project &lt;a href="https://github.com/multica-ai/multica" rel="noopener noreferrer"&gt;Multica&lt;/a&gt; — an open-source platform for running and managing coding agents with reusable skills.&lt;/p&gt;
&lt;p&gt;Follow me on X: &lt;a href="https://x.com/jiayuan_jy" rel="nofollow noopener noreferrer"&gt;https://x.com/jiayuan_jy&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A single &lt;code&gt;CLAUDE.md&lt;/code&gt; file to improve Claude Code behavior, derived from &lt;a href="https://x.com/karpathy/status/2015883857489522876" rel="nofollow noopener noreferrer"&gt;Andrej Karpathy's observations&lt;/a&gt; on LLM coding pitfalls.&lt;/p&gt;
&lt;p&gt;English | &lt;a href="https://github.com/forrestchang/andrej-karpathy-skills/./README.zh.md" rel="noopener noreferrer"&gt;简体中文&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Problems&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;From Andrej's post:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The Solution&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Four principles in one file that directly address these issues:&lt;/p&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;Addresses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/forrestchang/andrej-karpathy-skills" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Part 2 of this series looks at &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; — the 295K-star personal assistant from &lt;a href="https://github.com/steipete" rel="noopener noreferrer"&gt;Peter Steinberger&lt;/a&gt; that runs the opposite strategy: not Skills-first, but local-gateway-first. Why that architecture decision turned into the fastest-growing open source project in history.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The interesting primitive of 2026 isn't the model. It's the markdown file that tells the model to shut up and write 14 lines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Which Skills have you actually found transferable across models? I'm particularly curious whether anyone has tested the Karpathy skills against &lt;a href="https://www.deepseek.com/" rel="noopener noreferrer"&gt;DeepSeek V3&lt;/a&gt; or &lt;a href="https://www.llama.com/" rel="noopener noreferrer"&gt;Llama 3.3&lt;/a&gt; — leave a comment if you have data.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/forrestchang/andrej-karpathy-skills" rel="noopener noreferrer"&gt;andrej-karpathy-skills repository&lt;/a&gt; - GitHub&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent repository&lt;/a&gt; - Nous Research&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.anthropic.com/en/docs/claude-code/skills" rel="noopener noreferrer"&gt;Claude Code Skills documentation&lt;/a&gt; - Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.shareuhack.com/en/posts/github-trending-weekly-2026-04-13" rel="noopener noreferrer"&gt;GitHub Trending Weekly 2026-04-13&lt;/a&gt; - Shareuhack&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://alphasignalai.substack.com/p/karpathy-inspired-claudemd-how-to" rel="noopener noreferrer"&gt;Karpathy-Inspired CLAUDE.md&lt;/a&gt; - Alpha Signal&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
