<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hassann</title>
    <description>The latest articles on Forem by Hassann (@hassann).</description>
    <link>https://forem.com/hassann</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3890506%2F89a141f2-4995-48b3-b5f2-e00ba5055afb.png</url>
      <title>Forem: Hassann</title>
      <link>https://forem.com/hassann</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hassann"/>
    <language>en</language>
    <item>
      <title>API Design Patterns from the World's Largest Prediction Market: Lessons from Polymarket</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Sat, 09 May 2026 09:50:04 +0000</pubDate>
      <link>https://forem.com/hassann/api-design-patterns-from-the-worlds-largest-prediction-market-lessons-from-polymarket-1chj</link>
      <guid>https://forem.com/hassann/api-design-patterns-from-the-worlds-largest-prediction-market-lessons-from-polymarket-1chj</guid>
      <description>&lt;p&gt;Prediction market APIs are hard to design because they combine expiring financial instruments, real-time probability pricing, multi-outcome capital relationships, human users, and automated trading bots. Every API decision gets tested under latency, security, and correctness pressure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Polymarket, one of the largest prediction market platforms by volume, is useful to study because its API is not just CRUD over markets and orders. It separates discovery, trading, analytics, authentication, signed orders, and real-time updates into distinct surfaces.&lt;/p&gt;

&lt;p&gt;This article extracts eight implementation patterns you can apply when designing APIs for trading systems, crypto apps, fintech products, or any domain where state, trust, and data semantics matter.&lt;/p&gt;




&lt;h2&gt;Pattern 1: Separate APIs by domain, not by database entity&lt;/h2&gt;

&lt;p&gt;Polymarket exposes three main APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gamma API&lt;/strong&gt; (&lt;code&gt;gamma-api.polymarket.com&lt;/code&gt;) — market discovery, events, tags, search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLOB API&lt;/strong&gt; (&lt;code&gt;clob.polymarket.com&lt;/code&gt;) — order book data, pricing, order placement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data API&lt;/strong&gt; (&lt;code&gt;data-api.polymarket.com&lt;/code&gt;) — user positions, trades, analytics, leaderboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each API has a different purpose:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Primary use&lt;/th&gt;
&lt;th&gt;Auth model&lt;/th&gt;
&lt;th&gt;Consumer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gamma API&lt;/td&gt;
&lt;td&gt;Browse and discover markets&lt;/td&gt;
&lt;td&gt;Public&lt;/td&gt;
&lt;td&gt;Apps, users, indexers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLOB API&lt;/td&gt;
&lt;td&gt;Read books and place orders&lt;/td&gt;
&lt;td&gt;Public reads, authenticated writes&lt;/td&gt;
&lt;td&gt;Traders, bots, market makers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data API&lt;/td&gt;
&lt;td&gt;Query wallet-based activity&lt;/td&gt;
&lt;td&gt;Public, address-scoped&lt;/td&gt;
&lt;td&gt;Dashboards, analytics tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A less deliberate design would put everything under one API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/api/markets
/api/orders
/api/users
/api/trades
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Polymarket instead separates by operational domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://gamma-api.polymarket.com
https://clob.polymarket.com
https://data-api.polymarket.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That separation matters because discovery, trading, and analytics have different requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discovery optimizes for searchability and browsing.&lt;/li&gt;
&lt;li&gt;Trading optimizes for correctness, latency, and authentication.&lt;/li&gt;
&lt;li&gt;Analytics optimizes for historical reads and wallet-level aggregation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;When designing your API, start with usage boundaries instead of tables.&lt;/p&gt;

&lt;p&gt;Ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Who calls this API?
How often do they call it?
Does it need authentication?
Does it need low latency?
Can it scale independently?
Can it fail independently?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the answers differ significantly, consider separate API surfaces.&lt;/p&gt;
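&lt;p&gt;One lightweight way to keep those boundaries visible in client code is a constant per surface. A minimal sketch; the &lt;code&gt;fetchEvents&lt;/code&gt; helper is illustrative, not an official SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// One base URL per operational domain keeps the boundaries explicit.
const SURFACES = {
  discovery: "https://gamma-api.polymarket.com",
  trading: "https://clob.polymarket.com",
  analytics: "https://data-api.polymarket.com",
} as const;

// Hypothetical helper: discovery reads always hit the Gamma surface.
async function fetchEvents(limit: number): Promise&amp;lt;unknown&amp;gt; {
  const res = await fetch(`${SURFACES.discovery}/events?limit=${limit}`);
  if (!res.ok) throw new Error(`Gamma API error: ${res.status}`);
  return res.json();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;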




&lt;h2&gt;Pattern 2: Make read access public when data liquidity matters&lt;/h2&gt;

&lt;p&gt;Polymarket makes market data public:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://gamma-api.polymarket.com/events?limit=5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key is required for basic market discovery.&lt;/p&gt;

&lt;p&gt;That includes data such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event metadata&lt;/li&gt;
&lt;li&gt;Market metadata&lt;/li&gt;
&lt;li&gt;Prices&lt;/li&gt;
&lt;li&gt;Order books&lt;/li&gt;
&lt;li&gt;Historical trades&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a deliberate platform decision. Traditional financial exchanges often monetize market data directly. Polymarket treats market data as infrastructure: the more people can read it, analyze it, and build on it, the more useful the market becomes.&lt;/p&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;Separate read access from write access.&lt;/p&gt;

&lt;p&gt;A common mistake is to require authentication for everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /markets       requires auth
GET /order-book    requires auth
POST /orders       requires auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For many platforms, this creates unnecessary friction. A better model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /markets       public
GET /order-book    public
GET /trades        public
POST /orders       authenticated
DELETE /orders     authenticated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Public reads are especially useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data consumers vastly outnumber writers.&lt;/li&gt;
&lt;li&gt;Developers need to explore before integrating.&lt;/li&gt;
&lt;li&gt;Bots, dashboards, and indexers increase platform value.&lt;/li&gt;
&lt;li&gt;The sensitive action is mutation, not observation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add authentication at the point where risk appears: placing orders, moving funds, changing state, or accessing private account information.&lt;/p&gt;
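&lt;p&gt;In an Express-style server, that split can be as simple as attaching auth middleware only to the mutating routes. A minimal sketch, with placeholder handlers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import express from "express";
import type { Request, Response, NextFunction } from "express";

const app = express();

// Placeholder middleware: reject mutations without a bearer token.
function requireAuth(req: Request, res: Response, next: NextFunction) {
  if (!req.headers.authorization) return res.status(401).end();
  next();
}

// Public reads: no credential required.
app.get("/markets", (_req, res) =&amp;gt; res.json([]));
app.get("/trades", (_req, res) =&amp;gt; res.json([]));

// Authenticated writes: risk appears at mutation, so auth lives here.
app.post("/orders", requireAuth, (_req, res) =&amp;gt; res.status(201).end());
app.delete("/orders/:id", requireAuth, (_req, res) =&amp;gt; res.status(204).end());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;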




&lt;h2&gt;Pattern 3: Use different authentication levels for different trust levels&lt;/h2&gt;

&lt;p&gt;Trading endpoints require authentication, but Polymarket uses two authentication levels with different responsibilities.&lt;/p&gt;

&lt;h3&gt;L1 authentication: prove wallet ownership&lt;/h3&gt;

&lt;p&gt;L1 authentication uses an EIP-712 signature from the user’s private key. It proves that the caller controls the wallet.&lt;/p&gt;

&lt;p&gt;You use it to create or derive API credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// L1: Use your private key to derive API credentials&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createOrDeriveApiKey&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Example result:&lt;/span&gt;
&lt;span class="c1"&gt;// {&lt;/span&gt;
&lt;span class="c1"&gt;//   key: "...",&lt;/span&gt;
&lt;span class="c1"&gt;//   secret: "...",&lt;/span&gt;
&lt;span class="c1"&gt;//   passphrase: "..."&lt;/span&gt;
&lt;span class="c1"&gt;// }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a high-trust action. It should require the strongest credential: the private key signature.&lt;/p&gt;

&lt;h3&gt;L2 authentication: sign each API request&lt;/h3&gt;

&lt;p&gt;After API credentials exist, routine trading requests use HMAC-SHA256 headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POLY_ADDRESS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0x...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POLY_SIGNATURE&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;hmac-sha256&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POLY_TIMESTAMP&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1716000000&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POLY_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;550e8400-...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POLY_PASSPHRASE&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;L2 authentication proves that a specific request came from the credential holder without requiring a private-key signature on every API call.&lt;/p&gt;
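&lt;p&gt;A sketch of building such headers with Node's crypto module. The exact canonical message Polymarket signs (field order, key encoding) lives in its official client, so treat the message construction below as an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { createHmac } from "node:crypto";

// Assumed canonical message: timestamp + method + path + body.
// Check the official client for the exact signing string and key encoding.
function signRequest(secret: string, method: string, path: string, body = "") {
  const timestamp = Math.floor(Date.now() / 1000).toString();
  const message = timestamp + method + path + body;
  const signature = createHmac("sha256", secret).update(message).digest("base64");
  return { timestamp, signature };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;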

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;Do not use the same authentication ceremony for every action.&lt;/p&gt;

&lt;p&gt;A practical model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Auth strength&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Create API key&lt;/td&gt;
&lt;td&gt;Strong identity proof&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rotate credentials&lt;/td&gt;
&lt;td&gt;Strong identity proof&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Place order&lt;/td&gt;
&lt;td&gt;Request signature/session credential&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read public market data&lt;/td&gt;
&lt;td&gt;No auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read private account data&lt;/td&gt;
&lt;td&gt;Session credential&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Withdraw funds&lt;/td&gt;
&lt;td&gt;Strong identity proof&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This maps beyond crypto. In a traditional app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L1 is “prove you are the account owner.”&lt;/li&gt;
&lt;li&gt;L2 is “prove this request came from an active authorized session.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction improves both security and usability.&lt;/p&gt;




&lt;h2&gt;Pattern 4: Treat high-stakes actions as signed payloads, not just API calls&lt;/h2&gt;

&lt;p&gt;On Polymarket, placing an order is not merely sending JSON to a server. The order is a cryptographically signed financial instruction.&lt;/p&gt;

&lt;p&gt;Example order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createAndPostOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tokenID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;71321045679...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;side&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Side&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BUY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tickSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0.01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;negRisk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;OrderType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GTC&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, the SDK creates an EIP-712 typed data structure, signs it with the user’s private key, and submits the signed order. The matching engine runs offchain, but matched trades settle on Polygon using those signatures.&lt;/p&gt;

&lt;p&gt;The important design point: the operator cannot fabricate trades or move funds without user authorization. The signed message is the authorization.&lt;/p&gt;
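&lt;p&gt;For illustration, this is roughly what EIP-712 typed-data signing looks like with ethers. The domain and order schema here are simplified placeholders, not Polymarket's actual typed-data definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Wallet } from "ethers";

// Placeholder domain and schema: not Polymarket's real definition.
const domain = { name: "ExampleExchange", version: "1", chainId: 137 };
const types = {
  Order: [
    { name: "tokenId", type: "string" },
    { name: "price", type: "string" },
    { name: "size", type: "string" },
    { name: "side", type: "string" },
  ],
};
const order = { tokenId: "71321045679...", price: "0.65", size: "100", side: "BUY" };

const wallet = new Wallet(process.env.PRIVATE_KEY!);

// The signed struct, not the transport credential, is the authorization.
const signature = await wallet.signTypedData(domain, types, order);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;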

&lt;h3&gt;Conventional API semantics&lt;/h3&gt;

&lt;p&gt;In a normal API, this means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please perform this action for me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /orders
Authorization: Bearer &amp;lt;token&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server decides whether to execute the action.&lt;/p&gt;

&lt;h3&gt;Signed-message semantics&lt;/h3&gt;

&lt;p&gt;With signed orders, the payload means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Here is a signed instruction authorizing this exact action.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API acts more like a relay than an authority.&lt;/p&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;For high-stakes operations, consider making the payload itself verifiable.&lt;/p&gt;

&lt;p&gt;Useful domains include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Financial transactions&lt;/li&gt;
&lt;li&gt;Legal approvals&lt;/li&gt;
&lt;li&gt;Contract signatures&lt;/li&gt;
&lt;li&gt;Permission grants&lt;/li&gt;
&lt;li&gt;Sensitive workflow approvals&lt;/li&gt;
&lt;li&gt;Cross-system authorization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of relying only on transport-layer credentials, encode authorization into the payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"transfer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100.00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"asset"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USDC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recipient"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0x..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expiresAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-01T00:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"signature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0x..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you better auditability, non-repudiation, and replay protection when designed correctly.&lt;/p&gt;
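&lt;p&gt;A server-side verification sketch. The HMAC scheme, the &lt;code&gt;nonce&lt;/code&gt; field, and the naive JSON serialization are assumptions for illustration; the example payload above implies an asymmetric signature instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { createHmac, timingSafeEqual } from "node:crypto";

type SignedInstruction = {
  action: string;
  amount: string;
  expiresAt: string;  // reject stale instructions
  nonce: string;      // assumed field, used to reject replays
  signature: string;
};

const seenNonces = new Set&amp;lt;string&amp;gt;(); // use a bounded store in production

function verify(instruction: SignedInstruction, secret: string): boolean {
  // Expiry check defeats indefinite replay of old instructions.
  if (new Date(instruction.expiresAt).getTime() &amp;lt; Date.now()) return false;
  // Nonce check defeats replay inside the validity window.
  if (seenNonces.has(instruction.nonce)) return false;

  const { signature, ...payload } = instruction;
  // Production code needs a canonical serialization, not raw JSON.stringify.
  const expected = createHmac("sha256", secret)
    .update(JSON.stringify(payload))
    .digest("hex");
  if (signature.length !== expected.length) return false;
  if (!timingSafeEqual(Buffer.from(signature), Buffer.from(expected))) return false;

  seenNonces.add(instruction.nonce);
  return true;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;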




&lt;h2&gt;Pattern 5: Encode the domain ontology in the API model&lt;/h2&gt;

&lt;p&gt;Polymarket models prediction markets using two important objects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Event&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Market&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An &lt;strong&gt;Event&lt;/strong&gt; is the broader question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Who will win the 2026 US Senate race in Pennsylvania?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A &lt;strong&gt;Market&lt;/strong&gt; is a specific tradable binary outcome inside that event:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Will Bob Casey win?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One event can contain many markets.&lt;/p&gt;

&lt;p&gt;Example structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"501"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026 Pennsylvania Senate Race"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"negRisk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"markets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2301"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Will Bob Casey win?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"outcomePrices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.42&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.58&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2302"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Will Dave McCormick win?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"outcomePrices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.35&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.65&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2303"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Will a third candidate win?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"outcomePrices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.23&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.77&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This distinction is not cosmetic. It tells API consumers how the domain works.&lt;/p&gt;

&lt;p&gt;The event groups related markets. The market represents a tradable outcome. The &lt;code&gt;negRisk&lt;/code&gt; flag signals that markets inside the event have capital relationships.&lt;/p&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;Avoid flattening important domain concepts into generic resources.&lt;/p&gt;

&lt;p&gt;A weak model might expose only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /markets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A stronger model exposes relationships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /events
GET /events/:id/markets
GET /markets/:id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the distinction matters to business logic, it should exist in the API.&lt;/p&gt;

&lt;p&gt;Good domain modeling helps clients avoid incorrect assumptions. For example, if an automated trader ignores &lt;code&gt;negRisk: true&lt;/code&gt;, it may construct the wrong position model.&lt;/p&gt;

&lt;p&gt;Your API should make these relationships visible instead of hiding them in documentation.&lt;/p&gt;
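&lt;p&gt;In TypeScript terms, the ontology might be modeled like this. Field names follow the response shown above; treat it as a sketch rather than the full schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Market = {
  id: string;
  question: string;
  outcomePrices: string; // JSON-encoded array of price strings, as returned
};

type MarketEvent = {
  id: string;
  title: string;
  negRisk: boolean; // capital relationships exist between child markets
  markets: Market[];
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;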




&lt;h2&gt;Pattern 6: Represent domain invariants as API fields&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;negRisk&lt;/code&gt; flag is one of Polymarket’s most interesting design choices.&lt;/p&gt;

&lt;p&gt;In a standard multi-outcome event, each market can be treated independently. But in a NegRisk event, exactly one outcome can win. That creates mathematical relationships between positions:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;1 No token on outcome A ≡ 1 Yes token on every other outcome&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1× No (Other)&lt;/td&gt;
&lt;td&gt;1× Yes (Casey) + 1× Yes (McCormick)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is not just theoretical. It affects trading and settlement behavior.&lt;/p&gt;

&lt;p&gt;Polymarket exposes this through API fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"negRisk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And when placing orders, the client must pass the correct market options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createAndPostOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tokenID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;71321045679...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;side&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Side&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BUY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tickSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0.01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;negRisk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;OrderType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GTC&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the client gets this wrong, the order can be rejected or handled incorrectly.&lt;/p&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;If your domain has hard rules, encode them as typed fields.&lt;/p&gt;

&lt;p&gt;Do not leave critical invariants only in prose documentation.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"requiresKyc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"settlementMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"on_chain"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"isMutuallyExclusive"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"minCollateralRatio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.50"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"supportsPartialFill"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expiresAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-01T00:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fields like these are valuable because clients can branch on them programmatically.&lt;/p&gt;

&lt;p&gt;Documentation explains the rule. The API should expose the rule.&lt;/p&gt;
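&lt;p&gt;Clients can then branch on the rule instead of re-reading the docs. A sketch over the hypothetical fields above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type MarketRules = {
  isMutuallyExclusive: boolean;
  supportsPartialFill: boolean;
  expiresAt: string;
};

function positionModel(rules: MarketRules): "netted-across-event" | "per-market" {
  // Mutually exclusive outcomes share capital, so positions must be
  // netted across the whole event rather than tracked market by market.
  return rules.isMutuallyExclusive ? "netted-across-event" : "per-market";
}

function acceptsOrders(rules: MarketRules): boolean {
  // Expired markets never accept new orders.
  return new Date(rules.expiresAt).getTime() &amp;gt; Date.now();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;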




&lt;h2&gt;Pattern 7: Treat changing market parameters as state, not configuration&lt;/h2&gt;

&lt;p&gt;Many financial APIs treat tick size as static. Polymarket exposes tick size as dynamic market state.&lt;/p&gt;

&lt;p&gt;When a market price approaches the extremes (above &lt;code&gt;0.96&lt;/code&gt; or below &lt;code&gt;0.04&lt;/code&gt;), the minimum tick size narrows from &lt;code&gt;0.01&lt;/code&gt; to &lt;code&gt;0.001&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example WebSocket event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tick_size_change"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"asset_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"65818619657..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"old_tick_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"new_tick_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100000000"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason is practical. Near extreme probabilities, a 1-cent tick is too coarse. Moving from &lt;code&gt;0.04&lt;/code&gt; to &lt;code&gt;0.03&lt;/code&gt; is a large relative move. A smaller tick allows prices like &lt;code&gt;0.973&lt;/code&gt; instead of forcing &lt;code&gt;0.97&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;Do not assume market parameters are static.&lt;/p&gt;

&lt;p&gt;For trading clients, tick size should be part of the current market state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;MarketState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;assetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;bestBid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;bestAsk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;tickSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a &lt;code&gt;tick_size_change&lt;/code&gt; event arrives, update local state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleTickSizeChange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;asset_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;new_tick_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;marketState&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;asset_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;tickSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;new_tick_size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then validate orders against the current tick size before submitting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isValidPrice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tickSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isInteger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;price&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;tickSize&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
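&lt;p&gt;With the article's own numbers near the extreme:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;isValidPrice(0.973, 0.001); // true: sits on the finer grid
isValidPrice(0.973, 0.01);  // false: too precise for a 1-cent tick
isValidPrice(0.97, 0.01);   // true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;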



&lt;p&gt;If your client hard-codes tick size, it will eventually submit invalid orders.&lt;/p&gt;

&lt;p&gt;The broader principle: changing domain state should be broadcast explicitly, not discovered only through failed requests.&lt;/p&gt;




&lt;h2&gt;Pattern 8: Use separate WebSocket layers for different real-time consumers&lt;/h2&gt;

&lt;p&gt;Polymarket runs two separate WebSocket systems.&lt;/p&gt;

&lt;h3&gt;Market Channel&lt;/h3&gt;

&lt;p&gt;The Market Channel is designed for trading consumers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://ws-subscriptions-clob.polymarket.com/ws/market
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It streams data such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Order book snapshots&lt;/li&gt;
&lt;li&gt;Price changes&lt;/li&gt;
&lt;li&gt;Trade executions&lt;/li&gt;
&lt;li&gt;Tick size changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Subscription example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"assets_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"65818619657568813474341868652308942079804919287380422192892211131408793125422"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"market"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This channel is optimized around asset IDs and low-latency trading workflows.&lt;/p&gt;
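&lt;p&gt;A minimal browser-side consumer of that channel might look like this; the message handling is schematic, keyed off the &lt;code&gt;event_type&lt;/code&gt; field shown in Pattern 7:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;const ws = new WebSocket("wss://ws-subscriptions-clob.polymarket.com/ws/market");

ws.onopen = () =&amp;gt; {
  // Same subscription shape as the example above.
  ws.send(JSON.stringify({
    assets_ids: [
      "65818619657568813474341868652308942079804919287380422192892211131408793125422",
    ],
    type: "market",
  }));
};

ws.onmessage = (msg) =&amp;gt; {
  const event = JSON.parse(msg.data);
  if (event.event_type === "tick_size_change") {
    // Update local market state (see Pattern 7).
  }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;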

&lt;h3&gt;Real-Time Data Socket&lt;/h3&gt;

&lt;p&gt;The Real-Time Data Socket serves a different use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://ws-live-data.polymarket.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It streams broader platform activity, including comments, crypto prices, equity prices, and social interaction events.&lt;/p&gt;

&lt;p&gt;Subscription example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"subscribe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subscriptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"topic"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"crypto_prices"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"update"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"filters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"btcusdt,ethusd"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These consumers have different needs.&lt;/p&gt;

&lt;p&gt;A market maker needs low-latency order book updates. A UI showing platform activity needs comments, prices, and social events. Combining both into one WebSocket system would force one infrastructure layer to serve conflicting requirements.&lt;/p&gt;

&lt;h3&gt;Implementation takeaway&lt;/h3&gt;

&lt;p&gt;Separate real-time infrastructure when consumers differ by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency requirements&lt;/li&gt;
&lt;li&gt;Message volume&lt;/li&gt;
&lt;li&gt;Failure tolerance&lt;/li&gt;
&lt;li&gt;Data shape&lt;/li&gt;
&lt;li&gt;Subscription model&lt;/li&gt;
&lt;li&gt;Operational priority&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A practical split might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/ws/trading        low latency, order books, fills
/ws/activity       comments, notifications, social events
/ws/analytics      aggregates, leaderboards, dashboards
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trying to make one WebSocket endpoint serve every use case usually creates unnecessary complexity and uneven performance.&lt;/p&gt;




&lt;h2&gt;What these patterns have in common&lt;/h2&gt;

&lt;p&gt;Polymarket’s API design makes the domain structure visible.&lt;/p&gt;

&lt;p&gt;The main patterns are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Separate APIs by operational domain.&lt;/li&gt;
&lt;li&gt;Make public read access easy when data liquidity matters.&lt;/li&gt;
&lt;li&gt;Use different authentication levels for different trust levels.&lt;/li&gt;
&lt;li&gt;Represent high-stakes actions as signed payloads.&lt;/li&gt;
&lt;li&gt;Encode the domain ontology in the API model.&lt;/li&gt;
&lt;li&gt;Surface domain invariants as explicit fields.&lt;/li&gt;
&lt;li&gt;Treat changing parameters as real-time state.&lt;/li&gt;
&lt;li&gt;Split WebSocket infrastructure by consumer profile.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The broader design lesson: do not abstract away distinctions that matter.&lt;/p&gt;

&lt;p&gt;If a concept affects client behavior, put it in the API. If a rule affects correctness, expose it as a field. If state changes over time, broadcast the change. If different consumers have different performance requirements, give them different interfaces.&lt;/p&gt;

&lt;p&gt;Good API design is not only about clean routes and consistent naming. It is about making the system’s real constraints understandable and programmable.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Get Free Unlimited Gemini API</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Sat, 09 May 2026 06:58:45 +0000</pubDate>
      <link>https://forem.com/hassann/get-free-unlimited-gemini-api-58f6</link>
      <guid>https://forem.com/hassann/get-free-unlimited-gemini-api-58f6</guid>
      <description>&lt;p&gt;Google’s Gemini family is a cost-effective frontier model line for high-volume workloads, but token costs can still grow quickly when a public app, side project, or hackathon demo gets real traffic. Puter.js changes the billing model: it exposes Gemini models such as 2.5 Pro, 2.5 Flash, 2.0 Flash, 3 Flash Preview, and Gemma models without requiring your Google API key. The end user signs in with Puter and covers usage from their account, while your app calls the model from the browser.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Puter.js&lt;/strong&gt; gives browser apps access to Gemini and Gemma models without a Google API key, Google Cloud project, or backend.&lt;/li&gt;
&lt;li&gt;Supported Gemini models include &lt;strong&gt;2.5 Pro, 2.5 Flash, 2.5 Flash Lite, 2.0 Flash, 2.0 Flash Lite, 3 Flash Preview&lt;/strong&gt;, plus dated previews.&lt;/li&gt;
&lt;li&gt;Supported Gemma models include &lt;strong&gt;Gemma 2, 3, and 4&lt;/strong&gt; in multiple sizes.&lt;/li&gt;
&lt;li&gt;Setup can be as small as one &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag and one &lt;code&gt;puter.ai.chat()&lt;/code&gt; call.&lt;/li&gt;
&lt;li&gt;Streaming, image input, and temperature control work from the browser.&lt;/li&gt;
&lt;li&gt;Usage is charged to the signed-in Puter user, not your developer account.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to compare a Puter prototype with the official Gemini API before migrating.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How “free unlimited” works&lt;/h2&gt;

&lt;p&gt;Puter.js inverts the usual LLM billing flow.&lt;/p&gt;

&lt;p&gt;Instead of your app holding a Google AI Studio key and paying for every token, the user signs in to Puter. Calls are made on behalf of that signed-in user and usage is charged against their Puter balance. New Puter accounts receive starter credit, and users can top up if they need more.&lt;/p&gt;

&lt;p&gt;For developers, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No Google Cloud project&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No AI Studio API key&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No server-side token proxy&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No key rotation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No billing exposure from public traffic&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-off: Puter.js is browser-first. It assumes a user session, so it is not a clean fit for backend-only jobs such as cron tasks, batch processors, or webhooks.&lt;/p&gt;

&lt;h2&gt;Step 1: Install Puter.js&lt;/h2&gt;

&lt;p&gt;For a static page, add the CDN script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is enough to call Gemini from the browser.&lt;/p&gt;

&lt;p&gt;For a bundled app, install the package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @heyputer/puter.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then import it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@heyputer/puter.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Step 2: Pick a Gemini or Gemma model&lt;/h2&gt;

&lt;p&gt;Choose the smallest model that handles your task well.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemini-2.5-pro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hard reasoning, complex analysis, long-context tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemini-2.5-flash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default choice for most app features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemini-2.5-flash-lite&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;High-volume classification, tagging, simple Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemini-2.0-flash&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stable baseline with well-understood behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemini-3-flash-preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Latest preview model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemma-3-27b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open Gemma, instruction-tuned workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;google/gemma-4-31b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Larger open Gemma option&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most apps, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google/gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;google/gemini-2.5-pro&lt;/code&gt; only when the prompt needs stronger reasoning. Use Lite variants for high-volume, low-complexity tasks like classification or tagging.&lt;/p&gt;
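&lt;p&gt;One way to keep that guidance in code is a small helper that maps task types to model IDs (purely illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative mapping from task type to model ID.
function pickModel(task) {
  switch (task) {
    case "reasoning": return "google/gemini-2.5-pro";        // hard, long-context work
    case "bulk":      return "google/gemini-2.5-flash-lite"; // classification, tagging
    default:          return "google/gemini-2.5-flash";      // sensible default
  }
}

const reply = await puter.ai.chat("Tag this support ticket: 'app crashes on login'", {
  model: pickModel("bulk"),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;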

&lt;h2&gt;Step 3: Make your first Gemini call&lt;/h2&gt;

&lt;p&gt;Create an HTML file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain machine learning in three sentences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the file in a browser.&lt;/p&gt;

&lt;p&gt;On first use, Puter handles authentication. The user signs in or creates a free Puter account, then the response is printed to the page.&lt;/p&gt;

&lt;p&gt;No API key. No &lt;code&gt;.env&lt;/code&gt; file. No backend route.&lt;/p&gt;

&lt;h2&gt;Step 4: Stream the response&lt;/h2&gt;

&lt;p&gt;For chat UIs, stream tokens as they arrive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain photosynthesis in detail&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;outputDiv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A simple UI target could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"output"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputDiv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;output&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;part.text&lt;/code&gt; contains a response chunk. Append it to your UI so the user sees the answer appear progressively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Send image input to Gemini
&lt;/h2&gt;

&lt;p&gt;Gemini supports multimodal prompts. With Puter.js, pass the prompt first, then the image URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What do you see in this image? Describe colors, objects, and mood.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://assets.puter.site/doge.jpeg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Practical use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alt-text generation&lt;/li&gt;
&lt;li&gt;Visual question answering&lt;/li&gt;
&lt;li&gt;Screenshot analysis&lt;/li&gt;
&lt;li&gt;OCR-style extraction&lt;/li&gt;
&lt;li&gt;Accessibility tooling&lt;/li&gt;
&lt;li&gt;Product image tagging&lt;/li&gt;
&lt;li&gt;Diagram explanation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 6: Tune temperature
&lt;/h2&gt;

&lt;p&gt;Pass model parameters in the options object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Write a creative short story about a robot chef&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use lower values for deterministic output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON generation&lt;/li&gt;
&lt;li&gt;Classification&lt;/li&gt;
&lt;li&gt;Extraction&lt;/li&gt;
&lt;li&gt;Factual answers&lt;/li&gt;
&lt;li&gt;Structured summaries&lt;/li&gt;
&lt;/ul&gt;
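
&lt;p&gt;For example, a deterministic extraction call might look like this (the prompt and expected JSON keys are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// temperature 0.0 keeps repeated runs of the same prompt stable.
const extraction = await puter.ai.chat(
  "Return only JSON with keys name and email from: 'Reach Ana at ana@example.com'",
  {
    model: "google/gemini-2.5-flash",
    temperature: 0.0,
  }
);

console.log(extraction);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;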

&lt;p&gt;Use higher values for more variation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brainstorming&lt;/li&gt;
&lt;li&gt;Creative writing&lt;/li&gt;
&lt;li&gt;Marketing copy&lt;/li&gt;
&lt;li&gt;Ideation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 7: Build multi-turn conversations
&lt;/h2&gt;

&lt;p&gt;Pass an array of messages instead of a single string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;I am building a Next.js app with Postgres.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Got it. What do you need help with?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;How should I structure migrations?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-pro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For an actual chat UI, keep updating the message array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini receives the full conversation history on each call.&lt;/p&gt;
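
&lt;p&gt;Because the full history is resent on every turn, long chats grow the prompt without bound. A rough trimming sketch; the turn limit is an app-level choice, not a Puter setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Keep only the most recent turns before each call.
const MAX_TURNS = 20; // arbitrary budget; tune for your prompts

function trimHistory(history) {
  // slice(-n) returns the whole array when it is shorter than n
  return history.slice(-MAX_TURNS);
}

const reply = await puter.ai.chat(trimHistory(messages), {
  model: "google/gemini-2.5-flash",
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;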

&lt;h2&gt;
  
  
  Compare Gemini with other models
&lt;/h2&gt;

&lt;p&gt;Puter exposes multiple model providers through one interface. You can benchmark the same prompt across models by changing only the &lt;code&gt;model&lt;/code&gt; string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-ai/grok-4.3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Refactor this React component to use hooks: ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;ms`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;---&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this pattern to compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Output quality&lt;/li&gt;
&lt;li&gt;Formatting consistency&lt;/li&gt;
&lt;li&gt;Instruction following&lt;/li&gt;
&lt;li&gt;Coding accuracy&lt;/li&gt;
&lt;li&gt;Cost profile for the user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many apps, Gemini Flash is a strong default when latency matters. For harder prompts, benchmark against other models before choosing a production default.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Puter.js gives you
&lt;/h2&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemini 2.5, 2.0, and 3 Flash variants&lt;/li&gt;
&lt;li&gt;Gemini 2.5 Pro&lt;/li&gt;
&lt;li&gt;Gemma 2, 3, and 4 models&lt;/li&gt;
&lt;li&gt;Multi-turn conversations&lt;/li&gt;
&lt;li&gt;Streaming responses&lt;/li&gt;
&lt;li&gt;Image URL input&lt;/li&gt;
&lt;li&gt;Temperature control&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;System prompts (see the sketch below)&lt;/li&gt;
&lt;li&gt;Browser-based production usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the current Puter version, you may not get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Native Gemini function calling&lt;/li&gt;
&lt;li&gt;Code execution tools&lt;/li&gt;
&lt;li&gt;Google Search grounding&lt;/li&gt;
&lt;li&gt;Gemini’s full 2M-token context ceiling on every model&lt;/li&gt;
&lt;li&gt;Server-side use without a browser session&lt;/li&gt;
&lt;li&gt;Direct Google rate-limit visibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For agentic workflows that require code execution, grounding, or strict server-side control, the official Google Gemini API is usually the better fit. For browser-based chat, Q&amp;amp;A, content generation, and vision tasks, Puter.js is often enough.&lt;/p&gt;
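
&lt;p&gt;The feature list above includes system prompts. A minimal sketch, assuming Puter accepts the standard OpenAI-style &lt;code&gt;system&lt;/code&gt; role in the messages array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// A system message pins behavior for the whole conversation.
const response = await puter.ai.chat(
  [
    { role: "system", content: "You are a terse assistant. Answer in one sentence." },
    { role: "user", content: "What is a WebSocket?" },
  ],
  { model: "google/gemini-2.5-flash" }
);

console.log(response);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;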

&lt;h2&gt;
  
  
  When to use Puter.js vs the official Gemini API
&lt;/h2&gt;

&lt;p&gt;Use Puter.js when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building a free public app.&lt;/li&gt;
&lt;li&gt;You do not want token costs attached to your developer account.&lt;/li&gt;
&lt;li&gt;You are prototyping quickly.&lt;/li&gt;
&lt;li&gt;You do not want to configure Google Cloud.&lt;/li&gt;
&lt;li&gt;You are building a static site, hackathon app, or browser extension.&lt;/li&gt;
&lt;li&gt;Your users can sign in to Puter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the official Gemini API when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need backend calls.&lt;/li&gt;
&lt;li&gt;You need cron jobs, batch jobs, or webhooks.&lt;/li&gt;
&lt;li&gt;You need code execution.&lt;/li&gt;
&lt;li&gt;You need Search grounding.&lt;/li&gt;
&lt;li&gt;You need the full Gemini Pro long-context ceiling.&lt;/li&gt;
&lt;li&gt;You need a direct compliance or billing relationship with Google.&lt;/li&gt;
&lt;li&gt;You need fine-tuning on your own dataset.&lt;/li&gt;
&lt;li&gt;Your users will not accept a Puter sign-in step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a standalone Gemini 3 Flash walkthrough, see &lt;a href="http://apidog.com/blog/how-to-use-gemini-3-flash-preview-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the Gemini 3 Flash Preview API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test the integration with Apidog
&lt;/h2&gt;

&lt;p&gt;Puter calls happen in the browser, so you cannot test them exactly like a backend REST API. A practical workflow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build a small static Puter page.&lt;/li&gt;
&lt;li&gt;Accept a prompt through a query parameter (sketch after this list).&lt;/li&gt;
&lt;li&gt;Use that page for browser-based prototype testing.&lt;/li&gt;
&lt;li&gt;Use Apidog to validate the official Gemini API surface for a future migration.&lt;/li&gt;
&lt;li&gt;Keep Puter and Gemini API configs as separate environments.&lt;/li&gt;
&lt;/ol&gt;
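
&lt;p&gt;Step 2 of that workflow takes only a few lines. A sketch of a static test page that reads the prompt from the URL; the &lt;code&gt;prompt&lt;/code&gt; query parameter is a convention of this harness, not a Puter feature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&amp;lt;script src="https://js.puter.com/v2/"&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;script&amp;gt;
  // e.g. open test.html?prompt=Explain%20CORS
  const prompt = new URLSearchParams(location.search).get("prompt") ?? "Say hello";

  puter.ai.chat(prompt, { model: "google/gemini-2.5-flash" })
    .then(function (response) { puter.print(response); });
&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;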

&lt;p&gt;Example environment split:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Base URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;puter-prototype&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Your localhost/static page URL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini-prod&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://generativelanguage.googleapis.com/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;download Apidog&lt;/a&gt;, create both environments, and keep the same prompt payloads documented in one collection.&lt;/p&gt;

&lt;p&gt;For more API testing patterns, see &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tool for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other free LLM paths through Puter
&lt;/h2&gt;

&lt;p&gt;The same user-pays model works across other providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/get-free-unlimited-claude-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Get free unlimited Claude API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/get-free-unlimited-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Get free unlimited GPT-5.5 API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use Grok 4.3 for free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://apidog.com/blog/get-free-unlimited-deepseek-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Get free unlimited DeepSeek API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation pattern is the same: keep the Puter script and switch the &lt;code&gt;model&lt;/code&gt; value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Summarize this issue for a developer changelog.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is this truly unlimited?
&lt;/h3&gt;

&lt;p&gt;Unlimited from the developer’s side, yes. Your app does not pay per token from your own Google account. The signed-in Puter user has whatever balance is available in their Puter account.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need a Google account or Google Cloud project?
&lt;/h3&gt;

&lt;p&gt;No. Puter handles the upstream relationship. Your app does not need a Google API key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this in production?
&lt;/h3&gt;

&lt;p&gt;Yes, for browser-based apps. The main product decision is whether your users are willing to sign in with Puter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Gemini through Puter behave like the official API?
&lt;/h3&gt;

&lt;p&gt;Puter calls Google’s API on the user’s behalf, so responses come from the same underlying models as the official API. Latency may differ because Puter adds another hop between your browser app and the upstream model.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Gemini’s 2M-token context window?
&lt;/h3&gt;

&lt;p&gt;Puter may not expose the full 2M-token ceiling for every model variant. If your app depends on extremely long context, use the official Google Gemini API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Puter Gemini in a Discord bot or backend service?
&lt;/h3&gt;

&lt;p&gt;Not cleanly. Puter.js is browser-first and assumes a logged-in user session. Backend services should use the official Gemini API directly.&lt;/p&gt;
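
&lt;p&gt;For the backend path, the call goes straight to Google. An illustrative Node sketch against the &lt;code&gt;generativelanguage.googleapis.com&lt;/code&gt; endpoint from the environment table above; the model name, request shape, and key handling follow Google’s documented REST API, but verify them against the current docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Server-side: the key lives in an environment variable, never in the browser.
const url =
  "https://generativelanguage.googleapis.com/v1/models/gemini-2.5-flash:generateContent?key=" +
  process.env.GEMINI_API_KEY;

const res = await fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    contents: [{ parts: [{ text: "Summarize this changelog entry." }] }],
  }),
});

const data = await res.json();
console.log(data.candidates?.[0]?.content?.parts?.[0]?.text);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;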

&lt;h3&gt;
  
  
  What model should I default to?
&lt;/h3&gt;

&lt;p&gt;Start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google/gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google/gemini-2.5-pro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;for difficult reasoning tasks.&lt;/p&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google/gemini-2.5-flash-lite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;for high-volume classification or tagging.&lt;/p&gt;
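
&lt;p&gt;If you support several task types, a small router keeps those defaults in one place. The mapping is an app-level convention built from the model IDs above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Route each task to the cheapest model that handles it well.
function geminiModelFor(task) {
  if (task === "reasoning") return "google/gemini-2.5-pro";
  if (task === "tagging") return "google/gemini-2.5-flash-lite";
  return "google/gemini-2.5-flash"; // general default
}

const response = await puter.ai.chat("Classify: 'refund not received'", {
  model: geminiModelFor("tagging"),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;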

&lt;h3&gt;
  
  
  Is Imagen image generation supported?
&lt;/h3&gt;

&lt;p&gt;Puter exposes image generation through OpenAI image models such as &lt;code&gt;gpt-image-2&lt;/code&gt; and DALL-E variants today, not Imagen. See &lt;a href="http://apidog.com/blog/get-free-unlimited-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Get free unlimited GPT-5.5 API&lt;/a&gt; for that path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Puter.js is a practical way to add Gemini to browser-based apps without managing Google Cloud, API keys, or developer-side token billing.&lt;/p&gt;

&lt;p&gt;The basic implementation is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain this code snippet.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use Puter.js for prototypes, hackathon builds, free public apps, static sites, and browser extensions. Use the official Gemini API when you need backend execution, fine-tuning, code tools, Search grounding, or maximum long-context support.&lt;/p&gt;

&lt;p&gt;Build the request once in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, compare Puter with the official API, and choose the path that matches your app.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Get Free Unlimited GPT-5.5 API and All OpenAI Models</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Sat, 09 May 2026 02:34:52 +0000</pubDate>
      <link>https://forem.com/hassann/get-free-unlimited-gpt-55-api-and-all-openai-models-1k4c</link>
      <guid>https://forem.com/hassann/get-free-unlimited-gpt-55-api-and-all-openai-models-1k4c</guid>
      <description>&lt;p&gt;OpenAI’s GPT-5.5 API pricing ($5 per million input tokens, $30 per million output tokens) can block side projects, hackathon apps, and free public tools before they ship. Puter.js offers a browser-first workaround: it exposes OpenAI models such as GPT-5.5, GPT-5.5 Pro, GPT-5.x variants, GPT-Image-2, DALL-E, and OpenAI TTS without requiring your OpenAI API key. Instead of billing you, usage is charged to the signed-in Puter end user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Puter.js&lt;/strong&gt; when you want OpenAI access in the browser without managing an OpenAI account, API key, backend, or billing.&lt;/li&gt;
&lt;li&gt;Text models include &lt;strong&gt;gpt-5.5, gpt-5.5-pro, gpt-5.4, gpt-5, gpt-5-mini, o1, o3, gpt-4.1, gpt-4o&lt;/strong&gt;, and chat/codex variants.&lt;/li&gt;
&lt;li&gt;Image models include &lt;strong&gt;gpt-image-2, gpt-image-1.5, dall-e-3&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;TTS models include &lt;strong&gt;gpt-4o-mini-tts, tts-1, tts-1-hd&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add one &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag, call &lt;code&gt;puter.ai.chat()&lt;/code&gt;, and you can run GPT-5.5 from a browser page.&lt;/li&gt;
&lt;li&gt;Streaming, function calling, vision input, image generation, and text-to-speech are available from the browser.&lt;/li&gt;
&lt;li&gt;The end user covers usage through their Puter account; your app does not receive OpenAI invoices.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to compare Puter-based prototypes with the official OpenAI API before migration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Puter’s “free unlimited” model works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://developer.puter.com/tutorials/free-unlimited-openai-api/" rel="noopener noreferrer"&gt;Puter.js&lt;/a&gt; changes who pays for LLM usage.&lt;/p&gt;

&lt;p&gt;In a standard OpenAI integration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You create an OpenAI account.&lt;/li&gt;
&lt;li&gt;You store an API key.&lt;/li&gt;
&lt;li&gt;Your app sends requests to OpenAI.&lt;/li&gt;
&lt;li&gt;You pay for all user usage.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With Puter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your app loads Puter.js in the browser.&lt;/li&gt;
&lt;li&gt;The user signs in to Puter.&lt;/li&gt;
&lt;li&gt;Your app calls OpenAI-compatible models through Puter.&lt;/li&gt;
&lt;li&gt;Usage is charged to the user’s Puter balance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For developers, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No OpenAI key in your repo&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No token bill attached to your account&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No server required for browser apps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No per-developer usage cap&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-off: Puter is browser-first. If you need cron jobs, webhook handlers, background workers, or backend-only automation, use the official OpenAI API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install Puter.js
&lt;/h2&gt;

&lt;p&gt;For a plain HTML page, add the CDN script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is enough for static sites, prototypes, browser extensions, and hackathon demos.&lt;/p&gt;

&lt;p&gt;For a bundled JavaScript app, install the package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @heyputer/puter.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then import it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@heyputer/puter.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the CDN when you want the fastest possible setup. Use the npm package when you want bundler support and TypeScript types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Choose a model
&lt;/h2&gt;

&lt;p&gt;Puter exposes GPT-5.x models and older OpenAI models. Pick the smallest model that meets your quality requirements.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gpt-5.5-pro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hard reasoning, coding agents, complex analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gpt-5.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default model for general chat and reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gpt-5.4-nano&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fast, low-cost classification or extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gpt-5.4-mini&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chat UIs and mid-complexity tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gpt-5.3-codex&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code-focused workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;o3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Complex reasoning chains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;o1-pro&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Agentic multi-step planning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;gpt-4.1&lt;/code&gt;, &lt;code&gt;gpt-4o&lt;/code&gt;, &lt;code&gt;gpt-4o-mini&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Stable baseline models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For image generation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gpt-image-2&lt;/code&gt;: latest image model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gpt-image-1.5&lt;/code&gt;, &lt;code&gt;gpt-image-1&lt;/code&gt;, &lt;code&gt;dall-e-3&lt;/code&gt;, &lt;code&gt;dall-e-2&lt;/code&gt;: older stable options&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For text-to-speech:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gpt-4o-mini-tts&lt;/code&gt;: newer TTS model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tts-1&lt;/code&gt;, &lt;code&gt;tts-1-hd&lt;/code&gt;: classic TTS options&lt;/li&gt;
&lt;/ul&gt;
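
&lt;p&gt;A small helper can encode those defaults so call sites stay clean. The mapping below is illustrative, built from the model IDs above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Pick the smallest model that fits the task.
const MODEL_BY_TASK = {
  reasoning: "gpt-5.5-pro",
  chat: "gpt-5.5",
  classify: "gpt-5.4-nano",
  code: "gpt-5.3-codex",
};

function pickModel(task) {
  return MODEL_BY_TASK[task] ?? "gpt-5.5"; // gpt-5.5 as the general default
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;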

&lt;h2&gt;
  
  
  Step 3: Call GPT-5.5 from the browser
&lt;/h2&gt;

&lt;p&gt;Create an &lt;code&gt;index.html&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain WebSockets in three sentences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the file in a browser.&lt;/p&gt;

&lt;p&gt;Puter handles authentication and the model request. On first use, the user signs in or creates a Puter account. You do not need an OpenAI key, &lt;code&gt;.env&lt;/code&gt; file, proxy server, or backend route.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Stream responses for chat UIs
&lt;/h2&gt;

&lt;p&gt;For long answers, stream tokens instead of waiting for the full response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain the theory of relativity in detail&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a real UI, append each chunk to the current assistant message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#output&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use streaming for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chatbots&lt;/li&gt;
&lt;li&gt;Documentation assistants&lt;/li&gt;
&lt;li&gt;Long-form explanations&lt;/li&gt;
&lt;li&gt;Code generation&lt;/li&gt;
&lt;li&gt;Any UX where users should see progress immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 5: Send image input to a vision model
&lt;/h2&gt;

&lt;p&gt;Pass the prompt, image URL, and model options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What do you see in this image? Describe colors, objects, and mood.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://assets.puter.site/doge.jpeg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use vision input for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alt-text generation&lt;/li&gt;
&lt;li&gt;Screenshot analysis&lt;/li&gt;
&lt;li&gt;Visual QA&lt;/li&gt;
&lt;li&gt;OCR-like workflows&lt;/li&gt;
&lt;li&gt;Accessibility tooling&lt;/li&gt;
&lt;li&gt;Product image inspection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works with GPT-5.x models and GPT-4o variants.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Generate images
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;puter.ai.txt2img()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;txt2img&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;A futuristic cityscape at night, cinematic, neon, rain&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-image-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageElement&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageElement&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;txt2img()&lt;/code&gt; returns an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; element that you can insert directly into the DOM.&lt;/p&gt;

&lt;p&gt;Example with a basic UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"prompt"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Describe an image..."&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"generate"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Generate&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"result"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#generate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#result&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generating...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;txt2img&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-image-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;image&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user pays the image generation cost from their Puter account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: Convert text to speech
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;puter.ai.txt2speech()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;txt2speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Welcome back. Your account balance is $1,247.50.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini-tts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;controls&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function returns an &lt;code&gt;&amp;lt;audio&amp;gt;&lt;/code&gt; element.&lt;/p&gt;

&lt;p&gt;Use it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice prompts&lt;/li&gt;
&lt;li&gt;Accessibility narration&lt;/li&gt;
&lt;li&gt;Product walkthroughs&lt;/li&gt;
&lt;li&gt;App voiceovers&lt;/li&gt;
&lt;li&gt;Podcast intros&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 8: Add function calling
&lt;/h2&gt;

&lt;p&gt;Puter supports the standard OpenAI-style tool definition shape.&lt;/p&gt;

&lt;p&gt;Define your tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;get_weather&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Get the current weather for a city.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;city&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send the prompt with tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What's the weather in Tokyo right now?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;tools&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolCalls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCalls&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolCalls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Function:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Arguments:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Execute your function here.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model emits the tool call. Your app is responsible for executing the function and sending the result back into the conversation.&lt;/p&gt;
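
&lt;p&gt;Here is a minimal round-trip sketch. It assumes Puter accepts OpenAI-style &lt;code&gt;tool&lt;/code&gt; messages on the follow-up call, and the &lt;code&gt;getWeather&lt;/code&gt; implementation is hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical implementation of the tool declared earlier.
async function getWeather({ city }) {
  return `22°C and clear in ${city}`;
}

const args = JSON.parse(call.function.arguments); // arguments arrive as a JSON string
const result = await getWeather(args);

// Assumption: OpenAI-style tool messages are accepted on the follow-up call.
const followUp = await puter.ai.chat(
  [
    { role: "user", content: "What's the weather in Tokyo right now?" },
    response.message, // the assistant turn that contains the tool call
    { role: "tool", tool_call_id: call.id, content: result }
  ],
  { model: "gpt-5.5", tools }
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;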

&lt;p&gt;For testing tool-driven flows in production-grade settings, see &lt;a href="http://apidog.com/blog/mcp-server-testing-apidog?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MCP server testing in Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 9: Tune &lt;code&gt;temperature&lt;/code&gt; and &lt;code&gt;max_tokens&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Pass OpenAI-style parameters in the options object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Tell me about Mars&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recommended defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;defaults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick the temperature based on how much variation you want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt; &lt;span class="c1"&gt;// deterministic / factual&lt;/span&gt;
&lt;span class="nx"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="c1"&gt;// documentation, summaries, QA&lt;/span&gt;
&lt;span class="nx"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="c1"&gt;// creative writing&lt;/span&gt;
&lt;span class="nx"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="c1"&gt;// highly varied output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;max_tokens&lt;/code&gt; to keep responses bounded and avoid unnecessary user-side cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Puter gives you
&lt;/h2&gt;

&lt;p&gt;Puter’s browser-first OpenAI access is useful when you want to ship quickly without handling billing.&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5.x models, including GPT-5.5 and GPT-5.5 Pro&lt;/li&gt;
&lt;li&gt;Older OpenAI models such as GPT-4.1, GPT-4o, o1, and o3&lt;/li&gt;
&lt;li&gt;GPT-Image-2 and DALL-E image generation&lt;/li&gt;
&lt;li&gt;OpenAI TTS models, including &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Streaming&lt;/li&gt;
&lt;li&gt;Vision input&lt;/li&gt;
&lt;li&gt;Function calling&lt;/li&gt;
&lt;li&gt;Temperature control&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Puter does not replace
&lt;/h2&gt;

&lt;p&gt;Puter is not a full replacement for every official OpenAI API workflow.&lt;/p&gt;

&lt;p&gt;You may not get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responses API support&lt;/li&gt;
&lt;li&gt;Prompt caching cost controls&lt;/li&gt;
&lt;li&gt;Files API support&lt;/li&gt;
&lt;li&gt;Backend-only usage without a browser session&lt;/li&gt;
&lt;li&gt;Direct OpenAI rate-limit headers&lt;/li&gt;
&lt;li&gt;OpenAI structured output mode and JSON schema enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the official OpenAI API when you need backend execution, compliance controls, structured outputs, prompt caching, or direct OpenAI account management.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use Puter vs the official OpenAI API
&lt;/h2&gt;

&lt;p&gt;Use Puter when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building a browser-based app.&lt;/li&gt;
&lt;li&gt;You want to avoid OpenAI billing exposure.&lt;/li&gt;
&lt;li&gt;You are prototyping and do not want to set up an OpenAI account.&lt;/li&gt;
&lt;li&gt;You are building a static site, browser extension, or hackathon demo.&lt;/li&gt;
&lt;li&gt;Your users are willing to sign in to Puter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the official OpenAI API when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need server-side calls.&lt;/li&gt;
&lt;li&gt;You need cron jobs, webhook handlers, queues, or batch processing.&lt;/li&gt;
&lt;li&gt;You need prompt caching.&lt;/li&gt;
&lt;li&gt;You need the Responses API, Files API, or structured outputs.&lt;/li&gt;
&lt;li&gt;You need compliance terms such as BAAs, SOC 2, or residency guarantees.&lt;/li&gt;
&lt;li&gt;Your users will not accept a Puter sign-in step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many projects can start with Puter, validate the product, then migrate to the official API when backend or compliance requirements appear.&lt;/p&gt;
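
&lt;p&gt;One way to keep that migration cheap is to hide the provider behind a small adapter. A sketch, where &lt;code&gt;/api/chat&lt;/code&gt; is a hypothetical backend route and the response handling notes its own assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Prototype path: Puter in the browser.
async function chatViaPuter(prompt) {
  const response = await puter.ai.chat(prompt, { model: "gpt-5.5" });
  return String(response); // assumption: the response object stringifies to the assistant text
}

// Production path: a hypothetical backend route that proxies the official API.
async function chatViaBackend(prompt) {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt })
  });
  const data = await res.json();
  return data.text;
}

// Swap one line when backend or compliance requirements appear.
const chat = chatViaPuter;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;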

&lt;p&gt;For a paid production setup, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the integration in Apidog
&lt;/h2&gt;

&lt;p&gt;Puter calls run in the browser, so you cannot test them like a normal backend API request. A practical setup is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a static HTML page that loads Puter.js.&lt;/li&gt;
&lt;li&gt;Accept the prompt from a query parameter.&lt;/li&gt;
&lt;li&gt;Use the page as your &lt;code&gt;puter-prototype&lt;/code&gt; test target.&lt;/li&gt;
&lt;li&gt;Create a separate &lt;code&gt;openai-prod&lt;/code&gt; environment for the official OpenAI API.&lt;/li&gt;
&lt;li&gt;Keep both environments in the same Apidog collection for migration planning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example local Puter test page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;pre&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"output"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Loading...&lt;span class="nt"&gt;&amp;lt;/pre&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Say hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#output&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx serve &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then call it in the browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:3000?prompt=Explain%20JWT%20in%20one%20paragraph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use Apidog to model the official OpenAI request you may migrate to later.&lt;/p&gt;
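
&lt;p&gt;A sketch of the official request you would model in Apidog, using the standard chat completions endpoint. Run it server-side only, as an ES module so top-level &lt;code&gt;await&lt;/code&gt; works, because it carries your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Node.js sketch; assumes OPENAI_API_KEY is set in the environment.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "gpt-5.5",
    messages: [{ role: "user", content: "Explain JWT in one paragraph" }],
    temperature: 0.2,
    max_tokens: 200
  })
});
const data = await res.json();
console.log(data.choices[0].message.content);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;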

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-29.png" alt="" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; and create two environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;puter-prototype&lt;/code&gt;: your localhost page that runs Puter.js&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;openai-prod&lt;/code&gt;: &lt;code&gt;https://api.openai.com/v1&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For broader API testing patterns, see &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tool for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Puter unlimited for developers?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. The developer does not pay for usage through their own OpenAI account. Usage is charged to the signed-in user’s Puter balance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need an OpenAI account?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. Puter handles the OpenAI relationship. Your app does not need an OpenAI API key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use this in production?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, for browser-based apps. The key product question is whether your users are willing to sign in to Puter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does GPT-5.5 through Puter behave the same as the official API?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model output should come from the same OpenAI model because Puter calls OpenAI on the user’s behalf. Latency may differ because there is an extra layer between your app and OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Puter support prompt caching?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Puter does not expose OpenAI prompt caching pricing controls today. If prompt caching is important for your workload, use the official OpenAI API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Puter from a backend service?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not cleanly. Puter is browser-first and assumes a user session. Backend services should use the official OpenAI API.&lt;/p&gt;

&lt;p&gt;For free server-side options, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API for free&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What model should I start with?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gpt-5.5&lt;/code&gt; for general chat and reasoning&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gpt-5.4-nano&lt;/code&gt; for high-volume classification&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gpt-5.5-pro&lt;/code&gt; for harder reasoning&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;o3&lt;/code&gt; for long reasoning chains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Will users be charged a lot?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most chat-style usage costs cents per session at OpenAI-style rates. Image generation is usually more expensive than text. Use &lt;code&gt;max_tokens&lt;/code&gt;, avoid unnecessary regeneration, and make cost-producing actions explicit in the UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I generate images with Puter?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Use &lt;code&gt;puter.ai.txt2img()&lt;/code&gt; with &lt;code&gt;gpt-image-2&lt;/code&gt; or DALL-E models. The user pays from their Puter balance.&lt;/p&gt;
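
&lt;p&gt;A minimal sketch; &lt;code&gt;txt2img&lt;/code&gt; resolves to an image element you can attach to the DOM. Selecting a specific model through an options object is an assumption based on the &lt;code&gt;chat()&lt;/code&gt; pattern, so check the current Puter docs before relying on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// txt2img resolves to a ready-to-use image element.
const img = await puter.ai.txt2img("A watercolor fox in the snow");
document.body.appendChild(img);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;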

&lt;p&gt;For the official paid API guide, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-image-2-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-Image-2 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Puter.js is a practical way to add GPT-5.5, image generation, vision, function calling, streaming, and TTS to browser-based apps without managing an OpenAI key or paying for user traffic yourself.&lt;/p&gt;

&lt;p&gt;Use Puter for prototypes, hackathon builds, static sites, browser extensions, and free public apps. Use the official OpenAI API for backend workloads, compliance requirements, prompt caching, the Responses API, Files API, or strict structured outputs.&lt;/p&gt;

&lt;p&gt;Build and compare your requests in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, test the migration path, and choose the integration model that fits your app.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Get Free Unlimited Claude Opus 4.7 API</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Sat, 09 May 2026 02:27:58 +0000</pubDate>
      <link>https://forem.com/hassann/get-free-unlimited-claude-opus-47-api-18jc</link>
      <guid>https://forem.com/hassann/get-free-unlimited-claude-opus-47-api-18jc</guid>
      <description>&lt;p&gt;Anthropic’s Claude models are strong choices for coding, agentic workflows, and long-context reasoning, but the official API cost can block small projects fast. Puter.js changes the billing model: you call Claude from the browser without an Anthropic API key, and usage is billed to the signed-in Puter user instead of your developer account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide shows how to wire Claude into a browser app with Puter.js, choose a model, stream responses, maintain chat state, and understand when you should switch to the official Anthropic API.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Puter.js&lt;/strong&gt; lets browser apps call Claude without an Anthropic API key, backend, or developer-side billing.&lt;/li&gt;
&lt;li&gt;The end user signs in to Puter and covers their own usage.&lt;/li&gt;
&lt;li&gt;Supported models include &lt;strong&gt;Opus 4.7, Opus 4.6, Opus 4.6 Fast, Opus 4.5, Opus 4.1, Opus 4, Sonnet 4.6, Sonnet 4.5, Sonnet 4, and Haiku 4.5&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add one &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag, then call &lt;code&gt;puter.ai.chat()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Streaming, system prompts, and multi-turn conversations are supported.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to benchmark prompts against the official Anthropic API when you plan a migration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How the Puter billing model works
&lt;/h2&gt;

&lt;p&gt;With the official Anthropic API, you usually do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an Anthropic account.&lt;/li&gt;
&lt;li&gt;Store an API key.&lt;/li&gt;
&lt;li&gt;Proxy requests through your backend.&lt;/li&gt;
&lt;li&gt;Pay for every user’s tokens.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With Puter.js, the flow changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your frontend loads Puter.js.&lt;/li&gt;
&lt;li&gt;The user signs in to Puter.&lt;/li&gt;
&lt;li&gt;Your app calls &lt;code&gt;puter.ai.chat()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Usage is charged to the user’s Puter account.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For you as the developer, that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No API key in your repo&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No Anthropic billing account required&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No backend required for basic browser apps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No shared usage cap across your whole user base&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main constraint: Puter.js is browser-first. If you need cron jobs, backend workers, Discord bots, batch processing, or server-side API routes, use the official Anthropic API instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Add Puter.js
&lt;/h2&gt;

&lt;p&gt;For a static page or quick prototype, use the CDN script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal HTML file looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are building with Vite, Webpack, or another bundler, install the package instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @heyputer/puter.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then import it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@heyputer/puter.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the CDN for the fastest setup. Use the npm package when you want bundling, TypeScript support, or a production frontend build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Choose a Claude model
&lt;/h2&gt;

&lt;p&gt;Puter exposes Claude models using Anthropic-style model IDs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Latest flagship; deepest reasoning and complex agentic work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prior flagship; strong coding and reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-6-fast&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lower-latency Opus variant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stable choice for production agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Legacy stable option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Original Opus 4 baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-sonnet-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default daily driver for most apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-sonnet-4-5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prior Sonnet version; still useful for general tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-sonnet-4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sonnet 4 baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-haiku-4-5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fast option for classification and high-volume simple tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Practical defaults, with a small routing sketch after this list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; for most app features.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;claude-haiku-4-5&lt;/code&gt; for fast classification, tagging, routing, or lightweight summaries.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;claude-opus-4-7&lt;/code&gt; for complex code review, multi-step planning, and long-form reasoning.&lt;/li&gt;
&lt;/ul&gt;
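
&lt;p&gt;A minimal routing sketch based on those defaults; the task labels are illustrative, not a Puter API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Map illustrative task labels onto the defaults above.
function pickModel(task) {
  switch (task) {
    case "classify":
    case "route":
      return "claude-haiku-4-5";
    case "review":
    case "plan":
      return "claude-opus-4-7";
    default:
      return "claude-sonnet-4-6";
  }
}

const response = await puter.ai.chat("Tag this ticket: login page 500s", {
  model: pickModel("classify")
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;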

&lt;h2&gt;
  
  
  Step 3: Make your first Claude call
&lt;/h2&gt;

&lt;p&gt;Here is the smallest working example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
      &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain quantum computing in simple terms&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the file in a browser. Puter handles the call and prompts the user to sign in or create a Puter account if needed.&lt;/p&gt;

&lt;p&gt;The response shape mirrors Anthropic’s message format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For simple text responses, read the first content block. For more complex responses, iterate over all blocks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;block&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Stream long responses
&lt;/h2&gt;

&lt;p&gt;For essays, code generation, and chat UIs, stream the response instead of waiting for the full answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Write a detailed essay on the impact of artificial intelligence on society&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a real chat UI, append each streamed chunk to the current message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#assistant-message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generate a checklist for securing an Express.js API&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example HTML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"assistant-message"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Build a multi-turn conversation
&lt;/h2&gt;

&lt;p&gt;For chat, pass an array of messages instead of a single string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;I am building a Next.js app with Postgres.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Got it. What do you need help with?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;How should I structure the migrations folder?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To keep the conversation going, store the transcript and append each new turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userText&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;assistantText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;assistantText&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;assistantText&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude reads the full message array on each call, so keep the transcript trimmed if your app has very long conversations.&lt;/p&gt;
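
&lt;p&gt;A minimal trimming sketch; keeping the system message plus the last few turns is one simple policy, and the cap of 20 entries is arbitrary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Keep the system prompt plus the most recent turns.
function trimTranscript(messages, maxTurns = 20) {
  const system = messages.filter(m =&amp;gt; m.role === "system");
  const recent = messages.filter(m =&amp;gt; m.role !== "system").slice(-maxTurns);
  return [...system, ...recent];
}

const response = await puter.ai.chat(trimTranscript(messages), {
  model: "claude-sonnet-4-6"
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;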

&lt;h2&gt;
  
  
  Step 6: Add a system prompt
&lt;/h2&gt;

&lt;p&gt;Use a system message to define behavior, tone, constraints, and output format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a senior backend engineer. Reply in numbered bullets, never more than five.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;How do I prevent SQL injection in a Node app?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good system prompts are specific:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
You are a TypeScript code reviewer.
Focus on correctness, security, and maintainability.
Return:
1. Critical issues
2. Suggested improvements
3. A corrected code snippet when useful
Keep the answer concise.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pass it at the top of the message list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Review this function: ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 7: Compare models with the same prompt
&lt;/h2&gt;

&lt;p&gt;The fastest way to pick a model is to run the same prompt across multiple Claude variants.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Refactor this React component to use hooks: ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;ms`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;---&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will usually see this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Haiku&lt;/strong&gt;: fastest; best for simple and high-volume tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet&lt;/strong&gt;: best default for most app features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus&lt;/strong&gt;: strongest for difficult prompts, deeper reasoning, and complex code tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To benchmark Puter’s browser path against the official Anthropic API in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, keep both providers in the same collection and switch environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get with Puter.js
&lt;/h2&gt;

&lt;p&gt;Puter.js gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude model access from the browser&lt;/li&gt;
&lt;li&gt;Multi-turn conversations&lt;/li&gt;
&lt;li&gt;System prompts&lt;/li&gt;
&lt;li&gt;Streaming responses&lt;/li&gt;
&lt;li&gt;No developer-side API key&lt;/li&gt;
&lt;li&gt;No developer-side Anthropic billing&lt;/li&gt;
&lt;li&gt;Browser-first production deployment path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the current Puter version, you may not get every official Anthropic API feature, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Native tool use / function calling&lt;/li&gt;
&lt;li&gt;Vision input&lt;/li&gt;
&lt;li&gt;Anthropic prompt caching controls&lt;/li&gt;
&lt;li&gt;Server-side execution without a browser user session&lt;/li&gt;
&lt;li&gt;Direct Anthropic rate-limit headers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For deeper tool workflows, the official Anthropic API or &lt;a href="http://apidog.com/blog/mcp-server-testing-apidog?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MCP server testing in Apidog&lt;/a&gt; gives you more control.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use Puter vs the official Anthropic API
&lt;/h2&gt;

&lt;p&gt;Use Puter when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building a browser-based app.&lt;/li&gt;
&lt;li&gt;You do not want to manage an Anthropic API key.&lt;/li&gt;
&lt;li&gt;You are shipping a free public tool and want to avoid developer-side billing exposure.&lt;/li&gt;
&lt;li&gt;You are prototyping before committing to official API usage.&lt;/li&gt;
&lt;li&gt;Your users can sign in to Puter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the official Anthropic API when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need backend calls.&lt;/li&gt;
&lt;li&gt;You need cron jobs, workers, or batch processing.&lt;/li&gt;
&lt;li&gt;You need prompt caching controls.&lt;/li&gt;
&lt;li&gt;You need advanced tool use, vision input, or Files API support.&lt;/li&gt;
&lt;li&gt;You need compliance, contracts, or regional requirements.&lt;/li&gt;
&lt;li&gt;Your users will not accept a Puter sign-in flow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common path is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prototype in the browser with Puter.&lt;/li&gt;
&lt;li&gt;Validate prompts and UX.&lt;/li&gt;
&lt;li&gt;Benchmark model behavior.&lt;/li&gt;
&lt;li&gt;Migrate to the official Anthropic API when you need backend control.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The message shape is similar, so the migration is manageable.&lt;/p&gt;
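
&lt;p&gt;For comparison, a server-side sketch of the official Anthropic call. The main shape difference to plan for: the official API takes the system prompt as a top-level &lt;code&gt;system&lt;/code&gt; field, not a message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Node.js sketch; assumes ANTHROPIC_API_KEY is set in the environment.
const res = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": process.env.ANTHROPIC_API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json"
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Say hello" }]
  })
});
const data = await res.json();
console.log(data.content[0].text); // same content-block shape you read via Puter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;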

&lt;p&gt;For the GPT equivalent, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the integration in Apidog
&lt;/h2&gt;

&lt;p&gt;Puter calls run in the browser, so you usually do not test them like backend API requests. A practical workflow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a small static page that loads Puter.js.&lt;/li&gt;
&lt;li&gt;Accept the prompt through a query parameter or form input.&lt;/li&gt;
&lt;li&gt;Use that page for browser-based Puter testing.&lt;/li&gt;
&lt;li&gt;Use Apidog to test the official Anthropic API surface.&lt;/li&gt;
&lt;li&gt;Keep both paths documented in the same project so migration is easier later.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example static test page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;body&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;pre&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"output"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/pre&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prompt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Say hello from Claude.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;#output&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run it locally and test prompts like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:5173/?prompt=Explain%20JWT%20authentication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12lw6kx5lwlbpc2qapvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12lw6kx5lwlbpc2qapvu.png" alt="Apidog testing setup" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; and create two environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;puter-prototype&lt;/code&gt;: your local static page that uses Puter.js&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;anthropic-prod&lt;/code&gt;: &lt;code&gt;https://api.anthropic.com/v1&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lets you keep prompt tests, request examples, and migration notes in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is this truly unlimited?
&lt;/h3&gt;

&lt;p&gt;Unlimited from the developer side, yes. The end user has whatever balance is available in their Puter account. New Puter accounts include starter credit, and users can top up if they need more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need an Anthropic account?
&lt;/h3&gt;

&lt;p&gt;No. Puter handles the Anthropic relationship. Your app does not need an Anthropic API key.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this in production?
&lt;/h3&gt;

&lt;p&gt;Yes, for browser-based apps. The key product decision is whether your users are willing to sign in to Puter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Claude through Puter behave the same as the official API?
&lt;/h3&gt;

&lt;p&gt;The model output is expected to be the same because Puter calls Anthropic on the user’s behalf. Latency may be slightly higher because Puter adds an extra hop between your app and Anthropic.&lt;/p&gt;
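
&lt;p&gt;If latency matters for your UX, measure it instead of guessing. A small sketch using the browser’s &lt;code&gt;performance.now()&lt;/code&gt; inside an async context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough round-trip timing for one Puter call (run inside a module script or async function).
const t0 = performance.now();
await puter.ai.chat("Reply with the single word: pong", {
  model: "claude-sonnet-4-6",
});
console.log(`Round trip: ${Math.round(performance.now() - t0)} ms`);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
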

&lt;h3&gt;
  
  
  What about prompt caching?
&lt;/h3&gt;

&lt;p&gt;Puter does not expose Anthropic’s prompt caching pricing controls today. If you rely on prompt caching for large stable prompts, use the official Anthropic API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Puter for a Discord bot or backend service?
&lt;/h3&gt;

&lt;p&gt;Not cleanly. Puter is browser-first and assumes a logged-in user session. For backend services, use the official Anthropic API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which model should I default to?
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; by default. Move to &lt;code&gt;claude-opus-4-7&lt;/code&gt; for harder reasoning tasks and &lt;code&gt;claude-haiku-4-5&lt;/code&gt; for fast, high-volume classification.&lt;/p&gt;
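
&lt;p&gt;One way to encode that default in an app is a small routing helper keyed on task profile. The model IDs are the ones above; the task labels are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative routing helper: pick a Claude model per task profile.
function pickModel(task) {
  switch (task) {
    case "deep-reasoning":
      return "claude-opus-4-7"; // harder reasoning tasks
    case "bulk-classify":
      return "claude-haiku-4-5"; // fast, high-volume classification
    default:
      return "claude-sonnet-4-6"; // sensible default
  }
}

// Usage (inside an async context):
const answer = await puter.ai.chat("Classify: refund request", {
  model: pickModel("bulk-classify"),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
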

&lt;h3&gt;
  
  
  Will users be charged a lot?
&lt;/h3&gt;

&lt;p&gt;Most chat-style usage costs cents per session at Anthropic-style rates. Casual users can run many conversations on starter credit before they need to top up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Puter.js is a practical way to add Claude to a browser app without managing Anthropic keys, billing, or backend infrastructure. Add the script, choose a model, call &lt;code&gt;puter.ai.chat()&lt;/code&gt;, and let the signed-in user cover their own usage.&lt;/p&gt;

&lt;p&gt;Use Puter for prototypes, hackathon projects, static sites, browser extensions, and free public apps. Use the official Anthropic API when you need backend execution, prompt caching, compliance controls, or advanced API features.&lt;/p&gt;

&lt;p&gt;Build and benchmark your requests in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, compare Puter with the official API, and choose the path that matches your deployment model.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use Grok 4.3 for Free: 4 Working Paths in 2026</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Sat, 09 May 2026 02:24:07 +0000</pubDate>
      <link>https://forem.com/hassann/how-to-use-grok-43-for-free-4-working-paths-in-2026-3g79</link>
      <guid>https://forem.com/hassann/how-to-use-grok-43-for-free-4-working-paths-in-2026-3g79</guid>
      <description>&lt;p&gt;Grok 4.3 is xAI’s flagship model as of May 2026. It supports a 1M-token context window, native video input, and pricing of $1.25 / $2.50 per million tokens. If you are prototyping, learning, or building a side project, you can use Grok 4.3 without paying upfront through three practical routes: xAI Console promotional credits, Puter.js user-paid calls, and the free chat surfaces on grok.com and X.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide walks through each option with setup steps, code, and trade-offs. For the full paid API guide, see &lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the Grok 4.3 API&lt;/a&gt;. For the voice equivalent, see &lt;a href="http://apidog.com/blog/how-to-use-grok-voice-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use Grok Voice for free&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Three free paths to Grok 4.3:&lt;/strong&gt; xAI Console promotional credits, Puter.js, and the chat UIs at &lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt; and X.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for developers shipping a public web app:&lt;/strong&gt; Puter.js. Your users cover their own usage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for backend/API prototyping:&lt;/strong&gt; xAI Console credits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for no-code use:&lt;/strong&gt; &lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt; or the X app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model IDs:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;xAI direct API: &lt;code&gt;grok-4.3&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Puter.js: &lt;code&gt;x-ai/grok-4.3&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to test equivalent requests across providers and compare responses, latency, and token usage.&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Path 1: Use xAI Console promotional credits
&lt;/h2&gt;

&lt;p&gt;Use this path when you want to test the real production API surface without paying upfront.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create an xAI Console account
&lt;/h3&gt;

&lt;p&gt;Go to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;console.x.ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sign in with your X account. Account verification follows whatever X requires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Check your promotional credits
&lt;/h3&gt;

&lt;p&gt;After signup, open the &lt;strong&gt;Billing&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;xAI has run promotional windows that give new accounts free credits. These credits are usually enough for several days of integration testing, but the amount and eligibility window can change.&lt;/p&gt;

&lt;p&gt;The key point: these credits are finite and do not auto-renew. Use them to validate your integration, then either move to paid usage or switch to another path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Call Grok 4.3 from the API
&lt;/h3&gt;

&lt;p&gt;The xAI endpoint follows an OpenAI-compatible Chat Completions shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"xai-..."&lt;/span&gt;

curl https://api.x.ai/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-4.3",
    "messages": [
      {
        "role": "user",
        "content": "Explain prompt caching in three sentences."
      }
    ],
    "reasoning_effort": "low"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For early testing, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"reasoning_effort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;medium&lt;/code&gt; or &lt;code&gt;high&lt;/code&gt; only when you need stronger reasoning, because they consume credits faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros and cons
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real production API behavior&lt;/td&gt;
&lt;td&gt;Credit pool is finite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supports Grok 4.3 capabilities such as 1M context, video, and function calling&lt;/td&gt;
&lt;td&gt;Promotional terms can change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No migration cost when moving to paid usage&lt;/td&gt;
&lt;td&gt;Limited to what fits inside the credit bucket&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use this path if:&lt;/strong&gt; you need backend access, want to test the real API, or plan to move to paid xAI usage later.&lt;/p&gt;

&lt;p&gt;For the full request schema, see &lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the Grok 4.3 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 2: Use Puter.js
&lt;/h2&gt;

&lt;p&gt;Puter.js is the cleanest free path for developers building public browser-based apps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48utw9k2o2vpe8zkl3kx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48utw9k2o2vpe8zkl3kx.png" alt="Puter.js Grok example" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How Puter.js works
&lt;/h3&gt;

&lt;p&gt;Puter.js exposes a JavaScript client for calling LLMs such as Grok, GPT, Claude, Gemini, and DeepSeek.&lt;/p&gt;

&lt;p&gt;The important billing difference:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The end user pays from their Puter account, not the developer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You add the script and call the model from the browser. When users run the app, Puter handles authentication and charges the user for the cloud and AI usage their session consumes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Add Puter.js to your page
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API key is required in your app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Call Grok 4.3
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;puter.ai.chat()&lt;/code&gt; with the Puter model ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://js.puter.com/v2/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Summarize the trade-offs between SQLite and Postgres in three bullets.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-ai/grok-4.3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time a user runs this, Puter prompts them to sign in or create a Puter account. After that, requests use the user’s Puter balance.&lt;/p&gt;
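
&lt;p&gt;If you would rather surface the sign-in step yourself instead of letting the first chat call trigger it, Puter ships auth helpers. A sketch assuming &lt;code&gt;puter.auth.isSignedIn()&lt;/code&gt; and &lt;code&gt;puter.auth.signIn()&lt;/code&gt;; verify the names against your Puter.js version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Optional: run the Puter sign-in flow explicitly before the first model call.
// Assumes puter.auth.isSignedIn() / puter.auth.signIn(); check current Puter.js docs.
if (!puter.auth.isSignedIn()) {
  await puter.auth.signIn(); // opens Puter's sign-in popup
}

const res = await puter.ai.chat("Hello!", { model: "x-ai/grok-4.3" });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
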

&lt;h3&gt;
  
  
  Step 3: Stream responses
&lt;/h3&gt;

&lt;p&gt;Puter.js also supports streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Walk me through migrating a React app to Next.js.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-ai/grok-4.3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;reasoning_effort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pros and cons
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Developer pays $0&lt;/td&gt;
&lt;td&gt;User must sign in to Puter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No API key in your repo&lt;/td&gt;
&lt;td&gt;Less suitable for backend-only systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supports multiple major LLM providers&lt;/td&gt;
&lt;td&gt;Requires a browser context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good fit for public tools and side projects&lt;/td&gt;
&lt;td&gt;May add slightly more latency than direct xAI calls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use this path if:&lt;/strong&gt; you are building a public web app, demo, side project, or free tool where users can cover their own usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Avoid this path if:&lt;/strong&gt; the user is not the person triggering the query, such as in internal automation, backend jobs, or bots.&lt;/p&gt;

&lt;p&gt;For similar free-access patterns, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the DeepSeek V4 API for free&lt;/a&gt; and &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API for free&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 3: Use grok.com or the X app
&lt;/h2&gt;

&lt;p&gt;Use this path when you only need to chat with Grok and do not need API access.&lt;/p&gt;

&lt;p&gt;Options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt;:&lt;/strong&gt; web chat. Sign in with X.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X app:&lt;/strong&gt; Grok is available inside the X mobile and web apps under the Grok tab.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Free users get a limited daily quota that resets every 24 hours.&lt;/p&gt;

&lt;p&gt;This path is useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One-off research.&lt;/li&gt;
&lt;li&gt;Prompt exploration.&lt;/li&gt;
&lt;li&gt;Checking whether Grok’s output style fits your use case.&lt;/li&gt;
&lt;li&gt;Manual testing before implementing API calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You cannot script or automate requests from these chat UIs.&lt;/p&gt;

&lt;p&gt;The free tier on &lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt; currently defaults to a smaller Grok variant. Premium subscriptions on X unlock Grok 4.3 in the chat UI with higher quotas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 4: Use OpenRouter for cheaper Grok-class testing
&lt;/h2&gt;

&lt;p&gt;OpenRouter is not a free Grok 4.3 path, but it is useful for testing Grok-class models behind one API key.&lt;/p&gt;

&lt;p&gt;Grok 4.3 on OpenRouter costs the same as direct xAI pricing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$1.25 / $2.50 per 1M tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, OpenRouter also carries free variants for some Grok models, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grok-4-fast:free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this when you do not specifically need Grok 4.3 but want a free Grok-family model for experimentation.&lt;/p&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://openrouter.ai/api/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "x-ai/grok-4-fast:free",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use this path if:&lt;/strong&gt; you want free Grok-class output and do not require Grok 4.3 specifically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compare the options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;Cost to developer&lt;/th&gt;
&lt;th&gt;Cost to end user&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;xAI Console credits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0 within credit limit&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Backend prototyping and learning the production API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Puter.js&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;User pays usage&lt;/td&gt;
&lt;td&gt;Public web apps, side projects, free tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt; / X&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$0 within quota&lt;/td&gt;
&lt;td&gt;Manual use and prompt testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenRouter free model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Free Grok-class output, not Grok 4.3 specifically&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Test provider requests in Apidog
&lt;/h2&gt;

&lt;p&gt;When you are comparing providers, keep the prompt and request body stable. Change only the base URL, auth key, and model name.&lt;/p&gt;

&lt;p&gt;A practical Apidog setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an Apidog environment with these variables:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;XAI_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BASE_URL&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Create one Chat Completions request.&lt;/li&gt;
&lt;li&gt;Save provider-specific variants:

&lt;ul&gt;
&lt;li&gt;xAI direct&lt;/li&gt;
&lt;li&gt;OpenRouter&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Run both with the same prompt.&lt;/li&gt;
&lt;li&gt;Compare:

&lt;ul&gt;
&lt;li&gt;Response quality&lt;/li&gt;
&lt;li&gt;Token usage&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Error behavior&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;When credits run out, switch environments instead of changing code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; and create a new collection.&lt;/p&gt;

&lt;p&gt;Use these base URLs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api.x.ai/v1
https://openrouter.ai/api/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both use an OpenAI-compatible Chat Completions schema, so the request body can stay mostly identical except for the &lt;code&gt;model&lt;/code&gt; value.&lt;/p&gt;
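
&lt;p&gt;In code, that discipline looks like one request function parameterized by provider. A sketch; the base URLs and model IDs come from this article, and the env var names are the ones suggested above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// One Chat Completions call, two providers: only baseUrl, apiKey, and model change.
async function chat(provider, prompt) {
  const cfg = {
    xai: {
      baseUrl: "https://api.x.ai/v1",
      apiKey: process.env.XAI_API_KEY,
      model: "grok-4.3",
    },
    openrouter: {
      baseUrl: "https://openrouter.ai/api/v1",
      apiKey: process.env.OPENROUTER_API_KEY,
      model: "x-ai/grok-4-fast:free",
    },
  }[provider];

  const res = await fetch(`${cfg.baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${cfg.apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: cfg.model,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const data = await res.json();
  return data.choices[0].message.content;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
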

&lt;p&gt;For more on cross-provider testing, see &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tool for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you give up by staying free
&lt;/h2&gt;

&lt;p&gt;Free paths are useful for prototyping, but they come with trade-offs.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tighter rate limits
&lt;/h3&gt;

&lt;p&gt;Promotional credits do not remove rate limits. If you test at scale, expect &lt;code&gt;429&lt;/code&gt; responses before your credit pool is exhausted.&lt;/p&gt;

&lt;p&gt;Add basic throttling during tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sleep&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

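&lt;span class="c1"&gt;// callGrok() and prompts are placeholders for your request helper and prompt list&lt;/span&gt;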
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callGrok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
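
&lt;p&gt;If you do hit &lt;code&gt;429&lt;/code&gt;s, back off and retry instead of hammering the endpoint. A sketch that reuses the &lt;code&gt;sleep&lt;/code&gt; helper above; &lt;code&gt;callGrokOnce&lt;/code&gt; is a hypothetical raw-request helper that returns the HTTP response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Exponential backoff on HTTP 429. callGrokOnce() is a placeholder returning a fetch Response.
async function callGrokWithBackoff(prompt, maxRetries = 4) {
  for (let attempt = 0; attempt &amp;lt;= maxRetries; attempt++) {
    const res = await callGrokOnce(prompt);
    if (res.status !== 429) return res.json();
    await sleep(1000 * 2 ** attempt); // 1s, 2s, 4s, 8s, ...
  }
  throw new Error("Rate limited after retries");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
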



&lt;h3&gt;
  
  
  2. Less benefit from prompt caching
&lt;/h3&gt;

&lt;p&gt;Prompt caching is most valuable when you send large repeated context, such as a stable 50k+ token system prompt.&lt;/p&gt;

&lt;p&gt;For a small prototype with a few dozen calls, caching savings are less important.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Best-effort support
&lt;/h3&gt;

&lt;p&gt;Free usage paths usually do not include production support. If you are debugging production traffic, move to a paid tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to move to paid usage
&lt;/h2&gt;

&lt;p&gt;Move off the free path when you see one of these signals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You hit rate limits often.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If rate limits block testing or usage more than a few times per week, paid usage is easier to operate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You have large reusable prompts.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Stable long prompts can benefit from prompt caching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You need compliance or support.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Free tiers are not the right place for SOC 2 audit trails, BAAs, regional data residency, or production support requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Migration is usually small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For xAI Console, keep the same API surface and use paid billing.&lt;/li&gt;
&lt;li&gt;For OpenRouter, change the model or base URL.&lt;/li&gt;
&lt;li&gt;For Puter.js, keep the browser flow if user-paid usage still fits your product.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Grok 4.3 truly free?
&lt;/h3&gt;

&lt;p&gt;It depends on the path.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;xAI Console: free while promotional credits last.&lt;/li&gt;
&lt;li&gt;Puter.js: free for the developer because the user pays.&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt;: free within a daily message quota.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Can I use Grok 4.3 from a backend without paying?
&lt;/h3&gt;

&lt;p&gt;Yes, while xAI Console credits last. After that, you need paid usage or a browser-based user-pays flow such as Puter.js.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Puter.js work in Node.js?
&lt;/h3&gt;

&lt;p&gt;Puter.js is browser-first. The user-pays model is built around browser authentication and user handoff. For backend usage, use the xAI Console path while credits last.&lt;/p&gt;

&lt;h3&gt;
  
  
  What model ID should I use on Puter.js?
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x-ai/grok-4.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What model ID should I use with xAI directly?
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grok-4.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Do free credits cover function calling and video input?
&lt;/h3&gt;

&lt;p&gt;Yes. Console credits apply to Grok 4.3 usage, including function calling, long context, video input, and reasoning effort. Watch token usage closely because video and long context can consume credits quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does this compare to Grok Voice?
&lt;/h3&gt;

&lt;p&gt;Grok Voice has its own free-access pattern. For that walkthrough, see &lt;a href="http://apidog.com/blog/how-to-use-grok-voice-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use Grok Voice for free&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is there a free Grok 4.3 mini?
&lt;/h3&gt;

&lt;p&gt;Not currently. xAI has not released a separate mini SKU for the 4.3 line. The closest free substitute mentioned here is &lt;code&gt;grok-4-fast:free&lt;/code&gt; on OpenRouter, which is a smaller, faster Grok 4 variant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Use the path that matches your implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;xAI Console credits&lt;/strong&gt; if you need the real backend API.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Puter.js&lt;/strong&gt; if you are shipping a browser-based public app and want users to cover usage.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;&lt;a href="http://grok.com" rel="noopener noreferrer"&gt;grok.com&lt;/a&gt;&lt;/strong&gt; or &lt;strong&gt;X&lt;/strong&gt; if you only need manual chat.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;OpenRouter free Grok variants&lt;/strong&gt; if you want free Grok-class output but do not need Grok 4.3 specifically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If none of the free paths fit, Grok 4.3’s paid pricing is still low enough for many side projects.&lt;/p&gt;

&lt;p&gt;For the full paid API walkthrough, see &lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the Grok 4.3 API&lt;/a&gt;. For the head-to-head against OpenAI, see &lt;a href="http://apidog.com/blog/grok-voice-vs-gpt-realtime-best-voice-model?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok Voice vs GPT-Realtime&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Build the request once in Apidog, swap the base URL between providers, and ship on the option that fits your usage curve.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1j3kymdmi5tzgtqoo07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1j3kymdmi5tzgtqoo07.png" alt="Apidog API testing workflow" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use the Grok 4.3 API?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 08 May 2026 07:40:02 +0000</pubDate>
      <link>https://forem.com/hassann/how-to-use-the-grok-43-api--3c5c</link>
      <guid>https://forem.com/hassann/how-to-use-the-grok-43-api--3c5c</guid>
      <description>&lt;p&gt;xAI rolled out Grok 4.3 in stages: beta on April 17, 2026, API access on April 30, and full general availability on May 6. The release adds a 1,000,000-token context window, native video input, always-on reasoning, and roughly 40% lower pricing versus Grok 4.20. Eight legacy Grok models retire on May 15, so teams still using &lt;code&gt;grok-3&lt;/code&gt; or &lt;code&gt;grok-4&lt;/code&gt; models should migrate now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide shows how to call Grok 4.3 from code: endpoint format, authentication, OpenAI-compatible SDK setup, &lt;code&gt;reasoning_effort&lt;/code&gt;, video input, function calling, and a repeatable test workflow in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the voice side of the same release, see &lt;a href="http://apidog.com/blog/how-to-use-grok-voice-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use Grok Voice for free&lt;/a&gt;. For the head-to-head against OpenAI’s flagship voice model, see &lt;a href="http://apidog.com/blog/grok-voice-vs-gpt-realtime-best-voice-model?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok Voice vs GPT-Realtime&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Grok 4.3 went GA on &lt;strong&gt;May 6, 2026&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Eight legacy models retire on &lt;strong&gt;May 15, 2026&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Pricing:

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$1.25 per 1M input tokens&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$2.50 per 1M output tokens&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$0.20 per 1M cached input tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Context window: &lt;strong&gt;1,000,000 tokens&lt;/strong&gt;.&lt;/li&gt;

&lt;li&gt;New input type: native &lt;strong&gt;video input&lt;/strong&gt;.&lt;/li&gt;

&lt;li&gt;Reasoning is always on.&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;reasoning_effort&lt;/code&gt; supports &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, and &lt;code&gt;high&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;Default reasoning effort is &lt;code&gt;medium&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;Endpoint: &lt;code&gt;&lt;a href="https://api.x.ai/v1/chat/completions" rel="noopener noreferrer"&gt;https://api.x.ai/v1/chat/completions&lt;/a&gt;&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;The API is OpenAI-compatible for Chat Completions.&lt;/li&gt;

&lt;li&gt;Standard-tier throughput is around 159 tokens/second.&lt;/li&gt;

&lt;li&gt;Intelligence Index: 53, according to Artificial Analysis.&lt;/li&gt;

&lt;li&gt;Use &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to save request variants, compare reasoning settings, and replay the same test across providers.&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What changed in Grok 4.3
&lt;/h2&gt;

&lt;p&gt;For most developer teams, the important changes are practical:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Lower token cost&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Input pricing is down 37.5% versus Grok 4.20. Output pricing is down 58.3%. Cached input is now $0.20 per 1M tokens, which matters if you reuse long system prompts or large static context.&lt;/p&gt;
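
&lt;p&gt;A quick worked example: a stable 100k-token system prompt reused across 1,000 requests is 100M input tokens. At $1.25 per 1M tokens that costs $125; at the $0.20 cached rate it costs roughly $20, ignoring the first uncached pass.&lt;/p&gt;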

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;1M-token context window&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Grok 4.3 increases the context window from 256k to 1M tokens. That makes it usable for large prompts such as codebases, transcripts, long contracts, and multi-document workflows.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Native video input&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Grok 4.3 is the first Grok model with native video input. You can pass a video URL in the message content and ask the model to reason over the clip.&lt;/p&gt;
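
&lt;p&gt;The exact content-part schema for video is worth double-checking in xAI’s docs. The sketch below assumes an OpenAI-style content array with a hypothetical &lt;code&gt;video_url&lt;/code&gt; part type, modeled on image parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical video-input shape: the "video_url" part type is an assumption
// modeled on OpenAI-style image_url parts. Confirm the real field names in xAI's docs.
const body = {
  model: "grok-4.3",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Summarize what happens in this clip." },
        { type: "video_url", video_url: { url: "https://example.com/demo.mp4" } },
      ],
    },
  ],
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
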

&lt;ol start="4"&gt;
&lt;li&gt;&lt;strong&gt;Always-on reasoning&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every request includes reasoning. The &lt;code&gt;reasoning_effort&lt;/code&gt; parameter controls depth, but the model does not run below &lt;code&gt;low&lt;/code&gt;.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;&lt;strong&gt;Better agent workflows&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;xAI reports a +300 Elo gain on GDPval-AA versus Grok 4.20. In practice, this matters most for tool selection, multi-step workflows, and function-calling agents.&lt;/p&gt;

&lt;p&gt;Artificial Analysis gives Grok 4.3 an Intelligence Index of 53, above the average of 35 for its price tier, and ranks it tenth out of 146 tracked models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before sending your first request, prepare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;xAI Console account&lt;/strong&gt; at &lt;code&gt;console.x.ai&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A billable tier with an API key&lt;/li&gt;
&lt;li&gt;A project-scoped API key for production use&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;OpenAI SDK&lt;/strong&gt; or the xAI SDK&lt;/li&gt;
&lt;li&gt;An API client for saving and replaying requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi98pls7kwujze1wffker.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi98pls7kwujze1wffker.png" alt="xAI Console screenshot" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Export your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"xai-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are testing locally, use an environment file or shell variable. For production, store the key in your secret manager.&lt;/p&gt;

&lt;h2&gt;
  
  
  Endpoint and authentication
&lt;/h2&gt;

&lt;p&gt;Grok 4.3 uses the OpenAI-compatible Chat Completions API with xAI’s base URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.x.ai/v1/chat/completions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Required headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer $XAI_API_KEY
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the API is OpenAI-compatible, most existing OpenAI SDK code only needs two changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Change the API key.&lt;/li&gt;
&lt;li&gt;Change the &lt;code&gt;base_url&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Python example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.x.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the trade-offs of GraphQL vs REST in three bullets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use the xAI SDK instead, the request shape is similar. The main difference is the client import and initialization.&lt;/p&gt;
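
&lt;p&gt;For reference, the same two changes in the official &lt;code&gt;openai&lt;/code&gt; Node SDK look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Node OpenAI SDK pointed at xAI: same Chat Completions surface, different baseURL.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

const response = await client.chat.completions.create({
  model: "grok-4.3",
  messages: [
    { role: "user", content: "Summarize the trade-offs of GraphQL vs REST in three bullets." },
  ],
  reasoning_effort: "medium",
});

console.log(response.choices[0].message.content);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
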

&lt;h2&gt;
  
  
  Request parameters
&lt;/h2&gt;

&lt;p&gt;Use these parameters for most Grok 4.3 Chat Completions requests:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Values&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;&lt;code&gt;grok-4.3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;OpenAI message shape&lt;/td&gt;
&lt;td&gt;Required. Supports &lt;code&gt;system&lt;/code&gt;, &lt;code&gt;user&lt;/code&gt;, and &lt;code&gt;assistant&lt;/code&gt; roles.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reasoning_effort&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Optional. Default: &lt;code&gt;medium&lt;/code&gt;. Higher values can increase latency and output tokens.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1–32768&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Caps output length.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;temperature&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0–2.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default: &lt;code&gt;1.0&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;top_p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0–1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Nucleus sampling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;bool&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;true&lt;/code&gt;, &lt;code&gt;false&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Enables server-sent events when &lt;code&gt;true&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;OpenAI tool shape&lt;/td&gt;
&lt;td&gt;Used for function calling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_choice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string / object&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auto&lt;/code&gt;, &lt;code&gt;none&lt;/code&gt;, or specific tool&lt;/td&gt;
&lt;td&gt;Uses standard OpenAI semantics.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;response_format&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ "type": "json_object" }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enables structured JSON output.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;seed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int&lt;/td&gt;
&lt;td&gt;any integer&lt;/td&gt;
&lt;td&gt;Useful for reproducibility with &lt;code&gt;temperature: 0&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Minimal curl request
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.x.ai/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-4.3",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior backend engineer."
      },
      {
        "role": "user",
        "content": "Review this query plan and flag the bottleneck."
      }
    ],
    "reasoning_effort": "high"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response uses the standard OpenAI-style shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reasoning_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;657&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the final text from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Choosing a reasoning effort
&lt;/h2&gt;

&lt;p&gt;Grok 4.3 supports three reasoning levels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use &lt;code&gt;low&lt;/code&gt; for fast, simple tasks
&lt;/h3&gt;

&lt;p&gt;Good fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classification&lt;/li&gt;
&lt;li&gt;Summarization&lt;/li&gt;
&lt;li&gt;Rule extraction&lt;/li&gt;
&lt;li&gt;Simple Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Lightweight routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify this ticket as billing, bug, feature request, or account access: ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use &lt;code&gt;medium&lt;/code&gt; for default production traffic
&lt;/h3&gt;

&lt;p&gt;Good fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer support&lt;/li&gt;
&lt;li&gt;Single-step tool use&lt;/li&gt;
&lt;li&gt;Data analysis&lt;/li&gt;
&lt;li&gt;Normal code explanations&lt;/li&gt;
&lt;li&gt;Function calling
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze this API error log and suggest the most likely root cause.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Use &lt;code&gt;high&lt;/code&gt; for complex workflows
&lt;/h3&gt;

&lt;p&gt;Good fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step agents&lt;/li&gt;
&lt;li&gt;Long code review&lt;/li&gt;
&lt;li&gt;Complex math&lt;/li&gt;
&lt;li&gt;Planning-heavy tasks&lt;/li&gt;
&lt;li&gt;Debugging with many constraints
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this migration plan, identify risks, and produce a safer rollout sequence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reasoning is always enabled. Setting &lt;code&gt;reasoning_effort&lt;/code&gt; to &lt;code&gt;low&lt;/code&gt; reduces depth, but it does not disable reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Function calling
&lt;/h2&gt;

&lt;p&gt;Grok 4.3 supports the standard OpenAI function-calling shape.&lt;/p&gt;

&lt;p&gt;The flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define tools.&lt;/li&gt;
&lt;li&gt;Send the user message and tool schema.&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;tool_calls&lt;/code&gt; from the assistant message.&lt;/li&gt;
&lt;li&gt;Execute the tool in your application.&lt;/li&gt;
&lt;li&gt;Send the tool result back with role &lt;code&gt;tool&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Ask the model to produce the final answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Define a tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookup_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up a user by ID.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ask Grok 4.3 to call the tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find user u_42 and tell me their last login.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
&lt;span class="n"&gt;tool_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Execute and return the tool result
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find user u_42 and tell me their last login.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookup_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Replace this with your real database/API call.
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;u_42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-05-06T14:22:00Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;final_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The GDPval-AA gain is especially relevant here: Grok 4.3 should be better at choosing tools, avoiding redundant calls, and recovering from tool errors.&lt;/p&gt;
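
&lt;p&gt;One practical consequence: when a tool fails, report the failure back in the tool message instead of aborting, so the model can retry or explain. A minimal sketch continuing the example above (&lt;code&gt;execute_tool&lt;/code&gt; is a hypothetical dispatcher, and the error payload shape is illustrative, not an xAI-defined format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def run_tool_safely(tool_call):
    # Turn a tool failure into a structured payload the model can
    # reason about, instead of raising and losing the turn.
    # execute_tool is a hypothetical dispatcher for your own tools.
    try:
        return json.dumps(execute_tool(tool_call))
    except Exception as error:
        return json.dumps({
            "error": type(error).__name__,
            "detail": str(error),
            "retryable": True,
        })

# In the loop from the previous example:
#     messages.append({
#         "role": "tool",
#         "tool_call_id": tool_call.id,
#         "content": run_tool_safely(tool_call),
#     })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;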

&lt;p&gt;If you are testing tool workflows, &lt;a href="http://apidog.com/blog/mcp-server-testing-apidog?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MCP server testing in Apidog&lt;/a&gt; covers a replay-based setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video input
&lt;/h2&gt;

&lt;p&gt;Grok 4.3 is the first Grok model with native video input. Pass a video URL inside the message content array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe what happens in this clip and flag any anomalies.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/clip.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Video tokens count against input usage. If cost or latency matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trim the clip before sending.&lt;/li&gt;
&lt;li&gt;Downsample when full resolution is unnecessary.&lt;/li&gt;
&lt;li&gt;Avoid sending repeated static footage.&lt;/li&gt;
&lt;li&gt;Cache surrounding text context when possible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model reasons over frames natively, so you do not need to manually extract keyframes first.&lt;/p&gt;
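
&lt;p&gt;If you do need to trim or downsample before upload, a minimal preprocessing sketch using ffmpeg (assumes ffmpeg is installed; the offsets, resolution, and paths are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

# Cut a 30-second window and downscale to 720p before upload.
# -ss/-t select the window; scale=-2:720 keeps the aspect ratio.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-ss", "00:01:00",      # start offset
        "-t", "30",             # duration in seconds
        "-i", "input.mp4",
        "-vf", "scale=-2:720",  # downsample to 720p
        "clip.mp4",
    ],
    check=True,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;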

&lt;h2&gt;
  
  
  Using the 1M-token context window
&lt;/h2&gt;

&lt;p&gt;The 1M-token context window is useful when retrieval or chunking would remove important context.&lt;/p&gt;

&lt;p&gt;Common patterns:&lt;/p&gt;

&lt;h3&gt;
  
  
  Whole-codebase review
&lt;/h3&gt;

&lt;p&gt;Send:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The diff&lt;/li&gt;
&lt;li&gt;Touched files&lt;/li&gt;
&lt;li&gt;Related interfaces&lt;/li&gt;
&lt;li&gt;Test output&lt;/li&gt;
&lt;li&gt;Lint output&lt;/li&gt;
&lt;li&gt;Migration notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompt example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Review this change as a senior backend engineer.

Focus on:
1. Data loss risks
2. Transaction boundaries
3. Backward compatibility
4. Test gaps
5. Rollback strategy

Context:
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
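
&lt;p&gt;One way to assemble that context is to concatenate the artifacts into a single labeled block. A minimal sketch (the section labels and file paths are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

def build_review_context(diff_text, file_paths, test_output):
    # Concatenate the diff, touched files, and test output into one
    # labeled block to paste under "Context:" in the prompt above.
    parts = ["## Diff", diff_text, "## Touched files"]
    for path in file_paths:
        parts.append(f"### {path}")
        parts.append(Path(path).read_text())
    parts.append("## Test output")
    parts.append(test_output)
    return "\n\n".join(parts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;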



&lt;h3&gt;
  
  
  Long-document QA
&lt;/h3&gt;

&lt;p&gt;Use it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Legal contracts&lt;/li&gt;
&lt;li&gt;Earnings calls&lt;/li&gt;
&lt;li&gt;Compliance policies&lt;/li&gt;
&lt;li&gt;Technical specifications&lt;/li&gt;
&lt;li&gt;Incident timelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompt example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Answer only from the provided document.

Question:
Which clauses describe termination rights, and what notice period applies to each party?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Agent memory
&lt;/h3&gt;

&lt;p&gt;For agent workflows, you can keep long conversation history in context instead of summarizing aggressively. This is useful when prior details affect personalization or task continuity.&lt;/p&gt;

&lt;p&gt;Cached input pricing makes stable long context cheaper. For example, a 400k-token stable system prompt costs $0.08 per cached call at $0.20 per 1M cached tokens, instead of $0.50 at the fresh input rate.&lt;/p&gt;
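
&lt;p&gt;A quick way to sanity-check that math for your own prompt sizes, using the rates quoted above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Per-call input cost for a stable system prompt, cached vs. fresh.
CACHED_RATE = 0.20 / 1_000_000  # $ per cached input token
FRESH_RATE = 1.25 / 1_000_000   # $ per fresh input token

prompt_tokens = 400_000
print(f"cached: ${prompt_tokens * CACHED_RATE:.2f}")  # $0.08
print(f"fresh:  ${prompt_tokens * FRESH_RATE:.2f}")   # $0.50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;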

&lt;h2&gt;
  
  
  Migrating from legacy Grok models
&lt;/h2&gt;

&lt;p&gt;Eight legacy Grok models retire on &lt;strong&gt;May 15, 2026, 12:00 PM PT&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For most apps, migration is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- model="grok-4.20"
&lt;/span&gt;&lt;span class="gi"&gt;+ model="grok-4.3"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- model="grok-3"
&lt;/span&gt;&lt;span class="gi"&gt;+ model="grok-4.3"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the request shape is compatible, most Chat Completions calls should continue working.&lt;/p&gt;

&lt;p&gt;Watch for two differences.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Reasoning behavior
&lt;/h3&gt;

&lt;p&gt;Some legacy models did not accept &lt;code&gt;reasoning_effort&lt;/code&gt;. Grok 4.3 always reasons.&lt;/p&gt;

&lt;p&gt;If your previous workflow depended on a very fast non-reasoning path, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning_effort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then measure latency and quality before moving to &lt;code&gt;medium&lt;/code&gt; or &lt;code&gt;high&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Output formatting
&lt;/h3&gt;

&lt;p&gt;Grok 4.3 tends to produce more structured output than Grok 4.20. If your application uses regex-based parsing, retest before switching production traffic.&lt;/p&gt;
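
&lt;p&gt;If you rely on regex parsing today, this is also a good moment to switch to structured output. A hedged sketch, assuming the OpenAI-style &lt;code&gt;response_format&lt;/code&gt; is honored as the compatibility claim suggests (the schema is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {"role": "user", "content": "Extract the invoice number and total from: ..."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["invoice_number", "total"],
            },
        },
    },
)

data = json.loads(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;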

&lt;p&gt;For broader model pricing context, see &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 pricing&lt;/a&gt;. For reasoning-model usage patterns, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Grok 4.3 in Apidog
&lt;/h2&gt;

&lt;p&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to create repeatable API tests before migrating production traffic.&lt;/p&gt;

&lt;p&gt;Recommended setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an Apidog environment.&lt;/li&gt;
&lt;li&gt;Add these variables:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;XAI_API_KEY = xai-...
BASE_URL = https://api.x.ai/v1
MODEL = grok-4.3
REASONING_EFFORT = medium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Create a &lt;code&gt;POST&lt;/code&gt; request:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{{BASE_URL}}/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Add headers:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer {{XAI_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Add the request body:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{MODEL}}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a senior backend engineer."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Review this API design and identify the top three implementation risks."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning_effort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{REASONING_EFFORT}}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="6"&gt;
&lt;li&gt;
&lt;p&gt;Duplicate the request three times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Grok 4.3 - low&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Grok 4.3 - medium&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Grok 4.3 - high&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change only &lt;code&gt;REASONING_EFFORT&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response quality&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;&lt;code&gt;usage.prompt_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;usage.completion_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;usage.reasoning_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Total cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To compare with another provider, duplicate the environment and change &lt;code&gt;BASE_URL&lt;/code&gt;, &lt;code&gt;MODEL&lt;/code&gt;, and the API key. Keep the same prompt and request body.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; to run the comparison. For broader API testing strategy, see &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tool for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgg8ipzrioxyp35igwy9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgg8ipzrioxyp35igwy9z.png" alt="Apidog API testing screenshot" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate limits
&lt;/h2&gt;

&lt;p&gt;xAI Console tier limits range from a few thousand requests per minute on Tier 1 to several hundred thousand requests per minute on enterprise tiers. Exact numbers can change, so check your console dashboard.&lt;/p&gt;

&lt;p&gt;The advertised 159 tokens/second throughput is per-stream output speed, not total account throughput. Concurrent requests scale within your tier limits.&lt;/p&gt;

&lt;p&gt;If you exceed your limit, the API returns HTTP &lt;code&gt;429&lt;/code&gt; with a &lt;code&gt;retry-after&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;Basic retry pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this incident report.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;reasoning_effort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;wait_seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request failed after retries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, also add jitter and respect the &lt;code&gt;retry-after&lt;/code&gt; header when present.&lt;/p&gt;
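
&lt;p&gt;A sketch of that fuller pattern, assuming the SDK exposes the HTTP response on the error as the OpenAI client does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random
import time

from openai import RateLimitError

def backoff_seconds(error, attempt):
    # Prefer the server's retry-after hint; otherwise fall back to
    # exponential backoff with jitter, capped at 30 seconds.
    retry_after = error.response.headers.get("retry-after")
    if retry_after is not None:
        return float(retry_after)
    return min(2 ** attempt, 30) * random.uniform(0.5, 1.5)

for attempt in range(5):
    try:
        response = client.chat.completions.create(
            model="grok-4.3",
            messages=[{"role": "user", "content": "Summarize this incident report."}],
        )
        break
    except RateLimitError as error:
        time.sleep(backoff_seconds(error, attempt))
else:
    raise RuntimeError("Request failed after retries")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;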

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Grok 4.3 OpenAI-compatible end to end?
&lt;/h3&gt;

&lt;p&gt;For Chat Completions, yes. You can use the OpenAI SDK, change &lt;code&gt;base_url&lt;/code&gt;, change &lt;code&gt;model&lt;/code&gt;, and keep the same request shape. Function calling, structured output, and streaming use the same semantics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Grok 4.3 support the Responses API?
&lt;/h3&gt;

&lt;p&gt;The xAI surface is Chat Completions today. The Responses API is OpenAI-only.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the actual context limit?
&lt;/h3&gt;

&lt;p&gt;The context limit is 1,000,000 tokens. Long inputs still cost money, so use cached input when your prompt is stable.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does always-on reasoning affect latency?
&lt;/h3&gt;

&lt;p&gt;First-token latency is higher than non-reasoning models, but Grok 4.3 streams output at around 159 tokens/second. Use &lt;code&gt;low&lt;/code&gt; for simple paths and reserve &lt;code&gt;high&lt;/code&gt; for planning-heavy work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Grok 4.3 with Grok Voice?
&lt;/h3&gt;

&lt;p&gt;Yes. The voice agent, &lt;code&gt;grok-voice-think-fast-1.0&lt;/code&gt;, calls Grok 4.3 under the hood when it reasons. You can also call Grok 4.3 directly from a custom voice loop built with TTS and STT components.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens to old Grok 3 or Grok 4 calls after May 15?
&lt;/h3&gt;

&lt;p&gt;They fail with HTTP &lt;code&gt;410&lt;/code&gt; because the model is retired. Migrate before the cutoff.&lt;/p&gt;
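
&lt;p&gt;If you want a clear, actionable failure rather than a generic stack trace after the cutoff, catch the status explicitly. A minimal sketch, assuming the SDK surfaces the status code as the OpenAI client does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import APIStatusError

try:
    response = client.chat.completions.create(
        model="grok-3",  # retired after the cutoff
        messages=[{"role": "user", "content": "ping"}],
    )
except APIStatusError as error:
    if error.status_code == 410:
        raise RuntimeError("Model retired; migrate to grok-4.3") from error
    raise
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;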

&lt;h3&gt;
  
  
  Does Grok 4.3 support image input?
&lt;/h3&gt;

&lt;p&gt;Yes. It supports image input alongside video input. Pass an image URL in a content block using the OpenAI-style message format.&lt;/p&gt;
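
&lt;p&gt;A minimal sketch of that content block (the image URL is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;response = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this dashboard show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/dashboard.png"},
                },
            ],
        }
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;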

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Grok 4.3 is a practical migration target if you need lower token costs, larger context, always-on reasoning, native video input, and OpenAI-compatible Chat Completions. For existing OpenAI SDK users, the migration is mostly a base URL and model-name change.&lt;/p&gt;

&lt;p&gt;The fastest validation path is to create three request variants in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, test &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, and &lt;code&gt;high&lt;/code&gt; reasoning on your real prompts, then compare latency, quality, and token usage before moving production traffic.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Grok Voice vs GPT-Realtime: Which Is the Best Voice Model in 2026?</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 08 May 2026 07:33:41 +0000</pubDate>
      <link>https://forem.com/hassann/grok-voice-vs-gpt-realtime-which-is-the-best-voice-model-in-2026-2mc5</link>
      <guid>https://forem.com/hassann/grok-voice-vs-gpt-realtime-which-is-the-best-voice-model-in-2026-2mc5</guid>
      <description>&lt;p&gt;xAI shipped Grok Voice the same week OpenAI rolled out GPT-Realtime-2. If you are choosing a voice model in 2026, both are credible flagship options: speech-to-speech, reasoning-capable, WebSocket-based, tool-capable, and natural-sounding. The practical decision comes down to five implementation trade-offs: latency, price, voice catalog, reasoning depth, and whether you need SIP, image input, or voice cloning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide compares the models from a developer perspective: API surface, integration shape, cost model, and which model to pick for common voice-agent architectures.&lt;/p&gt;

&lt;p&gt;For standalone implementation guides, see &lt;a href="http://apidog.com/blog/gpt-realtime-2-api/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use GPT-Realtime-2&lt;/a&gt; and &lt;a href="http://apidog.com/blog/how-to-use-grok-voice-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use Grok Voice for free&lt;/a&gt;. To stress-test either model under load, &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; supports WebSocket sessions natively.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Grok Voice (&lt;code&gt;grok-voice-think-fast-1.0&lt;/code&gt;)&lt;/strong&gt; when latency, low cost, voice variety, multilingual TTS, or voice cloning are the main requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use GPT-Realtime-2&lt;/strong&gt; when you need deeper reasoning, image input, native SIP, MCP tool execution, or a more mature production voice-agent stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok Voice&lt;/strong&gt; reports under 1 second time-to-first-audio and ships 80+ preset voices across 28 TTS languages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-2&lt;/strong&gt; provides GPT-5-class reasoning, five reasoning levels, 128k context, image input, SIP, and native MCP support.&lt;/li&gt;
&lt;li&gt;Paid GPT-Realtime-2 voice usage is metered at &lt;strong&gt;$32 / 1M audio input tokens&lt;/strong&gt; and &lt;strong&gt;$64 / 1M audio output tokens&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Grok Voice has no per-minute audio charge on the xAI Console; you pay for Grok 4.3 reasoning at &lt;strong&gt;$1.25 / 1M input tokens&lt;/strong&gt; and &lt;strong&gt;$2.50 / 1M output tokens&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Build a small test harness first, measure latency and cost with your own audio, then choose.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Capability comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Grok Voice (&lt;code&gt;grok-voice-think-fast-1.0&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;GPT-Realtime-2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to first audio&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&amp;lt; 1 second&lt;/strong&gt;; xAI claims ~5x faster than nearest competitor&lt;/td&gt;
&lt;td&gt;Sub-second on &lt;code&gt;low&lt;/code&gt; reasoning; slower on &lt;code&gt;high&lt;/code&gt; / &lt;code&gt;xhigh&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning levels&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;low&lt;/code&gt; / &lt;code&gt;medium&lt;/code&gt; / &lt;code&gt;high&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;minimal&lt;/code&gt; / &lt;code&gt;low&lt;/code&gt; / &lt;code&gt;medium&lt;/code&gt; / &lt;code&gt;high&lt;/code&gt; / &lt;code&gt;xhigh&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Underlying intelligence&lt;/td&gt;
&lt;td&gt;Grok 4.3, Intelligence Index 53&lt;/td&gt;
&lt;td&gt;GPT-5-class&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;1,000,000 tokens via Grok 4.3&lt;/td&gt;
&lt;td&gt;128,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preset voices&lt;/td&gt;
&lt;td&gt;80+; five named voice-agent personas: Eve, Ara, Rex, Sal, Leo&lt;/td&gt;
&lt;td&gt;10; Cedar, Marin, plus eight retuned legacy voices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Languages, TTS&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;No official count published&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Languages, STT&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;Inherited from GPT-Realtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice cloning&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes&lt;/strong&gt;; Custom Voices, ~1-minute sample, &amp;lt;2-minute training&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image input&lt;/td&gt;
&lt;td&gt;No; text + audio only&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes&lt;/strong&gt;; photo and screenshot input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote MCP servers&lt;/td&gt;
&lt;td&gt;Tool use supported; native MCP not advertised&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes&lt;/strong&gt;; MCP tools executed by API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native SIP / phone calling&lt;/td&gt;
&lt;td&gt;Bring your own SIP provider&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes&lt;/strong&gt;; &lt;code&gt;?call_id={call_id}&lt;/code&gt; endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio formats&lt;/td&gt;
&lt;td&gt;PCM16, MP3, μ-law&lt;/td&gt;
&lt;td&gt;PCM16, G.711 μ-law, A-law&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing model&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Free on console&lt;/strong&gt; for voice; pay Grok 4.3 reasoning only&lt;/td&gt;
&lt;td&gt;$32 / 1M audio input tokens, $64 / 1M audio output tokens, $4 / $24 per 1M text tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;SOC 2 Type II, HIPAA-eligible with BAA, GDPR&lt;/td&gt;
&lt;td&gt;SOC 2, GDPR through OpenAI Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Latency: Grok Voice is the default for real-time UX
&lt;/h2&gt;

&lt;p&gt;xAI claims &lt;code&gt;grok-voice-think-fast-1.0&lt;/code&gt; is “nearly 5 times faster than the closest competitor.” Treat vendor multipliers carefully, but the practical direction is consistent: Grok Voice usually reaches time-to-first-audio comfortably under one second, while GPT-Realtime-2 often sits around the 800ms–1500ms range depending on reasoning level.&lt;/p&gt;

&lt;p&gt;For a voice agent, this matters more than most benchmark numbers. In a phone call or mobile assistant, 600ms can feel responsive; 1200ms can feel like the user is waiting on a bot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If the user is speaking live and latency is the top UX metric,
start with Grok Voice.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use GPT-Realtime-2 when the extra latency buys you reasoning, image understanding, SIP, or MCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing: compare the billing shape, not just the headline rate
&lt;/h2&gt;

&lt;p&gt;The two products price different parts of the pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT-Realtime-2 pricing shape
&lt;/h3&gt;

&lt;p&gt;GPT-Realtime-2 meters audio as tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Audio input: &lt;strong&gt;$32 / 1M tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Audio output: &lt;strong&gt;$64 / 1M tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Text input/output: &lt;strong&gt;$4 / $24 per 1M tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One second of audio is roughly 50 tokens. A 5-minute conversation with balanced turn-taking can use around 30,000 audio tokens, or roughly &lt;strong&gt;$1.50 in audio I/O&lt;/strong&gt;. Cached input can reduce stable prompt costs significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grok Voice pricing shape
&lt;/h3&gt;

&lt;p&gt;Grok Voice has no per-minute or per-token charge on the xAI Console for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TTS&lt;/li&gt;
&lt;li&gt;STT&lt;/li&gt;
&lt;li&gt;Voice agent usage&lt;/li&gt;
&lt;li&gt;Custom Voices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You pay for Grok 4.3 reasoning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: &lt;strong&gt;$1.25 / 1M tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;strong&gt;$2.50 / 1M tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because reasoning tokens are usually far fewer than audio tokens for the same call, a similar 5-minute interaction can come in under &lt;strong&gt;$0.10&lt;/strong&gt;, depending on usage.&lt;/p&gt;
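
&lt;p&gt;A rough back-of-the-envelope comparison using the rates above (the Grok reasoning token counts are illustrative assumptions, not measurements):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# 5-minute call, ~30k audio tokens split evenly between input and output.
gpt_audio = 15_000 * (32 / 1e6) + 15_000 * (64 / 1e6)

# Grok Voice: audio is unmetered on the console; pay only Grok 4.3 reasoning.
grok_reasoning = 20_000 * (1.25 / 1e6) + 10_000 * (2.50 / 1e6)

print(f"GPT-Realtime-2 audio I/O: ${gpt_audio:.2f}")       # $1.44
print(f"Grok Voice reasoning:     ${grok_reasoning:.2f}")  # $0.05
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;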

&lt;p&gt;&lt;strong&gt;Implementation rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If you expect thousands of voice minutes per day,
benchmark Grok Voice first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For high-stakes but lower-volume flows, such as regulated support or sales calls, the price gap may matter less than reasoning quality and integrations.&lt;/p&gt;

&lt;p&gt;For more pricing context, see &lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the Grok 4.3 API&lt;/a&gt; and &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 pricing&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning depth: GPT-Realtime-2 is stronger for complex agents
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 is described by OpenAI as a GPT-5-class speech-to-speech model. It exposes five reasoning levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minimal
low
medium
high
xhigh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a useful production control: reduce latency for simple turns, increase reasoning for complex turns.&lt;/p&gt;

&lt;p&gt;Example routing logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;selectReasoningLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresToolChain&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasAmbiguousIntent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresLongAnswer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grok Voice runs Grok 4.3 underneath. Grok 4.3 is strong, especially on agentic tasks, but based on the published benchmark framing, GPT-Realtime-2 is the safer choice for complex mid-conversation reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use GPT-Realtime-2 when the agent must:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disambiguate unclear user intent&lt;/li&gt;
&lt;li&gt;Select between many tools&lt;/li&gt;
&lt;li&gt;Reason over long state&lt;/li&gt;
&lt;li&gt;Recover from interruptions&lt;/li&gt;
&lt;li&gt;Handle multi-step workflows&lt;/li&gt;
&lt;li&gt;Explain decisions out loud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Grok Voice when the workflow is mostly scripted:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FAQ support&lt;/li&gt;
&lt;li&gt;Order status&lt;/li&gt;
&lt;li&gt;Appointment booking&lt;/li&gt;
&lt;li&gt;Simple sales qualification&lt;/li&gt;
&lt;li&gt;Consumer chat companions&lt;/li&gt;
&lt;li&gt;Low-latency mobile voice UX&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Voice catalog: Grok has more voices; OpenAI has tighter consistency
&lt;/h2&gt;

&lt;p&gt;Grok ships 80+ preset voices across 28 TTS languages. The voice-agent layer exposes five curated personas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eve&lt;/li&gt;
&lt;li&gt;Ara&lt;/li&gt;
&lt;li&gt;Rex&lt;/li&gt;
&lt;li&gt;Sal&lt;/li&gt;
&lt;li&gt;Leo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The broader TTS surface gives you more variety, especially if you need a particular tone, accent, or brand fit.&lt;/p&gt;

&lt;p&gt;GPT-Realtime-2 ships 10 voices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cedar&lt;/li&gt;
&lt;li&gt;Marin&lt;/li&gt;
&lt;li&gt;alloy&lt;/li&gt;
&lt;li&gt;ash&lt;/li&gt;
&lt;li&gt;ballad&lt;/li&gt;
&lt;li&gt;coral&lt;/li&gt;
&lt;li&gt;echo&lt;/li&gt;
&lt;li&gt;sage&lt;/li&gt;
&lt;li&gt;shimmer&lt;/li&gt;
&lt;li&gt;verse&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The OpenAI catalog is smaller, but voice behavior is more consistent across the available options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Need a specific voice or custom brand voice? Use Grok.
Need one reliable production voice? GPT-Realtime-2 is enough.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Voice cloning: only Grok Voice supports it
&lt;/h2&gt;

&lt;p&gt;Grok’s Custom Voices can clone a voice from about one minute of clean speech and return a &lt;code&gt;voice_id&lt;/code&gt; in under two minutes. The same &lt;code&gt;voice_id&lt;/code&gt; can be used across TTS and the voice-agent surface.&lt;/p&gt;

&lt;p&gt;OpenAI does not currently expose voice cloning through the Realtime API.&lt;/p&gt;

&lt;p&gt;If your product requires a cloned brand voice, character voice, or consented custom voice, this category is not close: choose Grok Voice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image input: only GPT-Realtime-2 supports it
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 accepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text&lt;/li&gt;
&lt;li&gt;Audio&lt;/li&gt;
&lt;li&gt;Images&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means a user can send a screenshot or photo, then continue speaking with the agent about what is visible.&lt;/p&gt;

&lt;p&gt;This matters for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Field support&lt;/li&gt;
&lt;li&gt;Accessibility narration&lt;/li&gt;
&lt;li&gt;QA workflows&lt;/li&gt;
&lt;li&gt;Visual troubleshooting&lt;/li&gt;
&lt;li&gt;Voice-driven app support&lt;/li&gt;
&lt;li&gt;“Look at my screen and help me” workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Grok Voice does not currently match this. If the agent needs to see what the user sees, use GPT-Realtime-2.&lt;/p&gt;
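
&lt;p&gt;A minimal sketch of attaching an image mid-session, assuming &lt;code&gt;ws&lt;/code&gt; is an open Realtime socket and that &lt;code&gt;conversation.item.create&lt;/code&gt; accepts an &lt;code&gt;input_image&lt;/code&gt; content part; verify the exact shape against the current API reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: attach a screenshot to the live session, then prompt.
// The input_image content shape is an assumption; verify against the docs.
import { readFileSync } from "node:fs";

const imageBase64 = readFileSync("screenshot.png").toString("base64");

ws.send(JSON.stringify({
  type: "conversation.item.create",
  item: {
    type: "message",
    role: "user",
    content: [
      { type: "input_image", image_url: `data:image/png;base64,${imageBase64}` },
      { type: "input_text", text: "What error is shown on this screen?" },
    ],
  },
}));

ws.send(JSON.stringify({ type: "response.create" }));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;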

&lt;p&gt;For more on OpenAI’s image stack, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-image-2-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-Image-2 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  SIP and phone integration: GPT-Realtime-2 is simpler
&lt;/h2&gt;

&lt;p&gt;OpenAI’s Realtime API has native SIP support. A SIP trunk can connect directly to OpenAI’s gateway, and inbound calls open a WebSocket session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?call_id={call_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That removes a bridge layer from your architecture.&lt;/p&gt;
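
&lt;p&gt;One way to wire that up, assuming your webhook handler has already received a &lt;code&gt;call_id&lt;/code&gt; for the inbound call; the accept handshake and event names may differ, so treat this as a sketch against OpenAI's SIP docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import WebSocket from "ws";

// Hypothetical: callId arrives from OpenAI's inbound-call webhook.
function attachAgentToCall(callId) {
  const ws = new WebSocket(
    `wss://api.openai.com/v1/realtime?call_id=${callId}`,
    { headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` } }
  );

  ws.on("open", function () {
    ws.send(JSON.stringify({
      type: "session.update",
      session: { instructions: "You are a concise phone support agent." },
    }));
  });

  return ws;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;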

&lt;p&gt;Grok Voice supports μ-law output for telephony, but you need to bring your own SIP provider, such as Twilio, Telnyx, or Plivo, and run the bridge yourself.&lt;/p&gt;

&lt;p&gt;A typical Grok telephony architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Caller
  -&amp;gt; SIP provider
  -&amp;gt; Your media bridge
  -&amp;gt; Grok Voice WebSocket
  -&amp;gt; Your media bridge
  -&amp;gt; SIP provider
  -&amp;gt; Caller
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A typical GPT-Realtime-2 SIP architecture can be simpler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Caller
  -&amp;gt; SIP trunk
  -&amp;gt; OpenAI Realtime SIP endpoint
  -&amp;gt; GPT-Realtime-2 session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementation rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If you are building call-center infrastructure and want fewer moving parts,
start with GPT-Realtime-2.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  MCP and tool use
&lt;/h2&gt;

&lt;p&gt;Both models support tool/function calling, but the integration level differs.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT-Realtime-2
&lt;/h3&gt;

&lt;p&gt;GPT-Realtime-2 supports remote MCP servers natively. You configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP server URL&lt;/li&gt;
&lt;li&gt;Allowed tools&lt;/li&gt;
&lt;li&gt;Tool execution policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the Realtime API can execute MCP tools directly.&lt;/p&gt;

&lt;p&gt;That matters when your voice agent has a large tool catalog and you do not want every tool call to round-trip through your own function-call event loop.&lt;/p&gt;
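
&lt;p&gt;A sketch of that configuration, assuming the session accepts an MCP tool entry with &lt;code&gt;server_url&lt;/code&gt; and &lt;code&gt;allowed_tools&lt;/code&gt; fields; treat the exact field names as assumptions to check against the Realtime reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical MCP tool entry on a Realtime session; field names are assumptions.
ws.send(JSON.stringify({
  type: "session.update",
  session: {
    tools: [
      {
        type: "mcp",
        server_label: "crm",
        server_url: "https://mcp.example.com", // your MCP server
        allowed_tools: ["lookup_customer", "create_ticket"],
        require_approval: "never", // tool execution policy
      },
    ],
  },
}));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;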

&lt;h3&gt;
  
  
  Grok Voice
&lt;/h3&gt;

&lt;p&gt;Grok Voice supports function calling and includes a built-in &lt;code&gt;web_search&lt;/code&gt; tool. Native MCP is not advertised as a first-class primitive yet.&lt;/p&gt;

&lt;p&gt;For small tool sets, plain function declarations work fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lookup_order&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Look up an order by ID&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;order_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;create_support_ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Create a support ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;issue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementation rule:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 or fewer tools: either model is fine.
50+ tools or MCP-first architecture: GPT-Realtime-2 is cleaner.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are testing MCP servers separately, see &lt;a href="http://apidog.com/blog/mcp-server-testing-apidog?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MCP server testing in Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model selection by use case
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Recommended model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Consumer voice app, high volume, latency-critical&lt;/td&gt;
&lt;td&gt;Grok Voice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice cloning required&lt;/td&gt;
&lt;td&gt;Grok Voice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom brand voice&lt;/td&gt;
&lt;td&gt;Grok Voice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Character voices&lt;/td&gt;
&lt;td&gt;Grok Voice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual TTS at scale, especially &amp;gt;10 languages&lt;/td&gt;
&lt;td&gt;Grok Voice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lowest-cost production voice agent&lt;/td&gt;
&lt;td&gt;Grok Voice on console&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice agent that needs screenshots or photos&lt;/td&gt;
&lt;td&gt;GPT-Realtime-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Call-center deployment with SIP&lt;/td&gt;
&lt;td&gt;GPT-Realtime-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step reasoning agent&lt;/td&gt;
&lt;td&gt;GPT-Realtime-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent with 50+ tools&lt;/td&gt;
&lt;td&gt;GPT-Realtime-2 with MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benchmark-heavy reasoning&lt;/td&gt;
&lt;td&gt;GPT-Realtime-2 with &lt;code&gt;xhigh&lt;/code&gt; reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-context text reasoning&lt;/td&gt;
&lt;td&gt;Depends: GPT-Realtime-2 has 128k context; Grok 4.3 has 1M context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How to test both before committing
&lt;/h2&gt;

&lt;p&gt;Do not choose from a spec sheet alone. Build a small benchmark harness and measure both models with your own prompts, tools, audio, and target languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create a fixture conversation
&lt;/h3&gt;

&lt;p&gt;Use a 10-turn script that represents your real product.&lt;/p&gt;

&lt;p&gt;Include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One simple answer&lt;/li&gt;
&lt;li&gt;One interruption&lt;/li&gt;
&lt;li&gt;One tool call&lt;/li&gt;
&lt;li&gt;One disambiguation&lt;/li&gt;
&lt;li&gt;One long-form answer&lt;/li&gt;
&lt;li&gt;One edge case&lt;/li&gt;
&lt;li&gt;Real user audio, not only synthetic text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example fixture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"audio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"case"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"initial_request"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"expected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"asks_clarifying_question"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"audio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"case"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"clarification"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"expected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"calls_lookup_tool"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Configure both API keys
&lt;/h3&gt;

&lt;p&gt;Use environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Apidog, define both as environment variables so the same WebSocket test can run against either provider.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Use one WebSocket test shape
&lt;/h3&gt;

&lt;p&gt;Test Grok Voice with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.x.ai/v1/realtime?model=grok-voice-think-fast-1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test GPT-Realtime-2 with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep your test script as similar as possible across both runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Measure the metrics that affect production
&lt;/h3&gt;

&lt;p&gt;Capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time to first audio&lt;/li&gt;
&lt;li&gt;Total response latency&lt;/li&gt;
&lt;li&gt;Tool-call latency&lt;/li&gt;
&lt;li&gt;Number of failed or malformed tool calls&lt;/li&gt;
&lt;li&gt;Total input tokens&lt;/li&gt;
&lt;li&gt;Total output tokens&lt;/li&gt;
&lt;li&gt;Estimated cost per call&lt;/li&gt;
&lt;li&gt;User-perceived interruption handling&lt;/li&gt;
&lt;li&gt;Language-specific voice quality&lt;/li&gt;
&lt;/ul&gt;
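
&lt;p&gt;To keep time-to-first-audio comparable across runs, timestamp the gap between sending &lt;code&gt;response.create&lt;/code&gt; and receiving the first &lt;code&gt;response.audio.delta&lt;/code&gt;. A minimal sketch, assuming &lt;code&gt;ws&lt;/code&gt; is an open session for either provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Measure time-to-first-audio for one turn. Assumes ws is an open session
// and both providers emit response.audio.delta for audio chunks.
let turnStart = 0;
let firstAudioLogged = false;

// Call this when the fixture turn begins.
function startTurn() {
  turnStart = performance.now();
  firstAudioLogged = false;
  ws.send(JSON.stringify({ type: "response.create" }));
}

ws.on("message", function (raw) {
  const event = JSON.parse(raw.toString());
  if (event.type === "response.audio.delta") {
    if (!firstAudioLogged) {
      firstAudioLogged = true;
      const ms = Math.round(performance.now() - turnStart);
      console.log(`time_to_first_audio_ms=${ms}`);
    }
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;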

&lt;p&gt;A simple result table is enough:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Grok Voice&lt;/th&gt;
&lt;th&gt;GPT-Realtime-2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to first audio&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total response latency&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool-call success rate&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per 5-minute call&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subjective voice score&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  5. Pick based on your measured bottleneck
&lt;/h3&gt;

&lt;p&gt;Use this decision logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chooseVoiceModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresImageInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GPT-Realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresNativeSIP&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GPT-Realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresMCPAtScale&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GPT-Realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresVoiceCloning&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Grok Voice&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;latencyIsPrimaryMetric&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Grok Voice&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;costIsPrimaryMetric&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Grok Voice&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reasoningFailuresAreCostly&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GPT-Realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;realWorldBenchmarkWinner&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; to run the side-by-side tests. The collection format is portable, so you can keep the benchmark artifact in version control.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I use both models in the same app and route at runtime?
&lt;/h3&gt;

&lt;p&gt;Yes. Both use similar conversation shapes. You can route by intent, latency requirement, language, or workflow complexity.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;routeTurn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;includesImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresComplexToolUse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requiresVoiceClone&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;grok-voice-think-fast-1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isCasualOrHighVolume&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;grok-voice-think-fast-1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Which model has better non-English voice quality?
&lt;/h3&gt;

&lt;p&gt;Grok wins on language coverage: 80+ voices and 28 TTS languages. For languages both models support well, quality is close enough that you should test your exact language, accent, and domain vocabulary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is GPT-Realtime-2 worth the higher price?
&lt;/h3&gt;

&lt;p&gt;For simple FAQ-style support, usually no. For agents that need to read from a CRM, call multiple tools, resolve ambiguity, handle interruptions, and reason through edge cases, the reasoning and integration advantages can justify the cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does either model support cloning public figures?
&lt;/h3&gt;

&lt;p&gt;No. xAI restricts Custom Voices to consented samples, and OpenAI does not expose voice cloning through the Realtime API at all. Cloning a public figure without permission violates platform terms.&lt;/p&gt;

&lt;h3&gt;
  
  
  How hard is migration later?
&lt;/h3&gt;

&lt;p&gt;The event names and session configuration differ, but the conversation architecture is similar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connect
  -&amp;gt; configure session
  -&amp;gt; stream user audio
  -&amp;gt; receive assistant audio/events
  -&amp;gt; handle tool calls
  -&amp;gt; close session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plan for a small port, mostly in the areas below (an adapter sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session update payloads&lt;/li&gt;
&lt;li&gt;Event names&lt;/li&gt;
&lt;li&gt;Tool-call handlers&lt;/li&gt;
&lt;li&gt;Audio format handling&lt;/li&gt;
&lt;li&gt;Provider-specific authentication&lt;/li&gt;
&lt;/ul&gt;
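
&lt;p&gt;One way to keep that port small from day one is to isolate provider differences behind a single adapter. The event mappings below are placeholders to fill in from each provider's reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical adapter: normalize provider events into one internal shape.
// The mapped names are placeholders; fill them from each provider's reference.
const EVENT_MAP = {
  grok: { audioDelta: "response.audio.delta", audioDone: "response.audio.done" },
  openai: { audioDelta: "response.audio.delta", audioDone: "response.audio.done" },
};

function normalizeEvent(provider, event) {
  const map = EVENT_MAP[provider];
  if (event.type === map.audioDelta) return { kind: "audio_chunk", delta: event.delta };
  if (event.type === map.audioDone) return { kind: "turn_done" };
  return { kind: "other", raw: event };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;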

&lt;p&gt;If you build and test with &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, the request collection ports cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;There is no universal winner between Grok Voice and GPT-Realtime-2. There is a correct choice per architecture.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;Grok Voice&lt;/strong&gt; when your priorities are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lowest latency&lt;/li&gt;
&lt;li&gt;Lower cost at scale&lt;/li&gt;
&lt;li&gt;Larger voice catalog&lt;/li&gt;
&lt;li&gt;Multilingual TTS&lt;/li&gt;
&lt;li&gt;Voice cloning&lt;/li&gt;
&lt;li&gt;Consumer voice UX&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;strong&gt;GPT-Realtime-2&lt;/strong&gt; when your priorities are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deeper reasoning&lt;/li&gt;
&lt;li&gt;Image input&lt;/li&gt;
&lt;li&gt;Native SIP&lt;/li&gt;
&lt;li&gt;MCP tools&lt;/li&gt;
&lt;li&gt;Complex agent workflows&lt;/li&gt;
&lt;li&gt;Production call-center integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else, build one benchmark harness in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, run both models for a week, and choose based on your own latency, cost, and task-success data.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Use Grok Voice for Free: Console Setup, Voice Cloning, and Real-Time Voice Agents</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 08 May 2026 07:29:18 +0000</pubDate>
      <link>https://forem.com/hassann/how-to-use-grok-voice-for-free-console-setup-voice-cloning-and-real-time-voice-agents-3fhn</link>
      <guid>https://forem.com/hassann/how-to-use-grok-voice-for-free-console-setup-voice-cloning-and-real-time-voice-agents-3fhn</guid>
      <description>&lt;p&gt;xAI shipped Grok Voice with the Grok 4.3 release. For developers, the key point is simple: Grok Voice is free on the xAI Console. There is no per-minute charge and no per-token charge for the voice agent model, text-to-speech, speech-to-text, or Custom Voices clone tool. The only billable resource is the underlying Grok 4.3 token usage when the agent reasons, and that usage has its own free console allowance for testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide shows how to run Grok Voice at zero voice-feature cost: create a console key, clone a voice, open a WebSocket session, stream audio, add tool calls, and test the flow with &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; before wiring it into a product.&lt;/p&gt;

&lt;p&gt;If you also want the broader &lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok 4.3 API guide&lt;/a&gt;, or a head-to-head against OpenAI’s stack in &lt;a href="http://apidog.com/blog/grok-voice-vs-gpt-realtime-best-voice-model?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok Voice vs GPT-Realtime&lt;/a&gt;, those companion posts cover the rest of the surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Grok Voice is free for users on the &lt;strong&gt;xAI Console&lt;/strong&gt; (&lt;code&gt;console.x.ai&lt;/code&gt;): no per-minute or per-token charge for TTS, STT, voice agent, or Custom Voices.&lt;/li&gt;
&lt;li&gt;Flagship model: &lt;code&gt;grok-voice-think-fast-1.0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;xAI reports time-to-first-audio under &lt;strong&gt;1 second&lt;/strong&gt; and claims it is roughly &lt;strong&gt;5x faster&lt;/strong&gt; than the closest competitor.&lt;/li&gt;
&lt;li&gt;80+ preset voices across &lt;strong&gt;28 languages&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;5 built-in voice agent personas: Eve, Ara, Rex, Sal, Leo.&lt;/li&gt;
&lt;li&gt;Custom voice cloning works from about &lt;strong&gt;1 minute of speech&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Production-ready voice generation completes in &lt;strong&gt;under 2 minutes&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;WebSocket endpoint:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.x.ai/v1/realtime?model=grok-voice-think-fast-1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;REST endpoints are available for TTS, STT, and Custom Voices.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to script WebSocket sessions and replay them without rerecording audio.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Grok Voice gives you for free
&lt;/h2&gt;

&lt;p&gt;The xAI Console is the path to free access. Sign in at &lt;code&gt;console.x.ai&lt;/code&gt;, generate an API key, and you can call four voice surfaces with no charge tied to the voice features themselves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjkeqwamuyap7ndjjrwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjkeqwamuyap7ndjjrwg.png" alt="Grok Voice in xAI Console" width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You get access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice Agent&lt;/strong&gt;: real-time speech-to-speech with tool use, server-side voice activity detection, and turn-taking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text-to-Speech&lt;/strong&gt;: 80+ preset voices across 28 languages, with MP3 or μ-law output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speech-to-Text&lt;/strong&gt;: streaming and batch transcription across 25 input languages, with word-level timestamps and speaker diarization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Voices&lt;/strong&gt;: clone your voice from a short sample and use the resulting &lt;code&gt;voice_id&lt;/code&gt; across TTS and voice agent APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only meter that ticks is Grok 4.3 token usage when the agent reasons over a request. The console also gives you free credit for testing that surface, which is enough to validate end-to-end flows before billing starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Get a console key
&lt;/h2&gt;

&lt;p&gt;Go to &lt;code&gt;console.x.ai&lt;/code&gt; and sign in with your X account.&lt;/p&gt;

&lt;p&gt;From the &lt;strong&gt;API Keys&lt;/strong&gt; page:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new API key.&lt;/li&gt;
&lt;li&gt;Enable the &lt;code&gt;voice&lt;/code&gt; and &lt;code&gt;chat&lt;/code&gt; scopes.&lt;/li&gt;
&lt;li&gt;Export the key once.&lt;/li&gt;
&lt;li&gt;Store it in your local environment.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"xai-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For client-side apps, do &lt;strong&gt;not&lt;/strong&gt; ship the parent API key to the browser. Instead, mint an &lt;strong&gt;ephemeral token&lt;/strong&gt; from the console settings or via the &lt;code&gt;/v1/realtime/sessions&lt;/code&gt; endpoint.&lt;/p&gt;

&lt;p&gt;Ephemeral tokens carry the same scope but expire in minutes, so they are suitable for browser-based WebSocket sessions.&lt;/p&gt;
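
&lt;p&gt;A server-side sketch for minting one; the request body and response fields are assumptions to verify against the console docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical server-side mint; response field names are assumptions.
async function mintEphemeralToken() {
  const res = await fetch("https://api.x.ai/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.XAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "grok-voice-think-fast-1.0" }),
  });
  const session = await res.json();
  return session.client_secret; // assumed field; verify in the API reference
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;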

&lt;h2&gt;
  
  
  Step 2: Pick a voice
&lt;/h2&gt;

&lt;p&gt;You can start with a preset voice or create a custom clone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Use a preset voice
&lt;/h3&gt;

&lt;p&gt;The voice agent includes five named personas:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Voice&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Good fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eve&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Female, energetic&lt;/td&gt;
&lt;td&gt;Upbeat support flows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ara&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Female, warm&lt;/td&gt;
&lt;td&gt;General assistance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rex&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Male, confident&lt;/td&gt;
&lt;td&gt;Sales scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Neutral, smooth&lt;/td&gt;
&lt;td&gt;Narration and longer reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;leo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Male, authoritative&lt;/td&gt;
&lt;td&gt;Compliance and formal flows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the broader TTS API, the preset library is larger: more than 80 voices across 28 languages. You select them with the &lt;code&gt;voice&lt;/code&gt; parameter on the TTS endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: Clone a custom voice
&lt;/h3&gt;

&lt;p&gt;Upload a WAV file with about one minute of clean speech from a single speaker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.x.ai/v1/custom-voices &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"name=narrator-jane"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"language=en"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"audio=@sample.wav"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API returns a &lt;code&gt;voice_id&lt;/code&gt; in under two minutes. You can reuse that ID across both the TTS endpoint and the voice agent.&lt;/p&gt;
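
&lt;p&gt;Once you have the ID, you can pass it anywhere a voice parameter is accepted. A sketch against the TTS endpoint covered in Step 5, with a placeholder &lt;code&gt;voice_id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Reuse the cloned voice on the synchronous TTS endpoint (see Step 5).
// "voice_abc123" is a placeholder for the returned voice_id.
const res = await fetch("https://api.x.ai/v1/tts", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.XAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "grok-tts-1",
    voice: "voice_abc123",
    input: "This line is read in the cloned voice.",
    format: "mp3",
  }),
});
const audio = Buffer.from(await res.arrayBuffer());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;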

&lt;p&gt;Keep the reference clip clean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a quiet room.&lt;/li&gt;
&lt;li&gt;Record one speaker only.&lt;/li&gt;
&lt;li&gt;Avoid music, effects, or background noise.&lt;/li&gt;
&lt;li&gt;Prefer a consistent single take.&lt;/li&gt;
&lt;li&gt;Do not assume longer is better; the maximum reference clip length is 120 seconds, but clean audio matters more than duration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Make Grok talk over WebSocket
&lt;/h2&gt;

&lt;p&gt;The voice agent runs over a single WebSocket session:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the WebSocket.&lt;/li&gt;
&lt;li&gt;Send a &lt;code&gt;session.update&lt;/code&gt; event.&lt;/li&gt;
&lt;li&gt;Stream user audio into the socket.&lt;/li&gt;
&lt;li&gt;Receive audio deltas back from the model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.x.ai/v1/realtime?model=grok-voice-think-fast-1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal Node.js client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;WebSocket&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;wss://api.x.ai/v1/realtime?model=grok-voice-think-fast-1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;XAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;open&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ara&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a friendly support agent. Keep replies under two sentences.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;input_audio_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pcm16&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;output_audio_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pcm16&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;turn_detection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;server_vad&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;response.audio.delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;response.audio.done&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Turn complete&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;User audio is sent with &lt;code&gt;input_audio_buffer.append&lt;/code&gt; events as base64-encoded PCM16 frames.&lt;/p&gt;
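
&lt;p&gt;A sketch of streaming a local PCM16 file into the buffer in roughly 100 ms frames; the frame size is a judgment call, not a protocol requirement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;import { readFileSync } from "node:fs";

// Stream raw PCM16 (24 kHz, mono) into the input buffer in ~100 ms frames.
// 24,000 samples/s * 2 bytes * 0.1 s = 4,800 bytes per frame.
const pcm = readFileSync("user-question.pcm");
const frameBytes = 4800;

for (let offset = 0; offset &amp;lt; pcm.length; offset += frameBytes) {
  ws.send(JSON.stringify({
    type: "input_audio_buffer.append",
    audio: pcm.subarray(offset, offset + frameBytes).toString("base64"),
  }));
}
// With server_vad turn detection, the server decides when the turn ends.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;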

&lt;p&gt;The server responds with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;response.audio.delta&lt;/code&gt;: streamed audio chunks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;response.audio.done&lt;/code&gt;: end of the current response turn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PCM16 at 24 kHz is the safe default for browser and desktop apps. Use μ-law when bridging to phone systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Add tool use
&lt;/h2&gt;

&lt;p&gt;The voice agent supports function calling, so the model can call your APIs during a conversation.&lt;/p&gt;

&lt;p&gt;Declare tools in the session config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lookup_order&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Look up the status of a customer order by order number.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;order_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the model wants to call your function, it emits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response.function_call_arguments.done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your app should then complete the loop below (a minimal handler is sketched after this list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse the function name and arguments.&lt;/li&gt;
&lt;li&gt;Run the function on your side.&lt;/li&gt;
&lt;li&gt;Send the result back with a &lt;code&gt;conversation.item.create&lt;/code&gt; event of type &lt;code&gt;function_call_output&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Let the model continue and narrate the result.&lt;/li&gt;
&lt;/ol&gt;
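
&lt;p&gt;A minimal handler for that loop, assuming the done event carries the call ID and an arguments JSON string, and using a hypothetical &lt;code&gt;lookupOrder&lt;/code&gt; helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;ws.on("message", function (raw) {
  const event = JSON.parse(raw.toString());

  if (event.type === "response.function_call_arguments.done") {
    // Field names below are assumptions; confirm them in your event logs.
    const args = JSON.parse(event.arguments);
    const result = lookupOrder(args.order_id); // lookupOrder is your own code

    ws.send(JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: event.call_id,
        output: JSON.stringify(result),
      },
    }));

    // Ask the model to continue and narrate the result.
    ws.send(JSON.stringify({ type: "response.create" }));
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;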

&lt;p&gt;A built-in &lt;code&gt;web_search&lt;/code&gt; tool is also available, which is useful when you need fresh data without building a retrieval layer yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Use TTS without the voice agent
&lt;/h2&gt;

&lt;p&gt;If you only need text-to-speech for audio prompts, voiceovers, podcast intros, or static app audio, skip the WebSocket and call the REST endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.x.ai/v1/tts &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$XAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "grok-tts-1",
    "voice": "ara",
    "input": "Welcome back to your account. Your last login was Tuesday at 3pm.",
    "format": "mp3"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; greeting.mp3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported output formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mp3&lt;/code&gt;: high-fidelity output&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mulaw&lt;/code&gt;: 8 kHz telephony output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The TTS endpoint is synchronous. You send text and receive audio bytes back; no streaming session is required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Test the whole flow in Apidog
&lt;/h2&gt;

&lt;p&gt;WebSocket APIs are harder to debug from the terminal because the conversation is stateful. A repeatable test setup helps you isolate changes in voice, instructions, tool calls, and audio frames.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi62iow0zbhkz10jerr2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi62iow0zbhkz10jerr2c.png" alt="Testing Grok Voice WebSocket in Apidog" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A practical workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new WebSocket request in Apidog.&lt;/li&gt;
&lt;li&gt;Save the WebSocket URL:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.x.ai/v1/realtime?model=grok-voice-think-fast-1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Store your bearer token in an Apidog environment variable.&lt;/li&gt;
&lt;li&gt;Stage a script of JSON messages:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;session.update&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;input_audio_buffer.append&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;response.create&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Replay the script against one connection.&lt;/li&gt;
&lt;li&gt;Capture every server event into a tree.&lt;/li&gt;
&lt;li&gt;Diff two runs side by side when you change the voice or instructions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is useful for catching drift in turn-taking behavior before you ship.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt;, create a WebSocket request, and paste your &lt;code&gt;XAI_API_KEY&lt;/code&gt; under environment variables.&lt;/p&gt;

&lt;p&gt;The same collection can also hold your TTS and STT REST requests, so you can keep all Grok Voice surfaces in one project. For more on stateful API testing patterns, see &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tool for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free tier limits
&lt;/h2&gt;

&lt;p&gt;The console gives you full access without a per-minute or per-token charge for the voice features themselves. The main limits are operational:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits&lt;/strong&gt;: the console enforces request-per-minute caps on each endpoint to prevent abuse. They are suitable for development and demos, not production traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom voice quota&lt;/strong&gt;: a single account can hold a finite number of custom voice clones at once. Delete unused clones to free slots.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning tokens&lt;/strong&gt;: when the voice agent uses Grok 4.3 reasoning under the hood, it bills against your console credit. Free credit is enough for prototyping; production requires a paid plan.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you hit rate-limit errors, batch your requests or move to a paid tier. The API behavior stays the same; only the cap changes.&lt;/p&gt;
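
&lt;p&gt;For development traffic, a simple retry wrapper that backs off on HTTP 429 responses is usually enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Retry REST calls with exponential backoff when a 429 rate limit hits.
async function fetchWithBackoff(url, options, maxRetries = 4) {
  for (let attempt = 0; attempt &amp;lt;= maxRetries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    const waitMs = 500 * 2 ** attempt;
    await new Promise(function (resolve) { setTimeout(resolve, waitMs); });
  }
  throw new Error("Rate limited after retries");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;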

&lt;h2&gt;
  
  
  Compare voices before shipping
&lt;/h2&gt;

&lt;p&gt;Run the same script through every candidate voice before going live. Voices handle tone differently, and short tests catch poor pairings quickly.&lt;/p&gt;

&lt;p&gt;Use a small test set:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A two-sentence greeting.&lt;/li&gt;
&lt;li&gt;A confirmation phrase: “Got it, that’s all set.”&lt;/li&gt;
&lt;li&gt;A longer sentence with a number, a date, and a comma.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also test the same prompt at different tones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calm&lt;/li&gt;
&lt;li&gt;Normal&lt;/li&gt;
&lt;li&gt;Urgent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Grok’s preset voices handle tone shifts better than many TTS engines we have benchmarked, but you should still audit the actual output for your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is the API actually free, or is there a hidden cap?
&lt;/h3&gt;

&lt;p&gt;The voice features — TTS, STT, voice agent, and Custom Voices — carry no per-minute or per-token charge on the console.&lt;/p&gt;

&lt;p&gt;The reasoning model under the hood bills against console credit. The console allowance is enough for prototyping.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need an X account?
&lt;/h3&gt;

&lt;p&gt;Yes. Console sign-in uses an X account.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Grok Voice from a browser?
&lt;/h3&gt;

&lt;p&gt;Yes, but use an ephemeral token.&lt;/p&gt;

&lt;p&gt;Mint the token server-side via &lt;code&gt;/v1/realtime/sessions&lt;/code&gt;, hand the short-lived token to the browser, and connect to the WebSocket directly. The parent API key should never leave your server.&lt;/p&gt;

&lt;h3&gt;
  
  
  What audio quality can I expect?
&lt;/h3&gt;

&lt;p&gt;TTS output is available as high-fidelity MP3 or 8 kHz μ-law. The voice agent runs PCM16 at 24 kHz internally.&lt;/p&gt;

&lt;p&gt;Quality is on par with major commercial TTS engines; latency is the differentiator.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it work with telephony?
&lt;/h3&gt;

&lt;p&gt;Yes. μ-law output is the standard format for SIP and PSTN bridges.&lt;/p&gt;

&lt;p&gt;You still need a SIP provider. xAI does not ship its own SIP gateway today.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does cloning quality compare to other tools?
&lt;/h3&gt;

&lt;p&gt;Cloning quality depends more on reference audio quality than length.&lt;/p&gt;

&lt;p&gt;A clean 60-second sample in a quiet room beats a noisy 120-second sample. The resulting &lt;code&gt;voice_id&lt;/code&gt; works across both the TTS endpoint and the voice agent without recloning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Grok Voice for AI characters in a game?
&lt;/h3&gt;

&lt;p&gt;Yes. The TTS endpoint is fast enough for runtime generation, and Custom Voices lets each character use its own clone.&lt;/p&gt;

&lt;p&gt;Watch latency on long lines. Chunked TTS is the recommended pattern.&lt;/p&gt;
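
&lt;p&gt;The chunking itself needs no API assumptions. A simple sentence-level splitter lets you request audio per chunk and start playback as soon as the first clip arrives:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Split a long line into sentence-sized chunks for chunked TTS.
// A single sentence longer than maxLen is kept whole rather than cut mid-sentence.
function chunkLine(text, maxLen = 120) {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [text];
  const chunks = [];
  let current = "";
  for (const s of sentences) {
    if (current &amp;amp;&amp;amp; (current + s).length &amp;gt; maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// chunkLine("First sentence. Second one! A third, longer sentence follows?", 30)
// =&amp;gt; ["First sentence. Second one!", "A third, longer sentence follows?"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;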

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Grok Voice is a direct path to building a real-time voice agent with no per-minute charge on the xAI Console. Start with a console key, pick a preset voice, test a WebSocket session, and only then add custom voice cloning or tool calls.&lt;/p&gt;

&lt;p&gt;The fastest validation loop is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Script a session in Apidog.&lt;/li&gt;
&lt;li&gt;Run it against three preset voices.&lt;/li&gt;
&lt;li&gt;Compare latency, tone, and turn-taking.&lt;/li&gt;
&lt;li&gt;Add tool calls once the base conversation works.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When you are ready to plug it into Grok 4.3 reasoning, see the &lt;a href="http://apidog.com/blog/how-to-use-grok-4-3-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok 4.3 API guide&lt;/a&gt;. For a side-by-side against OpenAI’s stack, see &lt;a href="http://apidog.com/blog/grok-voice-vs-gpt-realtime-best-voice-model?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Grok Voice vs GPT-Realtime&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What Is GPT-Realtime-2 and How to Use the GPT-Realtime-2 API</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 08 May 2026 07:23:10 +0000</pubDate>
      <link>https://forem.com/hassann/what-is-gpt-realtime-2-and-how-to-use-the-gpt-realtime-2-api-2kk</link>
      <guid>https://forem.com/hassann/what-is-gpt-realtime-2-and-how-to-use-the-gpt-realtime-2-api-2kk</guid>
      <description>&lt;p&gt;OpenAI shipped GPT-Realtime-2 on November 6, 2026. It is a speech-to-speech model with GPT-5-class reasoning, a 128,000-token context window, and configurable reasoning effort so you can trade latency for answer quality. If you already use &lt;code&gt;gpt-realtime&lt;/code&gt;, migration mostly means changing the model string and adding a few optional session/tool fields.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide shows what changed, how pricing works, and how to call GPT-Realtime-2 over WebSocket and SIP. It also includes a practical setup in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; so you can replay Realtime sessions without re-recording audio for every test.&lt;/p&gt;

&lt;p&gt;For context on OpenAI’s broader 2026 model line, see &lt;a href="http://apidog.com/blog/what-is-gpt-5-5?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;What is GPT-5.5&lt;/a&gt;. For the multimodal sibling, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-image-2-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-Image-2 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Model ID: &lt;code&gt;gpt-realtime-2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Context window: 128k tokens&lt;/li&gt;
&lt;li&gt;Max output: 32k tokens&lt;/li&gt;
&lt;li&gt;Input modalities: text, audio, image&lt;/li&gt;
&lt;li&gt;Output modalities: text, audio&lt;/li&gt;
&lt;li&gt;Audio pricing: &lt;strong&gt;$32 / 1M input tokens&lt;/strong&gt;, &lt;strong&gt;$64 / 1M output tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Cached audio input: &lt;strong&gt;$0.40 / 1M tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;New Realtime-only voices: &lt;strong&gt;Cedar&lt;/strong&gt; and &lt;strong&gt;Marin&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Reasoning levels: &lt;code&gt;minimal&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Default reasoning level: &lt;code&gt;low&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;WebSocket endpoint:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;SIP sessions use:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?call_id={call_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Companion models:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-Translate&lt;/strong&gt;: live translation, 70 input languages, $0.034/min&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-Whisper&lt;/strong&gt;: streaming speech-to-text, $0.017/min&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Use &lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to script WebSocket sessions, capture frames, and compare event output between runs.&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is GPT-Realtime-2?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/" rel="noopener noreferrer"&gt;GPT-Realtime-2&lt;/a&gt; is a single speech-to-speech model. You stream audio in, receive audio out, and the model handles transcription, reasoning, tool selection, and voice generation in one pass.&lt;/p&gt;

&lt;p&gt;That means you do not need to build a separate STT → LLM → TTS pipeline. The model runs on the existing Realtime API surface and improves the previous &lt;code&gt;gpt-realtime&lt;/code&gt; flow with stronger reasoning and larger context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-18.png" alt="" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model accepts text, audio, and images as input, then emits text and audio as output. Image input is new for this model. You can add a screenshot or photo to a live conversation, ask a question by voice, and get a spoken answer.&lt;/p&gt;

&lt;p&gt;That enables agents such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice support copilots that can inspect user screenshots&lt;/li&gt;
&lt;li&gt;Field-support agents that reason over photos&lt;/li&gt;
&lt;li&gt;Accessibility assistants that describe what is on screen&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Specs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attribute&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model ID&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gpt-realtime-2&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;128,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max output&lt;/td&gt;
&lt;td&gt;32,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modalities in&lt;/td&gt;
&lt;td&gt;text, audio, image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modalities out&lt;/td&gt;
&lt;td&gt;text, audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge cutoff&lt;/td&gt;
&lt;td&gt;2024-09-30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning levels&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;minimal&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;xhigh&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function calling&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote MCP servers&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image input&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SIP phone calling&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What changed from &lt;code&gt;gpt-realtime&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Compared with the previous &lt;code&gt;gpt-realtime&lt;/code&gt; model, GPT-Realtime-2 improves benchmark performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Big Bench Audio:&lt;/strong&gt; 81.4% → 96.6%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio MultiChallenge:&lt;/strong&gt; 34.7% → 48.5%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those scores used &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;xhigh&lt;/code&gt; reasoning. In production, the default is &lt;code&gt;low&lt;/code&gt; to reduce latency, so you should benchmark your own workload before increasing reasoning effort.&lt;/p&gt;

&lt;p&gt;Key behavior changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preambles:&lt;/strong&gt; The model can say short filler phrases like “let me check that” while it reasons.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel tool calls with narration:&lt;/strong&gt; The model can call multiple tools and describe progress instead of going silent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better recovery:&lt;/strong&gt; Ambiguous or partially failed turns are handled more gracefully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain tone control:&lt;/strong&gt; The model can keep specialized terminology consistent and adapt delivery style during a session.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-19.png" alt="" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The context window also increased from 32k to 128k tokens. That matters for long-running voice sessions such as support calls, banking workflows, and tutoring sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 is billed per token, with separate rates for text, audio, and image input.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Token type&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Cached input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;$4.00 / 1M&lt;/td&gt;
&lt;td&gt;$0.40 / 1M&lt;/td&gt;
&lt;td&gt;$24.00 / 1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio&lt;/td&gt;
&lt;td&gt;$32.00 / 1M&lt;/td&gt;
&lt;td&gt;$0.40 / 1M&lt;/td&gt;
&lt;td&gt;$64.00 / 1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;$5.00 / 1M&lt;/td&gt;
&lt;td&gt;$0.50 / 1M&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cached input reduces repeated-context cost significantly. If your agent uses a stable system prompt, policy document, or repeated instructions, keep that context cacheable.&lt;/p&gt;

&lt;p&gt;For comparison with the rest of the OpenAI line, see &lt;a href="http://apidog.com/blog/gpt-5-5-pricing?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 pricing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Companion model pricing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-Translate:&lt;/strong&gt; $0.034/min. Supports 70 input languages and 13 output languages, with a Word Error Rate 12.5% lower than any other model tested on Hindi, Tamil, and Telugu.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-Whisper:&lt;/strong&gt; $0.017/min. Streaming speech-to-text for live captions and continuous transcription.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-2&lt;/strong&gt; when you need reasoning and voice generation together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-Translate&lt;/strong&gt; for live multilingual interpretation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-Realtime-Whisper&lt;/strong&gt; when you only need a transcript.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Endpoints and authentication
&lt;/h2&gt;

&lt;p&gt;Available endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST https://api.openai.com/v1/chat/completions
POST https://api.openai.com/v1/responses
WSS  wss://api.openai.com/v1/realtime?model=gpt-realtime-2
WSS  wss://api.openai.com/v1/realtime?call_id={call_id}
POST https://api.openai.com/v1/realtime/translations
POST https://api.openai.com/v1/realtime/transcription_sessions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For voice agents, use the WebSocket endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Required headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authorization: Bearer $OPENAI_API_KEY
OpenAI-Beta: realtime=v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-proj-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connect over WebSocket
&lt;/h2&gt;

&lt;p&gt;Install the WebSocket client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;ws
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a minimal Node.js client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;WebSocket&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OpenAI-Beta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;realtime=v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;open&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cedar&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a friendly support agent for a fintech app.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;input_audio_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pcm16&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;output_audio_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pcm16&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;turn_detection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;server_vad&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;response.audio.delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// base64 PCM16 audio chunk&lt;/span&gt;
    &lt;span class="c1"&gt;// Pipe this to a speaker, browser AudioWorklet, or media stream.&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The session is event-driven:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a &lt;code&gt;session.update&lt;/code&gt; event to configure the voice, audio format, VAD, tools, and reasoning effort.&lt;/li&gt;
&lt;li&gt;Send &lt;code&gt;input_audio_buffer.append&lt;/code&gt; events while the user speaks.&lt;/li&gt;
&lt;li&gt;Receive &lt;code&gt;response.audio.delta&lt;/code&gt; events as the model speaks.&lt;/li&gt;
&lt;li&gt;Handle tool-call events if the model requests external data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;PCM16 at 24 kHz is a safe default. G.711 mu-law and A-law are also supported, which is useful for phone-system integrations.&lt;/p&gt;
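
&lt;p&gt;A minimal sketch of the input side, reusing the &lt;code&gt;ws&lt;/code&gt; connection from the client above. The audio source is assumed to arrive as 24 kHz PCM16 &lt;code&gt;Buffer&lt;/code&gt; chunks from your mic capture:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Stream captured audio into the session as base64 chunks.
function sendAudioChunk(ws, pcmChunk) {
  ws.send(JSON.stringify({
    type: "input_audio_buffer.append",
    audio: pcmChunk.toString("base64"),
  }));
}

// With server VAD, the turn ends automatically when the user stops speaking.
// Without VAD, commit the buffer and request a response yourself:
function endTurn(ws) {
  ws.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
  ws.send(JSON.stringify({ type: "response.create" }));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;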

&lt;p&gt;For Python, the &lt;code&gt;openai&lt;/code&gt; SDK &lt;code&gt;&amp;gt;= 2.1.0&lt;/code&gt; exposes a &lt;code&gt;realtime&lt;/code&gt; client with the same event names. To compare the Realtime API with the Responses API, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Voices
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 adds two Realtime-only voices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cedar:&lt;/strong&gt; warm, mid-range male voice. Suitable as a default general-agent voice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marin:&lt;/strong&gt; bright, clear female voice. Useful for translation and announcements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The previous eight voices are still available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alloy
ash
ballad
coral
echo
sage
shimmer
verse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All eight were also retuned for the new audio stack.&lt;/p&gt;

&lt;p&gt;To switch voices mid-session, send another &lt;code&gt;session.update&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;voice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;marin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Add image input to a voice turn
&lt;/h2&gt;

&lt;p&gt;You can attach an image to a user turn and then ask a question about it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;conversation.item.create&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;input_image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example.com/screenshot.png&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;input_text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What does this error mean?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;response.create&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful implementation patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice-driven QA:&lt;/strong&gt; A tester points a camera at a broken UI and the agent dictates a bug report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Field support:&lt;/strong&gt; A technician shares a wiring-panel photo and the agent walks through diagnostics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility:&lt;/strong&gt; The agent describes a user’s current screen during a support call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more on OpenAI’s image stack, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-image-2-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-Image-2 API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Function calling and MCP
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 supports standard function tools and remote MCP servers in the same session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Standard function calling
&lt;/h3&gt;

&lt;p&gt;The flow is similar to Chat Completions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Declare tools in the session config.&lt;/li&gt;
&lt;li&gt;The model emits &lt;code&gt;response.function_call_arguments.delta&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Your app executes the function.&lt;/li&gt;
&lt;li&gt;Your app sends a &lt;code&gt;conversation.item.create&lt;/code&gt; event with &lt;code&gt;function_call_output&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The important change is parallel calling. The model can trigger multiple calls at once and narrate progress while waiting for results.&lt;/p&gt;

&lt;p&gt;Example session update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lookup_account&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Look up a customer account by ID.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;account_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;list_transactions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;List recent transactions for an account.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;account_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;account_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
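
&lt;p&gt;On the return path, a sketch of executing a completed call and handing the result back. The &lt;code&gt;response.function_call_arguments.done&lt;/code&gt; event name and &lt;code&gt;call_id&lt;/code&gt; field follow the event naming used above; verify the exact shapes against your event stream. &lt;code&gt;runTool&lt;/code&gt; is your own dispatcher:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Execute a completed function call and return the result to the model.
ws.on("message", async (raw) =&amp;gt; {
  const event = JSON.parse(raw.toString());
  if (event.type !== "response.function_call_arguments.done") return;

  const args = JSON.parse(event.arguments);
  const result = await runTool(event.name, args); // your own dispatcher

  ws.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  }));
  ws.send(JSON.stringify({ type: "response.create" }));
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;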



&lt;h3&gt;
  
  
  Remote MCP servers
&lt;/h3&gt;

&lt;p&gt;Remote MCP support lets the Realtime API call tools from an MCP server directly. Configure the MCP URL and allowed tools in the session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;server_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://mcp.example.com/sse&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;allowed_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lookup_account&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;list_transactions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when your voice agent needs access to a larger tool catalog without manually routing every function call through your WebSocket loop.&lt;/p&gt;

&lt;p&gt;If you are testing MCP servers before wiring them into a voice agent, see &lt;a href="http://apidog.com/blog/mcp-server-testing-apidog?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MCP server testing in Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  SIP phone calling
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 can handle real phone calls through SIP.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Point your SIP trunk at OpenAI’s SIP gateway.&lt;/li&gt;
&lt;li&gt;An inbound call opens a Realtime WebSocket session.&lt;/li&gt;
&lt;li&gt;Your app connects using the call ID:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?call_id={call_id}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model accepts G.711 mu-law and A-law directly, so your bridge does not need to transcode audio before sending it to the Realtime API.&lt;/p&gt;

&lt;p&gt;This makes GPT-Realtime-2 suitable for call-center-style agents where most turns involve listening, calling tools, and responding by voice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configure reasoning effort
&lt;/h2&gt;

&lt;p&gt;Reasoning effort controls the latency/quality tradeoff.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Approx. latency cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;minimal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single-turn yes/no answers&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;low&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default; everyday support and chat&lt;/td&gt;
&lt;td&gt;small&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disambiguation, complex tool dispatch&lt;/td&gt;
&lt;td&gt;moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-step reasoning, code review by voice&lt;/td&gt;
&lt;td&gt;high&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xhigh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Benchmarks, hard analytical questions&lt;/td&gt;
&lt;td&gt;highest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Default to &lt;code&gt;low&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session.update&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move to &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, or &lt;code&gt;xhigh&lt;/code&gt; only when you can measure a quality gap. The latency cost is noticeable in live calls.&lt;/p&gt;
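
&lt;p&gt;To make that decision with data rather than feel, time the gap between &lt;code&gt;response.create&lt;/code&gt; and the first &lt;code&gt;response.audio.delta&lt;/code&gt; at each effort level. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Measure time-to-first-audio for the current reasoning effort.
let responseStartedAt = 0;

function requestResponse(ws) {
  responseStartedAt = Date.now();
  ws.send(JSON.stringify({ type: "response.create" }));
}

ws.on("message", (raw) =&amp;gt; {
  const event = JSON.parse(raw.toString());
  if (event.type !== "response.audio.delta") return;
  if (responseStartedAt) {
    console.log(`first audio after ${Date.now() - responseStartedAt} ms`);
    responseStartedAt = 0; // log only the first delta per response
  }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;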

&lt;h2&gt;
  
  
  Test the Realtime API in Apidog
&lt;/h2&gt;

&lt;p&gt;WebSocket APIs are difficult to debug from the terminal because every connection has state. Apidog gives you a repeatable way to test the same Realtime session.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.apidog.com%2Fblog-next%2F2026%2F05%2Fimage-20.png" alt="" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A practical test workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new WebSocket request.&lt;/li&gt;
&lt;li&gt;Use this URL:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://api.openai.com/v1/realtime?model=gpt-realtime-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Add headers:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authorization: Bearer {{OPENAI_API_KEY}}
OpenAI-Beta: realtime=v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Save a &lt;code&gt;session.update&lt;/code&gt; message.&lt;/li&gt;
&lt;li&gt;Add scripted messages such as:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;input_audio_buffer.append&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;input_audio_buffer.commit&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;response.create&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Replay the script against one connection.&lt;/li&gt;
&lt;li&gt;Capture all server events.&lt;/li&gt;
&lt;li&gt;Diff runs when changing voice, reasoning effort, or tool configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt;, create a WebSocket request, and store your bearer token under &lt;strong&gt;Auth&lt;/strong&gt; or an environment variable.&lt;/p&gt;

&lt;p&gt;For comparison with another fast multimodal model, see &lt;a href="http://apidog.com/blog/how-to-use-gemini-3-flash-preview-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the Gemini 3 Flash Preview API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What model ID should I use?
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gpt-realtime-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The earlier model is still available as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gpt-realtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lite version is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gpt-realtime-2-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Can I stream input audio while output audio is still playing?
&lt;/h3&gt;

&lt;p&gt;Yes. The Realtime API uses server-side voice activity detection by default, so the model can stop speaking when the user starts. You can also disable VAD and manage turn boundaries from the client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does the 128k context include audio tokens?
&lt;/h3&gt;

&lt;p&gt;Yes. Audio is tokenized. One second of audio is roughly 50 tokens depending on format. Long calls can consume context faster than long text chats, so inspect usage before assuming the full 128k window is enough.&lt;/p&gt;
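
&lt;p&gt;The arithmetic is worth doing up front: at roughly 50 tokens per second, audio costs about 3,000 tokens per minute, so a 40-minute call consumes around 120,000 tokens before any text, tool, or image tokens are counted.&lt;/p&gt;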

&lt;h3&gt;
  
  
  Is fine-tuning supported?
&lt;/h3&gt;

&lt;p&gt;Not yet. Per the model card, GPT-Realtime-2 does not support fine-tuning, predicted outputs, or text streaming on Chat Completions. The Realtime endpoint streams audio inherently.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does GPT-Realtime-2 compare to GPT-5.5 plus TTS?
&lt;/h3&gt;

&lt;p&gt;GPT-Realtime-2 performs end-to-end speech reasoning. A voice-aware model can respond to tone, hesitation, and emphasis. A text model with TTS cannot use those audio cues in the same way.&lt;/p&gt;

&lt;p&gt;For pure text reasoning, see &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;How to use the GPT-5.5 API&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What rate limits apply?
&lt;/h3&gt;

&lt;p&gt;Tier 1 starts at 40,000 tokens per minute and scales to 15M TPM at Tier 5. Rate limits are per model, so existing GPT-5 quota does not carry over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;GPT-Realtime-2 gives you a single API surface for voice input, reasoning, tool use, image input, and spoken output. The main implementation path is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with the WebSocket endpoint.&lt;/li&gt;
&lt;li&gt;Configure &lt;code&gt;session.update&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;low&lt;/code&gt; reasoning by default.&lt;/li&gt;
&lt;li&gt;Add tools only after the basic audio loop works.&lt;/li&gt;
&lt;li&gt;Test repeated sessions in Apidog.&lt;/li&gt;
&lt;li&gt;Increase reasoning effort only when measured quality requires it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The combination of 128k context, GPT-5-class reasoning, image input, MCP, and SIP support makes it practical to build voice agents that can answer calls, inspect screenshots, dispatch tools, and recover from failed turns without leaving the Realtime session.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Local LLMs of 2026</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 08 May 2026 06:38:04 +0000</pubDate>
      <link>https://forem.com/hassann/best-local-llms-of-2026-5gm0</link>
      <guid>https://forem.com/hassann/best-local-llms-of-2026-5gm0</guid>
      <description>&lt;p&gt;This guide helps you choose a local LLM for 2026 based on VRAM, latency, and workload, then serve and test it through an OpenAI-compatible API using Ollama, vLLM, LM Studio, and &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The “best” local LLM in 2026 depends on your VRAM budget, latency target, and use case: coding, reasoning, multilingual, or vision.&lt;/li&gt;
&lt;li&gt;For 24 GB GPUs, &lt;strong&gt;Qwen 3.6 32B&lt;/strong&gt; and &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; are the strongest all-rounders.&lt;/li&gt;
&lt;li&gt;For 8 GB and below, &lt;strong&gt;Gemma 4 9B&lt;/strong&gt; and &lt;strong&gt;Llama 5.1 8B&lt;/strong&gt; are the practical picks.&lt;/li&gt;
&lt;li&gt;For reasoning or coding-heavy workloads, use &lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; quantized or &lt;strong&gt;GLM 5.1&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Ollama&lt;/strong&gt; or &lt;strong&gt;LM Studio&lt;/strong&gt; to expose an OpenAI-compatible HTTP endpoint.&lt;/li&gt;
&lt;li&gt;Test local models with &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; the same way you test hosted models.&lt;/li&gt;
&lt;li&gt;Use Apidog to mock, replay, and benchmark local model traffic without spending hosted LLM tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are already focused on DeepSeek, see the &lt;a href="http://apidog.com/blog/how-to-run-deepseek-v4-locally?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 local install guide&lt;/a&gt; and the &lt;a href="http://apidog.com/blog/what-is-deepseek-v4?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 overview&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why local LLMs matter again in 2026
&lt;/h2&gt;

&lt;p&gt;A few years ago, running a local LLM usually meant accepting lower quality. That is less true now.&lt;/p&gt;

&lt;p&gt;Open-weight models have narrowed the quality gap with hosted GPT-4-class systems, especially for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extraction&lt;/li&gt;
&lt;li&gt;Classification&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;li&gt;Coding assistance&lt;/li&gt;
&lt;li&gt;Reasoning workflows&lt;/li&gt;
&lt;li&gt;Structured output generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bigger change is hardware. A 24 GB consumer GPU can run a 32B-parameter model at production-quality 4-bit quantization. A Mac Studio with 64 GB unified memory can run DeepSeek V4 Flash at usable speeds.&lt;/p&gt;

&lt;p&gt;Local models now make sense when you care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data residency&lt;/li&gt;
&lt;li&gt;Vendor lock-in&lt;/li&gt;
&lt;li&gt;Predictable inference cost&lt;/li&gt;
&lt;li&gt;Offline or private workloads&lt;/li&gt;
&lt;li&gt;Internal tools and CI workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hard part is no longer only “is the model good enough?” It is also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can your app call the local model the same way it calls a hosted API?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is why OpenAI-compatible serving and API testing tools matter.&lt;/p&gt;
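
&lt;p&gt;Concretely, that compatibility means the request shape your app already sends to a hosted API works against a local server. A sketch against Ollama’s OpenAI-compatible endpoint on its default port, using the article’s example model tag:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Same client code, different base URL: Ollama serves an
// OpenAI-compatible endpoint on localhost:11434 by default.
const res = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "deepseek-v4-flash", // the article's example model tag
    messages: [{ role: "user", content: "Say hello in one word." }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;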

&lt;h2&gt;
  
  
  Selection criteria
&lt;/h2&gt;

&lt;p&gt;This shortlist is not just a leaderboard scrape. The criteria:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open weights with a permissive license such as MIT, Apache 2.0, or a production-friendly community license&lt;/li&gt;
&lt;li&gt;Active maintenance in 2026&lt;/li&gt;
&lt;li&gt;OpenAI-compatible serving through Ollama, vLLM, or LM Studio&lt;/li&gt;
&lt;li&gt;Strong real-world performance in at least one area:

&lt;ul&gt;
&lt;li&gt;General reasoning&lt;/li&gt;
&lt;li&gt;Code&lt;/li&gt;
&lt;li&gt;Multilingual output&lt;/li&gt;
&lt;li&gt;Vision&lt;/li&gt;
&lt;li&gt;Long context&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Reasonable hardware requirements&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The models were tested with the same prompt set on a 4090 and a Mac Studio M3 Ultra, then cross-checked against &lt;a href="https://chat.lmsys.org/" rel="noopener noreferrer"&gt;LMSYS Chatbot Arena&lt;/a&gt; and the &lt;a href="https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard" rel="noopener noreferrer"&gt;Hugging Face Open LLM Leaderboard&lt;/a&gt; where applicable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local LLM picks for 2026
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Practical hardware target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;td&gt;Reasoning-heavy agents&lt;/td&gt;
&lt;td&gt;192 GB unified memory or 2x 80 GB GPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;td&gt;General local agent, coding, RAG&lt;/td&gt;
&lt;td&gt;24 GB VRAM at Q4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.6 32B&lt;/td&gt;
&lt;td&gt;Multilingual, structured output, tool calling&lt;/td&gt;
&lt;td&gt;24 GB VRAM at Q4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GLM 5.1&lt;/td&gt;
&lt;td&gt;Tool-calling agents, extraction, JSON workflows&lt;/td&gt;
&lt;td&gt;Local serving through Ollama or vLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 5.1 8B&lt;/td&gt;
&lt;td&gt;Smaller local setups&lt;/td&gt;
&lt;td&gt;8 GB-class hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 9B&lt;/td&gt;
&lt;td&gt;Lightweight local assistants&lt;/td&gt;
&lt;td&gt;8 GB-class hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. DeepSeek V4 Pro
&lt;/h2&gt;

&lt;p&gt;DeepSeek V4 Pro is the flagship model in the DeepSeek V4 release. It is available as 4-bit GGUF and AWQ on Hugging Face.&lt;/p&gt;

&lt;p&gt;The full model has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1.6T total parameters&lt;/li&gt;
&lt;li&gt;49B active parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That puts it in datacenter-class territory. Quantized to Q4, it fits on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pair of 80 GB H100s&lt;/li&gt;
&lt;li&gt;A Mac Studio M3 Ultra with 192 GB unified memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most developers, V4 Pro is not the first model to run locally. It is more useful as a reference point for high-end reasoning quality.&lt;/p&gt;

&lt;p&gt;If you would rather use the same family through a hosted API, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use the DeepSeek V4 API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; reasoning-heavy agents and high-end local inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt; 192 GB unified memory or 2x 80 GB GPUs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to get it:&lt;/strong&gt; &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek V4 Pro GGUF on Hugging Face&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. DeepSeek V4 Flash
&lt;/h2&gt;

&lt;p&gt;DeepSeek V4 Flash is the smaller V4 variant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;284B total parameters&lt;/li&gt;
&lt;li&gt;13B active parameters&lt;/li&gt;
&lt;li&gt;Fits in 24 GB VRAM at 4-bit quantization&lt;/li&gt;
&lt;li&gt;Leaves room for a 64K context window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a 4090, throughput averages about 28 tokens per second on long-form generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi06uvwrpb8br1ylseq9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi06uvwrpb8br1ylseq9h.png" alt="DeepSeek V4 Flash" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the DeepSeek model most teams are likely to run locally. In testing, reasoning quality stayed close to V4 Pro, while coding was slightly behind.&lt;/p&gt;

&lt;p&gt;For an end-to-end setup, use the &lt;a href="http://apidog.com/blog/how-to-run-deepseek-v4-locally?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 local install guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; general-purpose local agents, coding assistants, and RAG generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt; 24 GB VRAM at Q4, or 16 GB at Q3 with quality loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to get it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull deepseek-v4-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash" rel="noopener noreferrer"&gt;Hugging Face GGUF&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Qwen 3.6 32B
&lt;/h2&gt;

&lt;p&gt;Alibaba’s Qwen line has been one of the most consistent open-weight model families.&lt;/p&gt;

&lt;p&gt;Qwen 3.6 32B at Q4 fits in 24 GB VRAM and performs well on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General reasoning&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;li&gt;Structured outputs&lt;/li&gt;
&lt;li&gt;Multilingual tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its multilingual support is the main reason to choose it over many Western open models. It handles Chinese, Japanese, Korean, and Arabic at a high level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddpqlfd78fqtq9387yx2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fddpqlfd78fqtq9387yx2.png" alt="Qwen 3.6" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your product needs one local model for reasoning plus multilingual output, Qwen 3.6 32B is the most practical pick.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; multilingual products, structured output, tool calling, and balanced cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware:&lt;/strong&gt; 24 GB VRAM at Q4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to get it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3.6:32b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use &lt;a href="https://huggingface.co/Qwen/Qwen3.6-32B" rel="noopener noreferrer"&gt;Qwen 3.6 on Hugging Face&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. GLM 5.1
&lt;/h2&gt;

&lt;p&gt;Zhipu AI’s GLM line has become a strong option for tool-calling and structured workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://z.ai/blog/glm-5.1" rel="noopener noreferrer"&gt;GLM 5.1&lt;/a&gt; scores near the top among open models on tool-calling benchmarks. Its strongest areas are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reasoning&lt;/li&gt;
&lt;li&gt;Classification&lt;/li&gt;
&lt;li&gt;Structured extraction&lt;/li&gt;
&lt;li&gt;JSON-mode workflows&lt;/li&gt;
&lt;li&gt;Instruction following&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Coding is weaker than its reasoning and extraction performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllhaiswc3qyj6zu9nhmf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllhaiswc3qyj6zu9nhmf.png" alt="GLM 5.1" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose GLM 5.1 when your workload is mostly tool calls, agentic workflows, or JSON schema extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; tool-calling agents, structured extraction, and JSON-mode pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serve a local LLM like a hosted API
&lt;/h2&gt;

&lt;p&gt;Once the model is running, your application still needs an HTTP endpoint.&lt;/p&gt;

&lt;p&gt;Three serving paths matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Ollama
&lt;/h3&gt;

&lt;p&gt;Ollama is the easiest path for local development.&lt;/p&gt;

&lt;p&gt;Start the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3.6:32b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ollama exposes an OpenAI-compatible endpoint at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means most OpenAI SDK-based apps only need two changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;base_url&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;model&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 2: vLLM
&lt;/h3&gt;

&lt;p&gt;vLLM is the production-oriented option.&lt;/p&gt;

&lt;p&gt;Use it when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better throughput&lt;/li&gt;
&lt;li&gt;Lower latency&lt;/li&gt;
&lt;li&gt;Continuous batching&lt;/li&gt;
&lt;li&gt;Higher concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It exposes an OpenAI-compatible API at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8000/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3: LM Studio
&lt;/h3&gt;

&lt;p&gt;LM Studio is useful for individual developers who want a GUI.&lt;/p&gt;

&lt;p&gt;Enable the local server in settings, then point your app or API client at the exposed local endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimal Python client example
&lt;/h2&gt;

&lt;p&gt;The OpenAI Python client can call Ollama, vLLM, or LM Studio if the server exposes an OpenAI-compatible API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# any string; Ollama ignores it
&lt;/span&gt;    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.6:32b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the differences between MoE and dense models in three bullets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To switch models, change only the model name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-v4-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama5.1:8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The request shape stays the same.&lt;/p&gt;

&lt;p&gt;For a related hosted/local workflow, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-for-free?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 for free&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test local models with Apidog
&lt;/h2&gt;

&lt;p&gt;Local inference gives you control, but it also gives you more things to debug.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fou3bxrkm26ig08d3q0jx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fou3bxrkm26ig08d3q0jx.png" alt="Testing local models with Apidog" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a hosted provider breaks, you check the status page. When your local model breaks, you own the issue.&lt;/p&gt;

&lt;p&gt;You need to inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw requests&lt;/li&gt;
&lt;li&gt;Headers&lt;/li&gt;
&lt;li&gt;Streaming responses&lt;/li&gt;
&lt;li&gt;Tool-call payloads&lt;/li&gt;
&lt;li&gt;Token latency&lt;/li&gt;
&lt;li&gt;Time to first token&lt;/li&gt;
&lt;li&gt;Output differences between model versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; treats your Ollama or vLLM endpoint like any other API.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Save canonical requests
&lt;/h3&gt;

&lt;p&gt;Create one request collection per model.&lt;/p&gt;

&lt;p&gt;Include realistic values for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt&lt;/li&gt;
&lt;li&gt;System message&lt;/li&gt;
&lt;li&gt;Temperature&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Tool definitions&lt;/li&gt;
&lt;li&gt;JSON schema requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replay the same request whenever you change models or quantization levels.&lt;/p&gt;
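
&lt;p&gt;As a sketch, a canonical request you might save per model could look like this. The field values are illustrative, not a required format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Illustrative canonical request for one model; adjust values to your workload.
canonical_request = {
    "model": "qwen3.6:32b",
    "messages": [
        {"role": "system", "content": "You are a support agent. Answer in JSON."},
        {"role": "user", "content": "Summarize this ticket in two sentences."},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
}

# Store it next to your Apidog collection so replays stay byte-identical.
with open("canonical_qwen3.6-32b.json", "w") as f:
    json.dump(canonical_request, f, indent=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;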

&lt;h3&gt;
  
  
  2. Diff outputs across models
&lt;/h3&gt;

&lt;p&gt;Run the same prompt against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Qwen 3.6&lt;/li&gt;
&lt;li&gt;DeepSeek V4 Flash&lt;/li&gt;
&lt;li&gt;GLM 5.1&lt;/li&gt;
&lt;li&gt;Llama 5.1&lt;/li&gt;
&lt;li&gt;Gemma 4&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then compare responses to spot regressions before shipping.&lt;/p&gt;
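
&lt;p&gt;If you prefer a script for the same diff, here is a minimal sketch against a local Ollama server. The model tags are assumptions; use whatever &lt;code&gt;ollama list&lt;/code&gt; shows on your machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

# Local Ollama server; the api_key is a placeholder that Ollama ignores.
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

# Tags below are assumed; replace them with your locally pulled models.
models = ["qwen3.6:32b", "deepseek-v4-flash", "glm5.1", "llama5.1:8b", "gemma4:9b"]
prompt = "Extract the company and amount from: 'Acme owes $1,200.' Return JSON."

for model in models:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;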

&lt;h3&gt;
  
  
  3. Mock the endpoint for CI
&lt;/h3&gt;

&lt;p&gt;CI should not need a 24 GB GPU to pass.&lt;/p&gt;

&lt;p&gt;Use Apidog mocks to return realistic JSON or streaming responses during tests. That keeps unit and integration tests deterministic even when the local model is offline.&lt;/p&gt;
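
&lt;p&gt;A sketch of a CI test that targets the mock instead of a GPU box. The &lt;code&gt;MOCK_BASE_URL&lt;/code&gt; variable name is a placeholder, not an Apidog convention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI

def test_chat_endpoint_shape():
    # In CI, MOCK_BASE_URL points at the Apidog mock; locally it falls back to Ollama.
    base_url = os.environ.get("MOCK_BASE_URL", "http://localhost:11434/v1")
    client = OpenAI(api_key="test", base_url=base_url)

    resp = client.chat.completions.create(
        model="qwen3.6:32b",
        messages=[{"role": "user", "content": "ping"}],
    )

    # Assert on shape, not exact wording, so the test stays deterministic.
    assert resp.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;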

&lt;h3&gt;
  
  
  4. Benchmark throughput
&lt;/h3&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Time to first token&lt;/li&gt;
&lt;li&gt;Tokens per second&lt;/li&gt;
&lt;li&gt;Failure rate&lt;/li&gt;
&lt;li&gt;Response size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use those numbers to compare Q4 vs Q5 quantization or Ollama vs vLLM.&lt;/p&gt;
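
&lt;p&gt;A rough streaming benchmark, as a sketch: it measures wall-clock time to first token and uses whitespace-separated words as a crude token proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

from openai import OpenAI

client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

start = time.perf_counter()
first_token_at = None
words = 0

stream = client.chat.completions.create(
    model="qwen3.6:32b",
    messages=[{"role": "user", "content": "Write 200 words about API mocking."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()
    words += len(delta.split())

total = time.perf_counter() - start
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f} s")
print(f"throughput: {words / total:.1f} words/s over {total:.2f} s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;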

&lt;h3&gt;
  
  
  5. Document the local API
&lt;/h3&gt;

&lt;p&gt;Apidog projects can export OpenAPI 3.1, so teammates get a clear contract for calling your internal local model endpoint.&lt;/p&gt;

&lt;p&gt;For a broader API workflow, see &lt;a href="http://apidog.com/blog/best-self-hosted-postman-alternatives-2026-2?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog as a Postman alternative&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes when running local LLMs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Picking the biggest model that fits
&lt;/h3&gt;

&lt;p&gt;A 32B model at Q3 can be worse than a 14B model at Q5.&lt;/p&gt;

&lt;p&gt;Once you go below 4-bit quantization, quality can drop quickly. Do not compare parameter count without comparing quantization quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Forgetting that context length uses VRAM
&lt;/h3&gt;

&lt;p&gt;A long context window is not free.&lt;/p&gt;

&lt;p&gt;A 32K-token context on a 32B model needs several GB of KV cache. Reserve memory for context before choosing the model.&lt;/p&gt;
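
&lt;p&gt;As a back-of-envelope check, an fp16 KV cache costs 2 (K and V) × layers × KV heads × head dim × 2 bytes per token. The dimensions below are illustrative for a 32B-class model with grouped-query attention, not published specs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative 32B-class dimensions; check the real model card before relying on this.
n_layers = 64
n_kv_heads = 8         # grouped-query attention keeps the KV head count small
head_dim = 128
bytes_per_elem = 2     # fp16 cache

# Both K and V are cached per layer per token, hence the leading factor of 2.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

context_tokens = 32_768
total_gib = kv_bytes_per_token * context_tokens / 1024**3
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token -&gt; {total_gib:.1f} GiB at 32K context")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;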

&lt;h3&gt;
  
  
  Trusting random fine-tunes
&lt;/h3&gt;

&lt;p&gt;Avoid random Hugging Face uploads for production workloads.&lt;/p&gt;

&lt;p&gt;Prefer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Original model cards&lt;/li&gt;
&lt;li&gt;Known fine-tune authors&lt;/li&gt;
&lt;li&gt;Reproducible evaluation results&lt;/li&gt;
&lt;li&gt;Clear licenses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A poisoned or poorly trained fine-tune can create security and reliability issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skipping the mock layer
&lt;/h3&gt;

&lt;p&gt;Local models go down.&lt;/p&gt;

&lt;p&gt;Common causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Driver crashes&lt;/li&gt;
&lt;li&gt;OOM kills&lt;/li&gt;
&lt;li&gt;GPU throttling&lt;/li&gt;
&lt;li&gt;Process restarts&lt;/li&gt;
&lt;li&gt;Broken model downloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If CI calls the real local model directly, your tests become flaky. Mock the endpoint in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ignoring tool-call format differences
&lt;/h3&gt;

&lt;p&gt;Different models can support tool calls but emit slightly different JSON shapes.&lt;/p&gt;

&lt;p&gt;Test each model before swapping it into production.&lt;/p&gt;

&lt;p&gt;Pay attention to the following; a defensive parsing sketch follows the list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function name fields&lt;/li&gt;
&lt;li&gt;Argument serialization&lt;/li&gt;
&lt;li&gt;Streaming chunks&lt;/li&gt;
&lt;li&gt;Invalid JSON recovery&lt;/li&gt;
&lt;li&gt;Empty tool-call responses&lt;/li&gt;
&lt;/ul&gt;
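
&lt;p&gt;Here is that defensive parsing sketch. The recovery strategy (slicing out the outermost braces) is one common approach, not a standard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def parse_tool_args(raw: str) -&gt; dict | None:
    """Parse tool-call arguments defensively; return None when unrecoverable."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        # Some models wrap arguments in code fences or add trailing text.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end &lt;= start:
            return None
        try:
            args = json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            return None
    return args if isinstance(args, dict) else None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;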

&lt;h2&gt;
  
  
  Real-world usage patterns
&lt;/h2&gt;

&lt;p&gt;A startup running a customer-support agent moved from GPT-5.5 to Qwen 3.6 32B on a single 4090. Latency stayed under 800 ms, monthly inference cost dropped, and the team uses Apidog mocks to keep CI deterministic.&lt;/p&gt;

&lt;p&gt;A solo developer building a voice assistant runs Gemma 4 9B on an M2 Pro with 16 GB of unified memory. Multi-token prediction drafting provides enough throughput for a native-feeling assistant.&lt;/p&gt;

&lt;p&gt;A fintech research team runs DeepSeek V4 Flash on two 4090s for nightly batch summarization of regulatory filings. Their cost per summary is mostly electricity and maintenance time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation checklist
&lt;/h2&gt;

&lt;p&gt;Use this flow to get from model choice to testable local API.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pick the model
&lt;/h3&gt;

&lt;p&gt;For 24 GB VRAM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Qwen 3.6 32B
DeepSeek V4 Flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For smaller machines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Llama 5.1 8B
Gemma 4 9B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For tool-heavy workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GLM 5.1
Qwen 3.6 32B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For high-end reasoning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DeepSeek V4 Pro
DeepSeek V4 Flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Pull the model
&lt;/h3&gt;

&lt;p&gt;Example with Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3.6:32b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Start the local server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Test the endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:11434/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "qwen3.6:32b",
    "messages": [
      {
        "role": "user",
        "content": "Return three API testing best practices as JSON."
      }
    ],
    "temperature": 0.2
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Add the endpoint to Apidog
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create saved requests for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normal chat&lt;/li&gt;
&lt;li&gt;Streaming chat&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;li&gt;JSON output&lt;/li&gt;
&lt;li&gt;Long-context prompts&lt;/li&gt;
&lt;li&gt;Failure cases&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Replay before every model swap
&lt;/h3&gt;

&lt;p&gt;Before changing from one model to another, replay the same collection and compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output structure&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Tool-call behavior&lt;/li&gt;
&lt;li&gt;JSON validity&lt;/li&gt;
&lt;li&gt;Error handling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The best local LLM in 2026 is the one that fits your VRAM, latency budget, and quality bar.&lt;/p&gt;

&lt;p&gt;Most teams should start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Qwen 3.6 32B&lt;/strong&gt; or &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; for 24 GB GPUs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 5.1 8B&lt;/strong&gt; or &lt;strong&gt;Gemma 4 9B&lt;/strong&gt; for smaller hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM 5.1&lt;/strong&gt; when tool calling and structured extraction are the main workload&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; only when you have high-end hardware and need maximum reasoning quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five practical takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local model quality is close enough for many production tasks.&lt;/li&gt;
&lt;li&gt;Ollama plus an OpenAI-compatible client is the fastest setup path.&lt;/li&gt;
&lt;li&gt;Quantization quality matters more than raw parameter count.&lt;/li&gt;
&lt;li&gt;Treat the local model as a production API.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to save requests, mock CI, benchmark runs, and document the endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next step: pick a model, run &lt;code&gt;ollama pull &amp;lt;name&amp;gt;&lt;/code&gt;, and point &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can start replaying and benchmarking requests within an hour.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the best local LLM for a 24 GB GPU in 2026?
&lt;/h3&gt;

&lt;p&gt;For most workloads, use Qwen 3.6 32B at Q4 or DeepSeek V4 Flash at Q4.&lt;/p&gt;

&lt;p&gt;Pick Qwen for multilingual or tool-heavy tasks. Pick DeepSeek V4 Flash for reasoning and coding. See the &lt;a href="http://apidog.com/blog/how-to-run-deepseek-v4-locally?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 local guide&lt;/a&gt; for setup details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I run a local LLM on a Mac?
&lt;/h3&gt;

&lt;p&gt;Yes. Apple silicon with 16 GB or more of unified memory can run Llama 5.1 8B and Gemma 4 9B comfortably.&lt;/p&gt;

&lt;p&gt;An M3 Ultra with 192 GB unified memory can run DeepSeek V4 Pro at Q4. Use Ollama or LM Studio.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I test a local LLM the same way I test OpenAI?
&lt;/h3&gt;

&lt;p&gt;Point your OpenAI-compatible client and your &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; project at the local serving URL.&lt;/p&gt;

&lt;p&gt;Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;vLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8000/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The request shape stays the same. Only the base URL and model name change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is local LLM quality really at parity with hosted?
&lt;/h3&gt;

&lt;p&gt;For reasoning, coding, classification, extraction, and tool calling, top open models are often within single-digit percentage points of hosted models.&lt;/p&gt;

&lt;p&gt;Hosted models still tend to lead on vision, long-context document QA, and creative writing.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about cost?
&lt;/h3&gt;

&lt;p&gt;A 4090 can run DeepSeek V4 Flash for the price of electricity and hardware maintenance.&lt;/p&gt;

&lt;p&gt;At high volume, hosted inference can cost hundreds or thousands of dollars per month. The break-even point depends on utilization, but it is often around millions of tokens per month.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I switch a production app between hosted and local?
&lt;/h3&gt;

&lt;p&gt;Keep the OpenAI-compatible client.&lt;/p&gt;

&lt;p&gt;Change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;base_url
model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then replay saved API requests before shipping the swap. See &lt;a href="http://apidog.com/blog/api-testing-without-postman-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing without Postman&lt;/a&gt; for the same testing pattern.&lt;/p&gt;
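
&lt;p&gt;A minimal sketch of that switch using environment variables. The &lt;code&gt;LLM_BASE_URL&lt;/code&gt; and &lt;code&gt;LLM_MODEL&lt;/code&gt; names are placeholders, not a standard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI

# Hosted by default; point LLM_BASE_URL at Ollama or vLLM to go local.
client = OpenAI(
    api_key=os.environ.get("LLM_API_KEY", "local"),
    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
)
MODEL = os.environ.get("LLM_MODEL", "gpt-5.5")

resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;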

&lt;h3&gt;
  
  
  Where can I track current model rankings?
&lt;/h3&gt;

&lt;p&gt;Use both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard" rel="noopener noreferrer"&gt;Hugging Face Open LLM Leaderboard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://chat.lmsys.org/" rel="noopener noreferrer"&gt;LMSYS Chatbot Arena&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cross-reference them because they measure different things.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Computer Use vs Structured APIs: When Each Wins (2026)</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Fri, 08 May 2026 02:36:54 +0000</pubDate>
      <link>https://forem.com/hassann/computer-use-vs-structured-apis-when-each-wins-2026-2p31</link>
      <guid>https://forem.com/hassann/computer-use-vs-structured-apis-when-each-wins-2026-2p31</guid>
      <description>&lt;p&gt;Driving a browser with an LLM through computer-use models can cost roughly 45x more than calling the same vendor through a structured API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide explains where that 45x gap comes from, when computer use is still worth it, and how to design cheaper agent workflows with &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;. The same framework applies to OpenAI Operator, Anthropic computer use, browser-use, Skyvern, and any agent runtime built around a screenshot loop.&lt;/p&gt;

&lt;p&gt;If you write APIs for AI agents, also read the companion guide on &lt;a href="http://apidog.com/blog/how-to-write-agents-md-files?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to write agents.md files&lt;/a&gt;. Those conventions make the structured-API path easier for agents to discover and call.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Computer use means an LLM reads screenshots and emits clicks, keystrokes, and scrolls.&lt;/li&gt;
&lt;li&gt;Structured APIs mean the LLM emits JSON tool calls that your backend executes.&lt;/li&gt;
&lt;li&gt;For the same task, computer use often burns 30x to 50x more tokens because every step sends another screenshot.&lt;/li&gt;
&lt;li&gt;Use computer use only when no API exists, the API is blocked, or the workflow lives behind an interface you cannot automate cleanly.&lt;/li&gt;
&lt;li&gt;Use structured APIs for payments, search, CRM updates, internal tools, queue jobs, and anything you can document with OpenAPI.&lt;/li&gt;
&lt;li&gt;In production, hybrid is usually the right architecture: structured APIs handle the common path, computer use handles the legacy long tail.&lt;/li&gt;
&lt;li&gt;Use Apidog to design JSON tool schemas, mock endpoints while iterating, and replay requests without burning agent credits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why the cost gap is so big
&lt;/h2&gt;

&lt;p&gt;The 45x number is not magic. It comes from token usage.&lt;/p&gt;

&lt;p&gt;A structured API call usually looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send the user request.&lt;/li&gt;
&lt;li&gt;Send a tool schema.&lt;/li&gt;
&lt;li&gt;Receive a JSON object.&lt;/li&gt;
&lt;li&gt;Execute one backend request.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That round trip may use a few hundred input tokens and a small JSON response.&lt;/p&gt;

&lt;p&gt;A computer-use loop looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send the user request.&lt;/li&gt;
&lt;li&gt;Send a screenshot.&lt;/li&gt;
&lt;li&gt;Receive a click coordinate or keyboard action.&lt;/li&gt;
&lt;li&gt;Execute the action.&lt;/li&gt;
&lt;li&gt;Take another screenshot.&lt;/li&gt;
&lt;li&gt;Repeat until the task finishes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A typical browser task can take 12 to 30 rounds. Each screenshot can cost around 1,500 tokens at common resolutions. Add retries, cookie banners, login screens, scroll mistakes, and misclicks, and the cost multiplies quickly.&lt;/p&gt;
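
&lt;p&gt;A back-of-envelope version of that arithmetic, as a sketch. Every count below is illustrative; substitute your provider's real token prices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Structured path: one tool-call round trip.
api_tokens = 600 + 150             # schema plus request input, small JSON output

# Computer-use path: screenshot loop.
rounds = 20                        # a mid-range browser task
screenshot_tokens = 1_500          # per screenshot at common resolutions
text_overhead = 200                # instructions and action output per round
cu_tokens = rounds * (screenshot_tokens + text_overhead)

print(f"structured:   ~{api_tokens} tokens")
print(f"computer use: ~{cu_tokens} tokens")
print(f"multiplier:   ~{cu_tokens / api_tokens:.0f}x")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;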

&lt;p&gt;Anthropic documents screenshot token usage in its &lt;a href="https://docs.anthropic.com/claude/docs/computer-use" rel="noopener noreferrer"&gt;computer use documentation&lt;/a&gt;. The Hacker News discussion &lt;a href="https://news.ycombinator.com/item?id=48024859" rel="noopener noreferrer"&gt;Computer Use is 45x more expensive than structured APIs&lt;/a&gt; puts the common penalty around 30x to 50x, which matches the practical pattern you see when replaying the same workflow through both paths in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When structured APIs win
&lt;/h2&gt;

&lt;p&gt;Default to structured APIs when any of these are true.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The vendor exposes a schema
&lt;/h3&gt;

&lt;p&gt;Use the API if the vendor provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an OpenAPI spec&lt;/li&gt;
&lt;li&gt;a GraphQL schema&lt;/li&gt;
&lt;li&gt;REST docs&lt;/li&gt;
&lt;li&gt;a stable JSON endpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a JSON shape exists, the model can usually fill it through a tool call.&lt;/p&gt;

&lt;p&gt;Example tool shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"update_deal_stage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Update a CRM deal to a new pipeline stage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"deal_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"qualified"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"proposal"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"closed_won"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"closed_lost"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deal_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stage"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is cheaper and easier to validate than asking an agent to open a CRM dashboard and click through a pipeline UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The task fits one or two endpoints
&lt;/h3&gt;

&lt;p&gt;These should be API calls, not browser tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Stripe customer.&lt;/li&gt;
&lt;li&gt;Update a HubSpot deal stage.&lt;/li&gt;
&lt;li&gt;Post a Slack message.&lt;/li&gt;
&lt;li&gt;Trigger a CI rerun.&lt;/li&gt;
&lt;li&gt;Search internal records.&lt;/li&gt;
&lt;li&gt;Generate an invoice.&lt;/li&gt;
&lt;li&gt;Add a user to a workspace.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Routing these through a browser adds cost, latency, and failure modes without adding value.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The workflow runs unattended
&lt;/h3&gt;

&lt;p&gt;Cron jobs, webhooks, queue workers, and background agents need deterministic network calls.&lt;/p&gt;

&lt;p&gt;A screenshot loop can get stuck on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a changed button label&lt;/li&gt;
&lt;li&gt;an unexpected modal&lt;/li&gt;
&lt;li&gt;an expired session&lt;/li&gt;
&lt;li&gt;a slow-loading table&lt;/li&gt;
&lt;li&gt;a scroll position issue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Structured API calls are easier to retry, monitor, and alert on.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Latency matters
&lt;/h3&gt;

&lt;p&gt;A structured API call may return in hundreds of milliseconds.&lt;/p&gt;

&lt;p&gt;A computer-use loop with 15 browser rounds may take 30 to 90 seconds. If a user is waiting, that usually breaks the experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. You need test coverage
&lt;/h3&gt;

&lt;p&gt;Mocking JSON endpoints is straightforward in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;. Mocking a browser screenshot loop is much harder because every run depends on UI state.&lt;/p&gt;

&lt;h2&gt;
  
  
  When computer use is still useful
&lt;/h2&gt;

&lt;p&gt;Computer use is not useless. It is just expensive. Use it for workflows where a structured path is unavailable or not worth building.&lt;/p&gt;

&lt;h3&gt;
  
  
  Legacy vendor portals
&lt;/h3&gt;

&lt;p&gt;Some procurement, freight, benefits, and compliance portals have no public API. They may live behind ASP.NET sessions, old forms, or vendor-specific auth flows.&lt;/p&gt;

&lt;p&gt;If the alternative is maintaining brittle Selenium scripts that break every quarter, paying more per run can be acceptable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal tools you cannot change
&lt;/h3&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a legacy ERP&lt;/li&gt;
&lt;li&gt;a client-owned CRM&lt;/li&gt;
&lt;li&gt;a SharePoint dashboard&lt;/li&gt;
&lt;li&gt;an admin portal maintained by another team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot add endpoints and the workflow volume is low, computer use may be practical.&lt;/p&gt;

&lt;h3&gt;
  
  
  One-off operator tasks
&lt;/h3&gt;

&lt;p&gt;A founder asking an agent to “research these 50 competitors and put the highlights in Notion” may not need a formal API contract.&lt;/p&gt;

&lt;p&gt;For one-off or rare work, computer use can be cheaper than building an integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflows blocked by terms of service
&lt;/h3&gt;

&lt;p&gt;Be careful here. Many “use a browser agent to scrape this website” requests violate vendor terms. The token bill may be the least important risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision framework
&lt;/h2&gt;

&lt;p&gt;Run every agent workflow through these checks before choosing computer use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;If yes&lt;/th&gt;
&lt;th&gt;If no&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Does a documented API exist?&lt;/td&gt;
&lt;td&gt;Use the API.&lt;/td&gt;
&lt;td&gt;Continue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can you ship a thin server-side adapter around a private endpoint?&lt;/td&gt;
&lt;td&gt;Build the adapter and expose JSON.&lt;/td&gt;
&lt;td&gt;Continue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is the task one-off or low-volume, for example fewer than 100 runs/day?&lt;/td&gt;
&lt;td&gt;Computer use can be acceptable.&lt;/td&gt;
&lt;td&gt;Continue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Are you comfortable paying 30x to 50x more token cost on every run?&lt;/td&gt;
&lt;td&gt;Use computer use.&lt;/td&gt;
&lt;td&gt;Stop and negotiate or build API access.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most workflows should land on the API path at check one or two. Computer use should survive only when both structured options are unavailable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What structured APIs look like in an agent
&lt;/h2&gt;

&lt;p&gt;Here is a simplified version of a “fetch yesterday’s failed payments” workflow using a structured tool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_failed_payments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List failed payments in a date range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Show yesterday&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s failed payments.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;payments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PaymentIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent never opens the Stripe dashboard. It produces structured arguments, your runtime validates them, and your backend makes the request.&lt;/p&gt;

&lt;p&gt;The computer-use version would need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a browser.&lt;/li&gt;
&lt;li&gt;Log into Stripe.&lt;/li&gt;
&lt;li&gt;Screenshot the dashboard.&lt;/li&gt;
&lt;li&gt;Click the date picker.&lt;/li&gt;
&lt;li&gt;Screenshot again.&lt;/li&gt;
&lt;li&gt;Select the date range.&lt;/li&gt;
&lt;li&gt;Screenshot again.&lt;/li&gt;
&lt;li&gt;Find the failed status filter.&lt;/li&gt;
&lt;li&gt;Screenshot again.&lt;/li&gt;
&lt;li&gt;Extract table data from the UI.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is slower, more expensive, and more fragile.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing the structured path with Apidog
&lt;/h2&gt;

&lt;p&gt;Teams often reach for computer use because nobody has designed a clean tool surface for the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; gives you a practical workflow for turning agent actions into documented API contracts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Model the operations as endpoints
&lt;/h3&gt;

&lt;p&gt;Start with the operations the agent actually needs.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /invoices/search
POST /deals/update-stage
POST /messages/send
POST /reports/failed-payments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each endpoint should have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a clear operation name&lt;/li&gt;
&lt;li&gt;a narrow request body&lt;/li&gt;
&lt;li&gt;explicit required fields&lt;/li&gt;
&lt;li&gt;predictable JSON responses&lt;/li&gt;
&lt;li&gt;validation rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A small set of focused endpoints can replace most browser-agent demos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Export the OpenAPI document
&lt;/h3&gt;

&lt;p&gt;Apidog can generate an OpenAPI 3.1 document from the design view.&lt;/p&gt;

&lt;p&gt;That document becomes the contract between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the model&lt;/li&gt;
&lt;li&gt;your agent runtime&lt;/li&gt;
&lt;li&gt;your backend&lt;/li&gt;
&lt;li&gt;your tests&lt;/li&gt;
&lt;li&gt;your docs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Feed the schema into your agent framework
&lt;/h3&gt;

&lt;p&gt;Common agent runtimes can consume structured tool schemas.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI &lt;code&gt;tools&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic tool use&lt;/li&gt;
&lt;li&gt;LangChain OpenAPI loaders&lt;/li&gt;
&lt;li&gt;DeepSeek tool-calling endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the model has the schema, it can call typed functions instead of navigating a UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Turn on the mock server
&lt;/h3&gt;

&lt;p&gt;Use Apidog’s mock server before connecting the agent to production.&lt;/p&gt;

&lt;p&gt;The mock server lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;test tool selection&lt;/li&gt;
&lt;li&gt;validate request bodies&lt;/li&gt;
&lt;li&gt;simulate success and error responses&lt;/li&gt;
&lt;li&gt;run the agent end-to-end&lt;/li&gt;
&lt;li&gt;avoid spending credits on live workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same pattern covered in &lt;a href="http://apidog.com/blog/api-tool-contract-first-development?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog’s contract-first development guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Replay and debug traffic
&lt;/h3&gt;

&lt;p&gt;When the agent runs, inspect the requests and responses.&lt;/p&gt;

&lt;p&gt;Look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing required fields&lt;/li&gt;
&lt;li&gt;invalid enum values&lt;/li&gt;
&lt;li&gt;wrong endpoint selection&lt;/li&gt;
&lt;li&gt;malformed dates&lt;/li&gt;
&lt;li&gt;unexpected retries&lt;/li&gt;
&lt;li&gt;fallback to browser use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replay a passing run next to a failing run to find where the tool call drifted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Ship the API contract
&lt;/h3&gt;

&lt;p&gt;The same Apidog project can support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;public API docs&lt;/li&gt;
&lt;li&gt;internal tool docs&lt;/li&gt;
&lt;li&gt;mocks&lt;/li&gt;
&lt;li&gt;QA&lt;/li&gt;
&lt;li&gt;request replay&lt;/li&gt;
&lt;li&gt;agent debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That turns the agent tool surface into a maintainable product surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid architecture: use both paths intentionally
&lt;/h2&gt;

&lt;p&gt;Most production agents end up hybrid.&lt;/p&gt;

&lt;p&gt;A reasonable default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;90% of operations use structured API tools.&lt;/li&gt;
&lt;li&gt;10% fall back to computer use for legacy portals.&lt;/li&gt;
&lt;li&gt;A router decides which path to use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal router rule can be as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If the requested operation exists in known_tools, call the structured tool.
If no matching tool exists, hand off to the browser agent.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In code, that logic might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;KNOWN_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list_failed_payments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update_deal_stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_slack_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create_invoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_operation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;operation_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;KNOWN_TOOLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;structured_api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;computer_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anthropic Claude 4.5, OpenAI GPT-5.5, and DeepSeek V4 can follow this routing pattern. For DeepSeek request examples, see &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Track both paths separately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request volume&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;token cost&lt;/li&gt;
&lt;li&gt;failure rate&lt;/li&gt;
&lt;li&gt;retry count&lt;/li&gt;
&lt;li&gt;fallback frequency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the browser path starts handling common operations, add the missing endpoint to your tool surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes to avoid
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Skipping the schema
&lt;/h3&gt;

&lt;p&gt;Do not rely on prose-only system prompts for tool calls.&lt;/p&gt;

&lt;p&gt;Use JSON Schema with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;required fields&lt;/li&gt;
&lt;li&gt;enums&lt;/li&gt;
&lt;li&gt;formats&lt;/li&gt;
&lt;li&gt;descriptions&lt;/li&gt;
&lt;li&gt;examples where useful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strict schemas improve tool accuracy and make validation failures easy to catch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Letting the agent design schemas at runtime
&lt;/h3&gt;

&lt;p&gt;A schema is product surface. Do not let the model invent it dynamically.&lt;/p&gt;

&lt;p&gt;Author the schema in Apidog, version it, review it, and treat breaking changes like API changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logging tokens but not actual cost
&lt;/h3&gt;

&lt;p&gt;Computer-use tokens often hide in image inputs. Many dashboards display text tokens clearly but price image tokens differently, so the number you see can understate what you actually pay.&lt;/p&gt;

&lt;p&gt;Use your provider’s billing console to validate real cost.&lt;/p&gt;
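
&lt;p&gt;As a rough cross-check, you can reconstruct blended cost yourself; the per-1K prices below are placeholders, not any vendor’s real rates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Placeholder prices; substitute the rates from your provider's pricing page.
TEXT_PRICE_PER_1K = 0.003
IMAGE_PRICE_PER_1K = 0.004

def estimated_cost(text_tokens: int, image_tokens: int) -&gt; float:
    # Screenshot-heavy loops usually spend far more on image tokens than text.
    return (text_tokens / 1000) * TEXT_PRICE_PER_1K + (image_tokens / 1000) * IMAGE_PRICE_PER_1K
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the estimate and the billing console disagree, trust the console.&lt;/p&gt;
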

&lt;h3&gt;
  
  
  Confusing computer use with RPA
&lt;/h3&gt;

&lt;p&gt;RPA tools run scripted clicks against known selectors or DOM elements.&lt;/p&gt;

&lt;p&gt;Computer-use agents re-decide what to click from screenshots on every step.&lt;/p&gt;

&lt;p&gt;RPA is cheaper and more repeatable when the UI is stable. Computer use is more flexible but more expensive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ignoring latency
&lt;/h3&gt;

&lt;p&gt;A 45x token bill is only part of the problem.&lt;/p&gt;

&lt;p&gt;A 60-second browser loop can kick users out of flow. If a user is waiting, use an API whenever possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternatives before full computer use
&lt;/h2&gt;

&lt;p&gt;If a vendor has no public API, try these options before handing the workflow to a screenshot loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Headless browser scripts
&lt;/h3&gt;

&lt;p&gt;Playwright and Puppeteer cost nothing per run after development.&lt;/p&gt;

&lt;p&gt;Tradeoff: they break when the UI changes.&lt;/p&gt;

&lt;p&gt;Use them when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the workflow is high-volume&lt;/li&gt;
&lt;li&gt;the UI is stable&lt;/li&gt;
&lt;li&gt;selectors are reliable&lt;/li&gt;
&lt;li&gt;maintenance cost is acceptable&lt;/li&gt;
&lt;/ul&gt;
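
&lt;p&gt;A minimal Playwright sketch; the URL and selectors are placeholders for whatever the vendor UI actually exposes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# pip install playwright, then: playwright install chromium
import os
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://vendor.example.com/login")    # placeholder URL
    page.fill("#email", "ops@example.com")           # placeholder selectors
    page.fill("#password", os.environ["VENDOR_PASSWORD"])
    page.click("button[type=submit]")
    page.wait_for_selector("text=Dashboard")         # fails loudly when the UI changes
    report = page.inner_text(".report-table")
    browser.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
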

&lt;h3&gt;
  
  
  Vendor-published iPaaS connectors
&lt;/h3&gt;

&lt;p&gt;Zapier, Make, and similar platforms may already support the vendor.&lt;/p&gt;

&lt;p&gt;Use them when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;speed matters&lt;/li&gt;
&lt;li&gt;the connector covers your workflow&lt;/li&gt;
&lt;li&gt;the seat cost is lower than custom integration work&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Private JSON endpoints
&lt;/h3&gt;

&lt;p&gt;Many dashboards call internal JSON APIs from the browser.&lt;/p&gt;

&lt;p&gt;You can inspect the network tab in DevTools, identify the private endpoint, and wrap it with your own server-side adapter.&lt;/p&gt;

&lt;p&gt;Document that adapter in Apidog and treat it as semi-stable. This pattern also appears in &lt;a href="http://apidog.com/blog/api-testing-without-postman-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing without Postman&lt;/a&gt;.&lt;/p&gt;
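
&lt;p&gt;A sketch of such an adapter; the vendor URL, session header, and response shape are assumptions recovered from DevTools, so expect it to be semi-stable at best:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import httpx
from fastapi import FastAPI

app = FastAPI()
VENDOR_URL = "https://app.vendor.example.com/internal/api/payments"  # found via DevTools

@app.get("/payments/failed")
async def failed_payments():
    # Reuse an authenticated session cookie; rotate it when the vendor expires it.
    headers = {"Cookie": os.environ["VENDOR_SESSION_COOKIE"]}
    async with httpx.AsyncClient() as client:
        resp = await client.get(VENDOR_URL, params={"status": "failed"}, headers=headers)
        resp.raise_for_status()
        return resp.json()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
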

&lt;p&gt;Computer use should be the last resort, not the default.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world use cases
&lt;/h2&gt;

&lt;p&gt;A fintech compliance team replaced a six-step computer-use Stripe report with three structured calls. Token cost dropped 92%, and runtime went from 41 seconds to 2 seconds.&lt;/p&gt;

&lt;p&gt;A B2B SaaS support agent kept computer use for one workflow: a vendor procurement portal with no API. Everything else routed through OpenAPI tool calls designed in Apidog. Monthly token spend dropped from $4,200 to $310.&lt;/p&gt;

&lt;p&gt;A solo founder used computer use once per week to refresh a Notion dashboard from a legacy ERP. The 45x cost on a weekly run was only a few cents. Building a full integration would have taken weeks. That is a good fit for computer use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The 45x cost gap is real enough to change your default architecture.&lt;/p&gt;

&lt;p&gt;Use structured APIs designed in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; for workflows with stable endpoints. Use computer use only when no API exists and the workflow runs rarely enough that the extra token cost is acceptable.&lt;/p&gt;

&lt;p&gt;Five practical takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Computer use often costs 30x to 50x more tokens than an equivalent structured API call.&lt;/li&gt;
&lt;li&gt;A documented endpoint plus JSON Schema beats a screenshot loop on cost, latency, and reliability.&lt;/li&gt;
&lt;li&gt;Hybrid stacks are normal: design the common path in Apidog and fall back to computer use for legacy workflows.&lt;/li&gt;
&lt;li&gt;Mock the structured tool surface before connecting it to production.&lt;/li&gt;
&lt;li&gt;Track structured calls and browser-agent calls separately so cost drift is visible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next step: open &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, create a project for your agent’s tool surface, and turn on the mock server. Within an hour, you should know whether your browser workflow can become two structured calls instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is computer use ever cheaper than a structured API?
&lt;/h3&gt;

&lt;p&gt;Not per run. Screenshot tokens dominate.&lt;/p&gt;

&lt;p&gt;Computer use can be cheaper in total only when integration cost would exceed years of run cost. For example, a weekly run that burns an extra $0.40 in tokens adds about $21 a year, so a $2,000 integration would take nearly a century to pay back. That usually means a very low-volume workflow against a system with no API.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I mock a JSON tool surface for an agent?
&lt;/h3&gt;

&lt;p&gt;Design the endpoints in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, turn on the built-in mock server, and point your agent at the mock URL.&lt;/p&gt;

&lt;p&gt;Every request returns realistic JSON without hitting production. For a related workflow, see &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tools for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use OpenAPI for tool calls in any model?
&lt;/h3&gt;

&lt;p&gt;Yes. OpenAI &lt;code&gt;tools&lt;/code&gt;, Anthropic &lt;code&gt;tool_use&lt;/code&gt;, and DeepSeek V4 tool-calling endpoints can consume OpenAPI 3.1-style schemas.&lt;/p&gt;

&lt;p&gt;Apidog exports the schema cleanly. See &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;how to use DeepSeek V4 API&lt;/a&gt; for the DeepSeek request shape.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does GPT-5.5 still support computer use?
&lt;/h3&gt;

&lt;p&gt;OpenAI ships computer use through Operator and the Responses API. The cost profile is similar to Anthropic’s screenshot-based approach. The recommendation here applies regardless of vendor.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Skyvern, browser-use, and other open-source agents?
&lt;/h3&gt;

&lt;p&gt;The same math applies.&lt;/p&gt;

&lt;p&gt;Open-source browser agents may reduce per-call price by using cheaper models, but they still require multiple rounds and screenshots. Structured APIs still win where APIs exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know when an endpoint is missing for an agent task?
&lt;/h3&gt;

&lt;p&gt;Watch for repeated fallback to browser use.&lt;/p&gt;

&lt;p&gt;If the agent keeps trying to use a browser for the same operation, that is a missing endpoint in your tool surface. Add it in Apidog, regenerate the schema, and route the agent back through structured calls.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>TradingAgents: Open-Source LLM Trading Framework</title>
      <dc:creator>Hassann</dc:creator>
      <pubDate>Thu, 07 May 2026 03:59:19 +0000</pubDate>
      <link>https://forem.com/hassann/tradingagentsopen-source-llm-trading-framework-394e</link>
      <guid>https://forem.com/hassann/tradingagentsopen-source-llm-trading-framework-394e</guid>
      <description>&lt;p&gt;Most multi-agent LLM frameworks promise more than they deliver. &lt;a href="https://github.com/TauricResearch/TradingAgents" rel="noopener noreferrer"&gt;TradingAgents&lt;/a&gt; is one of the rare exceptions: open-sourced by Tauric Research alongside an &lt;a href="https://arxiv.org/abs/2412.20138" rel="noopener noreferrer"&gt;arXiv paper&lt;/a&gt;, now at version 0.2.4, and built around a clean role decomposition that mirrors a real research desk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;This guide focuses on what TradingAgents does, what changed in v0.2.4, how its agent architecture works, and how to test the LLM and market-data layers underneath with &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;. If you are already thinking about agent contracts, pair this with the &lt;a href="http://apidog.com/blog/how-to-write-agents-md-files?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;agents.md guide for API teams&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;TradingAgents is a multi-agent LLM trading framework from Tauric Research, described in &lt;a href="https://arxiv.org/abs/2412.20138" rel="noopener noreferrer"&gt;arXiv 2412.20138&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;It decomposes trading into specialist agents: Fundamentals Analyst, Sentiment Analyst, News Analyst, Technical Analyst, Bull/Bear Researchers, Trader, and Risk Management agents.&lt;/li&gt;
&lt;li&gt;v0.2.4 adds structured-output agents, LangGraph checkpoint resume, persistent decision logs, Docker support, and more LLM providers.&lt;/li&gt;
&lt;li&gt;It can run against OpenAI-compatible endpoints, which makes hosted, local, and self-hosted models easier to swap.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; to mock market-data APIs, replay LLM traffic, assert structured output, and compare provider behavior.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apidog.com/download?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Download Apidog&lt;/a&gt; if you want to wire these checks into CI before trusting agent output.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What TradingAgents is
&lt;/h2&gt;

&lt;p&gt;TradingAgents is a Python package and CLI for running a multi-agent trading research workflow.&lt;/p&gt;

&lt;p&gt;Instead of asking one model to “analyze this stock,” the framework splits the workflow into roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fundamentals Analyst&lt;/li&gt;
&lt;li&gt;Sentiment Analyst&lt;/li&gt;
&lt;li&gt;News Analyst&lt;/li&gt;
&lt;li&gt;Technical Analyst&lt;/li&gt;
&lt;li&gt;Bull Researcher&lt;/li&gt;
&lt;li&gt;Bear Researcher&lt;/li&gt;
&lt;li&gt;Research Manager&lt;/li&gt;
&lt;li&gt;Trader&lt;/li&gt;
&lt;li&gt;Risk Management agents&lt;/li&gt;
&lt;li&gt;Portfolio Manager&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent has:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A specific role prompt.&lt;/li&gt;
&lt;li&gt;A focused toolset.&lt;/li&gt;
&lt;li&gt;A place in the workflow graph.&lt;/li&gt;
&lt;li&gt;A defined output consumed by the next stage.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The project README frames it as research code, not investment advice. That distinction matters. The useful engineering lesson is not “let an LLM trade for you.” It is how to design a multi-agent system with specialist roles, debate, structured decisions, and an audit trail.&lt;/p&gt;

&lt;h2&gt;
  
  
  What v0.2.4 shipped
&lt;/h2&gt;

&lt;p&gt;The v0.2.4 release is important because it improves reliability around long-running agent workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured-output agents
&lt;/h3&gt;

&lt;p&gt;The Research Manager, Trader, and Portfolio Manager now emit structured output through either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Responses API&lt;/li&gt;
&lt;li&gt;Anthropic tool-use channel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That replaces brittle free-text parsing with typed JSON-style outputs, which makes downstream automation safer.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangGraph checkpoint resume
&lt;/h3&gt;

&lt;p&gt;TradingAgents uses LangGraph for orchestration. v0.2.4 adds checkpoint resume support, so a run can recover from interruptions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM provider &lt;code&gt;429&lt;/code&gt; responses&lt;/li&gt;
&lt;li&gt;market-data API throttling&lt;/li&gt;
&lt;li&gt;local process failures&lt;/li&gt;
&lt;li&gt;network issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of restarting the full workflow, you can resume from a saved checkpoint.&lt;/p&gt;
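
&lt;p&gt;A minimal, self-contained sketch of the resume mechanism. This is generic LangGraph wiring, not TradingAgents’ actual graph; the node, state fields, and thread ID are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import TypedDict

from langgraph.graph import StateGraph
from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite

class RunState(TypedDict):
    ticker: str
    report: str

def analyst(state: RunState) -&gt; RunState:
    return {"ticker": state["ticker"], "report": "stub report"}

builder = StateGraph(RunState)
builder.add_node("analyst", analyst)
builder.set_entry_point("analyst")
builder.set_finish_point("analyst")

with SqliteSaver.from_conn_string("checkpoints.sqlite") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    # Re-invoking with the same thread_id resumes from the last saved checkpoint.
    config = {"configurable": {"thread_id": "AAPL-2026-04-30"}}
    graph.invoke({"ticker": "AAPL", "report": ""}, config=config)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
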

&lt;h3&gt;
  
  
  Persistent decision log
&lt;/h3&gt;

&lt;p&gt;Trader decisions are written to a SQLite log with reasoning, inputs, and timestamps.&lt;/p&gt;

&lt;p&gt;That gives you an audit trail you can inspect later or use for evaluation.&lt;/p&gt;
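
&lt;p&gt;A quick way to inspect it; the database path, table, and column names below are assumptions, so check the real schema with &lt;code&gt;.schema&lt;/code&gt; in the sqlite3 shell first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# Path, table, and columns are assumed for illustration.
conn = sqlite3.connect("tradingagents/results/decisions.sqlite")
rows = conn.execute(
    "SELECT ticker, action, confidence, created_at "
    "FROM decisions ORDER BY created_at DESC LIMIT 10"
)
for row in rows:
    print(row)
conn.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
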

&lt;h3&gt;
  
  
  More LLM providers
&lt;/h3&gt;

&lt;p&gt;v0.2.4 added support for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DeepSeek&lt;/li&gt;
&lt;li&gt;Qwen&lt;/li&gt;
&lt;li&gt;GLM&lt;/li&gt;
&lt;li&gt;Azure OpenAI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those join the existing provider matrix that includes OpenAI, Anthropic, Gemini, and Grok.&lt;/p&gt;

&lt;p&gt;If you want to compare cost and reasoning behavior, you can test DeepSeek through its OpenAI-compatible endpoint. The request pattern is covered in the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker and Windows fixes
&lt;/h3&gt;

&lt;p&gt;The release also includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dockerfile support&lt;/li&gt;
&lt;li&gt;a Windows UTF-8/path encoding fix from v0.2.3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not exciting, but useful if you want repeatable local or CI runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  TradingAgents architecture
&lt;/h2&gt;

&lt;p&gt;A complete TradingAgents run follows this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The CLI accepts a ticker and date.&lt;/li&gt;
&lt;li&gt;The Analyst Team fans out.&lt;/li&gt;
&lt;li&gt;Each analyst fetches data and writes a report.&lt;/li&gt;
&lt;li&gt;The Bull Researcher writes a bullish thesis.&lt;/li&gt;
&lt;li&gt;The Bear Researcher writes a bearish thesis.&lt;/li&gt;
&lt;li&gt;The researchers debate.&lt;/li&gt;
&lt;li&gt;The Research Manager synthesizes the debate into a recommendation.&lt;/li&gt;
&lt;li&gt;The Trader reads the recommendation and decision history.&lt;/li&gt;
&lt;li&gt;The Trader produces a trade plan.&lt;/li&gt;
&lt;li&gt;Risk Management agents review the plan from aggressive, conservative, and neutral perspectives.&lt;/li&gt;
&lt;li&gt;The Portfolio Manager approves or sends the plan back.&lt;/li&gt;
&lt;li&gt;The final decision is written to SQLite.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The highest LLM cost usually appears in the debate and risk-review stages because multiple agents reason over the same context.&lt;/p&gt;

&lt;p&gt;That is also where smaller models tend to fail. A weak local model may loop, repeat arguments, or produce shallow Bull/Bear debates. Stronger reasoning models generally produce more useful tradeoffs and cleaner structured conclusions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it compares to LangGraph and CrewAI
&lt;/h2&gt;

&lt;p&gt;TradingAgents is not a general-purpose agent framework in the same way LangGraph or CrewAI is.&lt;/p&gt;

&lt;p&gt;Think of the layers like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt;: low-level graph orchestration for agent workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI&lt;/strong&gt;: general-purpose role-based multi-agent framework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TradingAgents&lt;/strong&gt;: domain-specific implementation for trading research.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want maximum flexibility, start with LangGraph.&lt;/p&gt;

&lt;p&gt;If you want a general multi-agent abstraction, evaluate CrewAI.&lt;/p&gt;

&lt;p&gt;If you want to study a concrete, opinionated multi-agent workflow with debate, decision, risk review, and logging, read TradingAgents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why you need to test the API layers
&lt;/h2&gt;

&lt;p&gt;TradingAgents depends on two unstable surfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Market-data APIs&lt;/li&gt;
&lt;li&gt;LLM provider APIs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both can break runs in ways that are hard to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Market-data APIs fail through drift
&lt;/h3&gt;

&lt;p&gt;Common issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inconsistent free-tier rate limits&lt;/li&gt;
&lt;li&gt;renamed fields&lt;/li&gt;
&lt;li&gt;missing fields&lt;/li&gt;
&lt;li&gt;different trading-day boundaries&lt;/li&gt;
&lt;li&gt;different historical-data formats between vendors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A run can work one day and fail the next because a vendor changed a field such as &lt;code&gt;regularMarketTime&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM provider APIs fail through shape and cost
&lt;/h3&gt;

&lt;p&gt;Common issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;changed response formats&lt;/li&gt;
&lt;li&gt;tool-call parsing differences&lt;/li&gt;
&lt;li&gt;reasoning-mode cost spikes&lt;/li&gt;
&lt;li&gt;provider-specific structured-output behavior&lt;/li&gt;
&lt;li&gt;token usage that varies by role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix is to keep saved, replayable request collections with assertions. That is where &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; fits. The same pattern is useful for protocol-level testing, as described in the &lt;a href="http://apidog.com/blog/mcp-server-testing-apidog?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;MCP server testing playbook&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mock market-data APIs with Apidog
&lt;/h2&gt;

&lt;p&gt;Use this workflow to make TradingAgents test runs deterministic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: define upstream endpoints
&lt;/h3&gt;

&lt;p&gt;Create an Apidog project and add the market-data endpoints TradingAgents calls, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yahoo Finance&lt;/li&gt;
&lt;li&gt;FinnHub&lt;/li&gt;
&lt;li&gt;Polygon&lt;/li&gt;
&lt;li&gt;OpenBB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each endpoint, save:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;method&lt;/li&gt;
&lt;li&gt;path&lt;/li&gt;
&lt;li&gt;query parameters&lt;/li&gt;
&lt;li&gt;headers&lt;/li&gt;
&lt;li&gt;example response body&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use real vendor responses as fixtures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: enable the mock server
&lt;/h3&gt;

&lt;p&gt;Turn on Apidog’s mock server and point TradingAgents’ tool configuration at the mock URL.&lt;/p&gt;

&lt;p&gt;The Fundamentals Analyst, Technical Analyst, and other data-consuming agents now receive deterministic data instead of live vendor responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: detect vendor drift
&lt;/h3&gt;

&lt;p&gt;On a schedule, replay the live vendor endpoints and compare their response shapes against your saved fixtures.&lt;/p&gt;

&lt;p&gt;Look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;renamed fields&lt;/li&gt;
&lt;li&gt;removed fields&lt;/li&gt;
&lt;li&gt;newly required fields&lt;/li&gt;
&lt;li&gt;type changes&lt;/li&gt;
&lt;li&gt;empty values where data previously existed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same contract-first workflow described in &lt;a href="http://apidog.com/blog/api-tool-contract-first-development?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;contract-first API development&lt;/a&gt;.&lt;/p&gt;
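
&lt;p&gt;A minimal shape-diff sketch for that comparison; it checks only top-level keys and types, and the fixture path and vendor URL are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

import httpx

def shape(payload: dict) -&gt; dict:
    return {key: type(value).__name__ for key, value in payload.items()}

with open("fixtures/quote_AAPL.json") as f:  # saved vendor response
    fixture_shape = shape(json.load(f))

live = httpx.get("https://vendor.example.com/v1/quote",  # placeholder URL
                 params={"symbol": "AAPL"}).json()
live_shape = shape(live)

missing = set(fixture_shape) - set(live_shape)
changed = {key for key in fixture_shape.keys() &amp; live_shape.keys()
           if fixture_shape[key] != live_shape[key]}
if missing or changed:
    print(f"drift detected. missing: {missing}, type changes: {changed}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
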

&lt;h2&gt;
  
  
  Test the LLM provider layer
&lt;/h2&gt;

&lt;p&gt;Before scaling TradingAgents runs, test three things.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Cost per role
&lt;/h3&gt;

&lt;p&gt;Run a single ticker and capture token usage per agent.&lt;/p&gt;

&lt;p&gt;At minimum, track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fundamentals Analyst tokens&lt;/li&gt;
&lt;li&gt;Sentiment Analyst tokens&lt;/li&gt;
&lt;li&gt;News Analyst tokens&lt;/li&gt;
&lt;li&gt;Technical Analyst tokens&lt;/li&gt;
&lt;li&gt;Bull/Bear debate tokens&lt;/li&gt;
&lt;li&gt;Risk Management tokens&lt;/li&gt;
&lt;li&gt;final decision tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Bull/Bear debate should usually be more expensive than a single analyst pass. If it is not, the model may be short-circuiting the debate.&lt;/p&gt;

&lt;p&gt;Use &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; request logs to capture provider traffic and compare token usage across runs.&lt;/p&gt;
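
&lt;p&gt;A small aggregation sketch, assuming you export request logs as JSON Lines with a role label and an OpenAI-style usage object (the file name and fields are assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from collections import Counter

tokens_by_role = Counter()
with open("provider_log.jsonl") as f:  # exported request log, one JSON object per line
    for line in f:
        entry = json.loads(line)
        tokens_by_role[entry["role"]] += entry["usage"]["total_tokens"]

for role, tokens in tokens_by_role.most_common():
    print(f"{role:25s} {tokens:10d} tokens")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
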

&lt;h3&gt;
  
  
  2. Structured output shape
&lt;/h3&gt;

&lt;p&gt;For v0.2.4 structured-output agents, add assertions that verify required fields exist.&lt;/p&gt;

&lt;p&gt;For example, assert that the Trader output contains fields like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"buy | sell | hold"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.72&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"risk_notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add JSONPath checks such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$.action
$.confidence
$.reasoning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A structured-output regression is dangerous because downstream code may fail only after the model response is already accepted.&lt;/p&gt;
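
&lt;p&gt;If you also guard in application code, a minimal check mirroring the example fields might look like this, using the &lt;code&gt;jsonschema&lt;/code&gt; package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from jsonschema import ValidationError, validate  # pip install jsonschema

TRADER_SCHEMA = {
    "type": "object",
    "required": ["action", "confidence", "reasoning"],
    "properties": {
        "action": {"enum": ["buy", "sell", "hold"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "reasoning": {"type": "string"},
    },
}

def accept(output: dict) -&gt; bool:
    # Reject the response before any downstream stage consumes it.
    try:
        validate(instance=output, schema=TRADER_SCHEMA)
        return True
    except ValidationError:
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
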

&lt;h3&gt;
  
  
  3. Provider parity
&lt;/h3&gt;

&lt;p&gt;When swapping providers, do not compare one run against one run.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select a fixed ticker basket.&lt;/li&gt;
&lt;li&gt;Run the same dates through provider A.&lt;/li&gt;
&lt;li&gt;Run the same dates through provider B.&lt;/li&gt;
&lt;li&gt;Compare the SQLite decision logs.&lt;/li&gt;
&lt;li&gt;Measure how often conclusions diverge.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI vs DeepSeek
30 tickers
2 debate rounds
same market-data fixtures
same date range
compare final action + confidence + reasoning summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt; and the &lt;a href="http://apidog.com/blog/how-to-use-gpt-5-5-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;GPT-5.5 API guide&lt;/a&gt; for provider request patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimal TradingAgents run
&lt;/h2&gt;

&lt;p&gt;A basic run looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/TauricResearch/TradingAgents
&lt;span class="nb"&gt;cd &lt;/span&gt;TradingAgents
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;FINNHUB_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;

python &lt;span class="nt"&gt;-m&lt;/span&gt; tradingagents.cli &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ticker&lt;/span&gt; AAPL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--date&lt;/span&gt; 2026-04-30 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--models&lt;/span&gt; gpt-5.5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--rounds&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two debate rounds are a practical minimum for testing the Bull/Bear workflow.&lt;/p&gt;

&lt;p&gt;The output is written under:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tradingagents/results/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expect JSON artifacts plus a Markdown decision summary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Swap to DeepSeek
&lt;/h2&gt;

&lt;p&gt;To test a different reasoning provider, configure the provider and model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;

python &lt;span class="nt"&gt;-m&lt;/span&gt; tradingagents.cli &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ticker&lt;/span&gt; AAPL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--date&lt;/span&gt; 2026-04-30 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--models&lt;/span&gt; deepseek-v4-pro &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--provider&lt;/span&gt; deepseek &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--rounds&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern applies to Qwen, GLM, or local OpenAI-compatible servers such as Ollama or vLLM.&lt;/p&gt;

&lt;p&gt;For local model options, see the &lt;a href="http://apidog.com/blog/best-local-llms-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;best local LLMs of 2026 post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Running with a model that is too small
&lt;/h3&gt;

&lt;p&gt;Small local models can produce repetitive Bull/Bear debates that never converge.&lt;/p&gt;

&lt;p&gt;For serious evaluation, use at least a mid-tier reasoning model. Realistic options include DeepSeek V4 Flash, Qwen 3.6 32B, GPT-5.5, and Claude 4.5.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skipping market-data caching
&lt;/h3&gt;

&lt;p&gt;Each analyst can call the data layer separately. Without caching, one ticker run can fan out into multiple vendor requests.&lt;/p&gt;

&lt;p&gt;Enable caching before running batches.&lt;/p&gt;
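
&lt;p&gt;A minimal memoization sketch so every analyst shares one vendor call per ticker and date; &lt;code&gt;fetch_daily_bars&lt;/code&gt; and the URL are placeholders for the real data layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from functools import lru_cache

import httpx

@lru_cache(maxsize=512)
def fetch_daily_bars(ticker: str, date: str) -&gt; str:
    # One network call per unique (ticker, date); later callers hit the cache.
    resp = httpx.get("https://vendor.example.com/v1/bars",  # placeholder URL
                     params={"symbol": ticker, "date": date})
    resp.raise_for_status()
    return resp.text  # immutable return value keeps cached results safe to share
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
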

&lt;h3&gt;
  
  
  Treating research code as a trading bot
&lt;/h3&gt;

&lt;p&gt;TradingAgents is research code. Backtest results are sensitive to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model choice&lt;/li&gt;
&lt;li&gt;prompt seed&lt;/li&gt;
&lt;li&gt;debate length&lt;/li&gt;
&lt;li&gt;data quality&lt;/li&gt;
&lt;li&gt;provider behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Treat outputs as hypotheses, not executable trading strategies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not logging token spend
&lt;/h3&gt;

&lt;p&gt;A single ticker run can cost anywhere from cents to several dollars depending on model and debate rounds.&lt;/p&gt;

&lt;p&gt;Track per-run cost in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog’s&lt;/a&gt; replay history so debate loops do not silently burn budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardcoding one provider
&lt;/h3&gt;

&lt;p&gt;The framework supports multiple providers. Use that to your advantage.&lt;/p&gt;

&lt;p&gt;Before committing to one provider:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the same ticker set through several models.&lt;/li&gt;
&lt;li&gt;Compare decision logs.&lt;/li&gt;
&lt;li&gt;Compare token cost.&lt;/li&gt;
&lt;li&gt;Review failure modes.&lt;/li&gt;
&lt;li&gt;Pick based on both cost and behavior.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where Apidog fits in the development loop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design the API surface
&lt;/h3&gt;

&lt;p&gt;Before wiring TradingAgents to live vendors, model each market-data endpoint in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That forces you to identify which response fields the agents actually need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run local CI against mocks
&lt;/h3&gt;

&lt;p&gt;Use Apidog’s mock server for unit and integration tests.&lt;/p&gt;

&lt;p&gt;That keeps tests independent of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vendor uptime&lt;/li&gt;
&lt;li&gt;market hours&lt;/li&gt;
&lt;li&gt;rate limits&lt;/li&gt;
&lt;li&gt;network failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same workflow is covered in &lt;a href="http://apidog.com/blog/api-testing-without-postman-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing without Postman&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Diff live responses against fixtures
&lt;/h3&gt;

&lt;p&gt;Schedule a weekly replay of live vendor endpoints.&lt;/p&gt;

&lt;p&gt;Compare the live response shape against saved fixtures and alert on schema drift. This gives you an early warning when the data layer changes underneath the agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this pattern matters beyond trading
&lt;/h2&gt;

&lt;p&gt;TradingAgents is useful even if you never build trading software.&lt;/p&gt;

&lt;p&gt;The architecture transfers to other multi-step agent workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer support triage&lt;/li&gt;
&lt;li&gt;code review&lt;/li&gt;
&lt;li&gt;compliance review&lt;/li&gt;
&lt;li&gt;research summarization&lt;/li&gt;
&lt;li&gt;incident analysis&lt;/li&gt;
&lt;li&gt;security review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reusable pattern is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;specialist agents -&amp;gt; debate/review -&amp;gt; synthesis -&amp;gt; decision -&amp;gt; audit log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure is easier to test than a single large prompt because each stage has a defined responsibility and output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-world examples
&lt;/h2&gt;

&lt;p&gt;A quant research student can run the same 30-ticker basket through DeepSeek V4, GPT-5.5, and Claude 4.5, then use Apidog logs to compare request/response behavior.&lt;/p&gt;

&lt;p&gt;A fintech engineer can reuse the multi-agent pattern for code reviews: security agent, performance agent, style agent, then a synthesizer that writes the final PR comment.&lt;/p&gt;

&lt;p&gt;A solo developer can run TradingAgents nightly on a 10-ticker watchlist and log every decision into a database while using Apidog mocks for weekend test runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;TradingAgents is a practical reference implementation for multi-agent LLM workflows. It uses specialist roles, debate, risk review, structured decisions, and persistent logs instead of a single monolithic prompt.&lt;/p&gt;

&lt;p&gt;v0.2.4 makes the project more useful for production-style experimentation with structured outputs, checkpoint resume, SQLite decision logs, Docker support, and broader provider coverage.&lt;/p&gt;

&lt;p&gt;The key implementation lesson: test the layers underneath the agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mock market-data vendors in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Assert structured LLM outputs.&lt;/li&gt;
&lt;li&gt;Log token cost by role.&lt;/li&gt;
&lt;li&gt;Compare providers with repeatable fixtures.&lt;/li&gt;
&lt;li&gt;Treat final decisions as research artifacts, not trading instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next step: clone the repo, run one ticker, and route the upstream calls through an &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt; mock server. You should know within an hour whether the architecture fits your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is TradingAgents safe to use with real money?
&lt;/h3&gt;

&lt;p&gt;The repo describes TradingAgents as research code, not financial advice. Treat its output as a hypothesis. Running it against a live brokerage is at your own risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which LLM provider gives the best cost-quality tradeoff?
&lt;/h3&gt;

&lt;p&gt;For early 2026 workloads, DeepSeek V4 Flash with thinking mode is a strong cost-quality option. See the &lt;a href="http://apidog.com/blog/how-to-use-deepseek-v4-api?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;DeepSeek V4 API guide&lt;/a&gt; for request details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I run TradingAgents on local models?
&lt;/h3&gt;

&lt;p&gt;Yes. Multi-provider support allows OpenAI-compatible local endpoints from tools such as Ollama, vLLM, and LM Studio. See the &lt;a href="http://apidog.com/blog/best-local-llms-2026?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;best local LLMs of 2026 post&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I mock market-data APIs?
&lt;/h3&gt;

&lt;p&gt;Define each vendor endpoint in &lt;a href="https://apidog.com?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;Apidog&lt;/a&gt;, enable the mock server, and point TradingAgents’ tool config at the mock URL. The same pattern is covered in &lt;a href="http://apidog.com/blog/api-testing-tool-qa-engineers?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation"&gt;API testing tools for QA engineers&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What hardware do I need?
&lt;/h3&gt;

&lt;p&gt;If you call hosted LLMs such as OpenAI, Anthropic, or DeepSeek, any laptop with Python 3.10+ should be enough.&lt;/p&gt;

&lt;p&gt;If you serve local models, hardware depends on model size. Larger reasoning models need substantially more GPU memory than small local models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it support after-hours and weekend simulation?
&lt;/h3&gt;

&lt;p&gt;TradingAgents can run against historical data for a selected date. Live trading is a separate problem that the framework does not claim to solve.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it compare to other multi-agent frameworks?
&lt;/h3&gt;

&lt;p&gt;TradingAgents is domain-specific. CrewAI, AutoGen, and LangGraph are general-purpose. Use TradingAgents to study a concrete multi-agent implementation; use LangGraph or another general framework when you need to build your own agent graph from scratch.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
