<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Evan-dong</title>
    <description>The latest articles on Forem by Evan-dong (@evan-dong).</description>
    <link>https://forem.com/evan-dong</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3805708%2F6a9f71a4-d7de-4c0a-8ff7-ba23c9b2486a.png</url>
      <title>Forem: Evan-dong</title>
      <link>https://forem.com/evan-dong</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/evan-dong"/>
    <language>en</language>
    <item>
      <title>I Generated a Full Brand Mockup Set from a Logo in 10 Minutes — Here's the Workflow</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Sat, 23 May 2026 12:02:18 +0000</pubDate>
      <link>https://forem.com/evan-dong/i-generated-a-full-brand-mockup-set-from-a-logo-in-10-minutes-heres-the-workflow-3ald</link>
      <guid>https://forem.com/evan-dong/i-generated-a-full-brand-mockup-set-from-a-logo-in-10-minutes-heres-the-workflow-3ald</guid>
      <description>&lt;h1&gt;
  
  
  I Generated a Full Brand Mockup Set from a Logo in 10 Minutes — Here's the Workflow
&lt;/h1&gt;

&lt;p&gt;Every time I finish a logo, the same problem comes back: how do I show what this brand actually looks like in the real world? Clients don't react to a mark on a white background. They want to see signage, cards, packaging, booths, and product surfaces.&lt;/p&gt;

&lt;p&gt;That part usually takes half a day of mockup sourcing and layout work. I wanted to see if Image 2 could compress that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who this is for:&lt;/strong&gt; designers, brand consultants, or developers building brand-adjacent tools who want a faster mockup workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The framework
&lt;/h2&gt;

&lt;p&gt;I split the output into six categories before generating anything:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Logo material and finish studies&lt;/li&gt;
&lt;li&gt;Single branded item mockups&lt;/li&gt;
&lt;li&gt;Combined brand material sets&lt;/li&gt;
&lt;li&gt;Spatial and environmental scenes&lt;/li&gt;
&lt;li&gt;Human or usage scenarios&lt;/li&gt;
&lt;li&gt;Social media visuals&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The planning prompt
&lt;/h2&gt;

&lt;p&gt;Once you have the categories, use this prompt to generate a plan:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I already have a logo. The industry is [industry]. The brand personality is [keywords]. Based on the uploaded logo, help me plan a complete logo mockup generation set. Include: 1) logo material variations, 2) single brand applications, 3) combined brand materials, 4) spatial scenes, 5) human or usage scenarios, and 6) social media visuals. For each category, provide the recommended visual direction, aspect ratio, and prompts that can be used directly in Image 2. Keep the logo recognizable, choose materials appropriate to the industry and brand personality, and make the outputs suitable for a brand proposal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Test 1: Evolink (tech/API brand)
&lt;/h2&gt;

&lt;p&gt;Brand inputs: professional, stable, developer-friendly, enterprise-ready, connected.&lt;/p&gt;

&lt;p&gt;Image 2 generated logo finishes, interface materials, developer badges, documentation covers, and booth-style spatial visuals.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F77cf3e009e7494cf8460a94e86013a41085460e5adb7efe5a4be8cf7a2144189" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F77cf3e009e7494cf8460a94e86013a41085460e5adb7efe5a4be8cf7a2144189" alt="Evolink logo mockup direction" width="720" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fcb93a6bb8caba4a62a34828089b4d0874980bdc727caca6f87e9cb295bd1324a" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fcb93a6bb8caba4a62a34828089b4d0874980bdc727caca6f87e9cb295bd1324a" alt="Evolink branded material mockup" width="760" height="1140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F08a06d4457f482e2fbaa21ade3750d0f8b289f3ba4907e1c81c0e4c422a8d7f2" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F08a06d4457f482e2fbaa21ade3750d0f8b289f3ba4907e1c81c0e4c422a8d7f2" alt="Evolink high-resolution brand scene" width="1280" height="1706"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fd63947ecbf13922cfc89257b57249825488beb4e55d57c72796dc73d8f0e89ba" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fd63947ecbf13922cfc89257b57249825488beb4e55d57c72796dc73d8f0e89ba" alt="Evolink product and brand environment" width="720" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F46579df9da91014d13292beb16e7fb615ed8322fb86b2af021fb5a950baef18c" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F46579df9da91014d13292beb16e7fb615ed8322fb86b2af021fb5a950baef18c" alt="Evolink booth concept" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fa489ac910f637f33c64878c79a2cad104538b846ae48256c535649238f2a7bac" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Fa489ac910f637f33c64878c79a2cad104538b846ae48256c535649238f2a7bac" alt="Evolink brand application scene" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Ff0bef8c0cdec0c8eda9b295d1788e93d7ee61496daf2e064be3014207c1ddf0e" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2Ff0bef8c0cdec0c8eda9b295d1788e93d7ee61496daf2e064be3014207c1ddf0e" alt="Evolink card or badge application" width="720" height="900"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After one round, Evolink stopped looking like just a geometric mark. It started to feel like a real company with booths, dashboards, documentation, and team assets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test 2: MoriJoy (nature park brand)
&lt;/h2&gt;

&lt;p&gt;To see if this holds outside tech, I tried MoriJoy — a parent-child nature park. Logo: smiling treehouse + sapling. Palette: wood tones, cream, light green. Personality: soft, warm, healing, natural, playful, family-oriented.&lt;/p&gt;

&lt;p&gt;The mockup directions shifted completely: entrance signage, membership cards, staff name tags, children's drinkware, sticker sets, tote bags, family activity spaces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F54d95fd065a77840bdb62a600cb0f876f11c795eebb0caee098586e200fbecf8" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F54d95fd065a77840bdb62a600cb0f876f11c795eebb0caee098586e200fbecf8" alt="MoriJoy entrance and environment" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F6e784147e8b1c2d7777a60ec060f5722a5052a161639b0d89a287b694d38aa43" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F6e784147e8b1c2d7777a60ec060f5722a5052a161639b0d89a287b694d38aa43" alt="MoriJoy branded item mockup" width="720" height="900"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F6b8a7e353df0985dac86f61230afb0bdfa0a2a706825b42d177031ecf6f429fb" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.gooo.ai%2Fweb-images%2F6b8a7e353df0985dac86f61230afb0bdfa0a2a706825b42d177031ecf6f429fb" alt="MoriJoy spatial brand scene" width="720" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same workflow, completely different visual language. The system adapts the brand world around the logo.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The six-part framework prevents random outputs and keeps the set coherent.&lt;/li&gt;
&lt;li&gt;Defining brand personality keywords before generating makes a big difference.&lt;/li&gt;
&lt;li&gt;Not every output is perfect — you still need design judgment to pick the best directions.&lt;/li&gt;
&lt;li&gt;This works best for early-stage proposals and direction-setting, not final production.&lt;/li&gt;
&lt;li&gt;The workflow compresses what normally takes half a day into about 10 minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;If you're building brand proposals or need to show a logo in context quickly, this workflow is worth testing.&lt;/p&gt;

&lt;p&gt;Upload your logo and generate a full brand mockup set with Image 2: &lt;a href="https://evolink.ai/gpt-image-2?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=image2_brand_mockup&amp;amp;utm_content=image2-brand-mockup-devto" rel="noopener noreferrer"&gt;https://evolink.ai/gpt-image-2&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nanobanana</category>
    </item>
    <item>
      <title>Gemini 3.5 Flash Just Shipped — Here's When to Use It (and When Not To)</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Wed, 20 May 2026 10:33:55 +0000</pubDate>
      <link>https://forem.com/evan-dong/gemini-35-flash-just-shipped-heres-when-to-use-it-and-when-not-to-1aee</link>
      <guid>https://forem.com/evan-dong/gemini-35-flash-just-shipped-heres-when-to-use-it-and-when-not-to-1aee</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmuyutechnology.mintlify.app%2Fmintlify-assets%2F_next%2Fimage%3Furl%3D%252F_mintlify%252Fapi%252Fog%253Fdivision%253DGoogle%252BNative%252BAPI%252BFormat%2526title%253DGemini%252B3.5%252BFlash%252B-%252BNative%252BAPI%252B-%252BQuick%252BStart%2526description%253D-%252BUse%252BGoogle%252BNative%252BAPI%252Bformat%252Bto%252Bcall%252Bgemini-3.5-flash%252Bmodel%25250A-%252BSynchronous%252Bprocessing%252Bmode%25252C%252Breal-time%252Bresponse%25250A-%252BMinimal%252Bparameters%252Bfor%252Bquick%252Bstart%25250A-%252B%2525F0%25259F%252592%2525A1%252BNeed%252Bm%2526primaryColor%253D%2525231E90FF%2526lightColor%253D%2525231E90FF%2526darkColor%253D%2525231565C0%2526backgroundLight%253D%252523ffffff%2526backgroundDark%253D%252523090c10%26w%3D1200%26q%3D100" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmuyutechnology.mintlify.app%2Fmintlify-assets%2F_next%2Fimage%3Furl%3D%252F_mintlify%252Fapi%252Fog%253Fdivision%253DGoogle%252BNative%252BAPI%252BFormat%2526title%253DGemini%252B3.5%252BFlash%252B-%252BNative%252BAPI%252B-%252BQuick%252BStart%2526description%253D-%252BUse%252BGoogle%252BNative%252BAPI%252Bformat%252Bto%252Bcall%252Bgemini-3.5-flash%252Bmodel%25250A-%252BSynchronous%252Bprocessing%252Bmode%25252C%252Breal-time%252Bresponse%25250A-%252BMinimal%252Bparameters%252Bfor%252Bquick%252Bstart%25250A-%252B%2525F0%25259F%252592%2525A1%252BNeed%252Bm%2526primaryColor%253D%2525231E90FF%2526lightColor%253D%2525231E90FF%2526darkColor%253D%2525231565C0%2526backgroundLight%253D%252523ffffff%2526backgroundDark%253D%252523090c10%26w%3D1200%26q%3D100" alt="Gemini 3.5 Flash Native API quick start" width="1200" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google launched Gemini 3.5 Flash at I/O 2026 today. The "budget" model now beats Gemini 3.1 Pro on agent and coding benchmarks. Here is what you actually need to know to decide whether to switch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick specs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model ID:&lt;/strong&gt; &lt;code&gt;gemini-3.5-flash&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context:&lt;/strong&gt; 1M input / 65K output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; text, image, audio, video, PDF&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing:&lt;/strong&gt; $1.50/M input, $9.00/M output, $0.15/M cached input&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge cutoff:&lt;/strong&gt; January 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Thinking:&lt;/strong&gt; on by default (medium), low mode available&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where Flash wins
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Flash&lt;/th&gt;
&lt;th&gt;3.1 Pro&lt;/th&gt;
&lt;th&gt;Gap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.1 (coding)&lt;/td&gt;
&lt;td&gt;76.2%&lt;/td&gt;
&lt;td&gt;70.3%&lt;/td&gt;
&lt;td&gt;+5.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Atlas (tool calling)&lt;/td&gt;
&lt;td&gt;83.6%&lt;/td&gt;
&lt;td&gt;78.2%&lt;/td&gt;
&lt;td&gt;+5.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finance Agent v2&lt;/td&gt;
&lt;td&gt;57.9%&lt;/td&gt;
&lt;td&gt;43.0%&lt;/td&gt;
&lt;td&gt;+14.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GDPval-AA (decision-making)&lt;/td&gt;
&lt;td&gt;1,656 Elo&lt;/td&gt;
&lt;td&gt;1,314 Elo&lt;/td&gt;
&lt;td&gt;+342&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CharXiv Reasoning&lt;/td&gt;
&lt;td&gt;84.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Where Flash does NOT win
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Humanity's Last Exam:&lt;/strong&gt; 3.1 Pro leads (44.4% vs 40.2%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ARC-AGI-2:&lt;/strong&gt; 3.1 Pro leads (77.1% vs 72.1%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SWE-Bench Pro:&lt;/strong&gt; Claude Opus 4.7 still leads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computer Use:&lt;/strong&gt; GPT-5.5 is the only model with production screen control. Flash does not support this.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tool capabilities
&lt;/h2&gt;

&lt;p&gt;What ships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Function Calling&lt;/li&gt;
&lt;li&gt;Structured Output&lt;/li&gt;
&lt;li&gt;Search-as-a-tool&lt;/li&gt;
&lt;li&gt;Code Execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What is missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Computer Use&lt;/strong&gt; — no screen control, no clicking, no form filling. If your agent operates a browser or desktop app, you still need GPT-5.5 for that part.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to pick which model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent needs to call multiple tools in sequence?
  → Gemini 3.5 Flash

Agent needs to refactor code across a large repo?
  → Claude Opus 4.7

Agent needs to control a browser or desktop?
  → GPT-5.5

Need all three depending on the task?
  → Route through a unified gateway
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
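
&lt;p&gt;As a rough illustration, the decision list above could be expressed as a lookup table. This is only a sketch: &lt;code&gt;gemini-3.5-flash&lt;/code&gt; is the model ID quoted in this post, while the other two IDs, the task labels, and the choice of Flash as the default are assumptions made for the example.&lt;/p&gt;

```python
# Task kinds and the non-Gemini model IDs are hypothetical placeholders.
ROUTES = {
    "tool_orchestration": "gemini-3.5-flash",
    "repo_refactor": "claude-opus-4.7",  # assumed ID, check your gateway's model list
    "computer_use": "gpt-5.5",           # assumed ID, check your gateway's model list
}


def pick_model(task_kind):
    """Return the model ID for a task kind, falling back to Flash (an assumption)."""
    return ROUTES.get(task_kind, "gemini-3.5-flash")


print(pick_model("repo_refactor"))
```

&lt;p&gt;Keeping the mapping in data rather than in branching code makes it easy to change once independent benchmarks come in.&lt;/p&gt;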



&lt;h2&gt;
  
  
  API call example
&lt;/h2&gt;

&lt;p&gt;Through EvoLink (single endpoint for Gemini, Claude, and GPT):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://direct.evolink.ai/v1beta/models/gemini-3.5-flash:generateContent &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_EVOLINK_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "contents": [{
      "role": "user",
      "parts": [{"text": "List the top 3 files changed in the last commit and explain what each change does"}]
    }]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you were using any previous Gemini model through EvoLink, swap the model ID. No other change needed.&lt;/p&gt;

&lt;p&gt;The benefit: same API key and endpoint for Flash, Opus 4.7, and GPT-5.5. Switch model IDs depending on the task. One bill, automatic failover.&lt;/p&gt;

&lt;p&gt;Full docs: &lt;a href="https://docs.evolink.ai/en/api-manual/language-series/gemini-3.5-flash/native-api/native-api-quickstart?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=gemini35flash&amp;amp;utm_content=gemini-35-flash-agent-model-devto" rel="noopener noreferrer"&gt;EvoLink Gemini 3.5 Flash API Guide&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to keep in mind
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;These are Google's benchmarks.&lt;/strong&gt; Independent community testing will confirm or adjust. Take the exact numbers with normal benchmark skepticism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent cost is not unit cost.&lt;/strong&gt; A 20-step agent loop means 20 API calls, and each call typically re-sends the accumulated context. Fast and cheap per token is not the same as cheap per workflow run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Thinking adds reasoning tokens.&lt;/strong&gt; The model thinks before answering. This increases output quality but also output token count. Watch your bills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3.5 Pro is expected next month.&lt;/strong&gt; If you need peak general reasoning from Google, it might be worth waiting.&lt;/li&gt;
&lt;/ol&gt;
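
&lt;p&gt;To make the "agent cost is not unit cost" point concrete, here is a back-of-the-envelope estimate using the list prices from the specs section. The token counts are invented for illustration, and the sketch assumes every step re-sends the whole growing context with no caching and no separate accounting for thinking tokens.&lt;/p&gt;

```python
INPUT_USD_PER_M = 1.50   # list price quoted in this post
OUTPUT_USD_PER_M = 9.00  # list price quoted in this post


def run_cost(steps, base_input_tokens, tokens_added_per_step, output_tokens_per_step):
    """Estimate the USD cost of one agent run that re-sends its context on every step."""
    total_input = 0
    total_output = 0
    context = base_input_tokens
    for _ in range(steps):
        total_input += context
        total_output += output_tokens_per_step
        # The next call carries this step's tool results and model output as well.
        context += tokens_added_per_step + output_tokens_per_step
    return (total_input * INPUT_USD_PER_M + total_output * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical workload: 2,000-token prompt, 1,500 tokens of tool output
# and 500 tokens of model output per step.
single_call = run_cost(1, 2000, 1500, 500)
full_run = run_cost(20, 2000, 1500, 500)
print(round(single_call * 20, 4), round(full_run, 4))
```

&lt;p&gt;With these made-up numbers, twenty isolated calls would cost about $0.15, while the 20-step loop comes to about $0.72 because the context keeps growing.&lt;/p&gt;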

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.google/technology/google-deepmind/gemini-3-5-flash/" rel="noopener noreferrer"&gt;Google I/O 2026 Gemini 3.5 Flash&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://llm-stats.com/blog/research/gemini-3.5-flash-launch" rel="noopener noreferrer"&gt;Benchmarks and pricing analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.evolink.ai/en/api-manual/language-series/gemini-3.5-flash/native-api/native-api-quickstart" rel="noopener noreferrer"&gt;EvoLink API docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Pick models by task, not by tier. Flash for tool orchestration. Opus for code rewrites. GPT for screen control. That is the state of play.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>api</category>
    </item>
    <item>
      <title>Codex Now Works from Your Phone — Plus Hooks and CI/CD Tokens</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Fri, 15 May 2026 07:57:10 +0000</pubDate>
      <link>https://forem.com/evan-dong/codex-now-works-from-your-phone-plus-hooks-and-cicd-tokens-4mj3</link>
      <guid>https://forem.com/evan-dong/codex-now-works-from-your-phone-plus-hooks-and-cicd-tokens-4mj3</guid>
      <description>&lt;h1&gt;
  
  
  Codex Now Works from Your Phone — Plus Hooks and CI/CD Tokens
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7i9v79rp9y36mscg2glx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7i9v79rp9y36mscg2glx.png" alt="Codex mobile in the ChatGPT app" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I run Codex for long refactoring tasks. The annoying pattern: start a job, step away, come back to find Codex has been waiting 40 minutes for me to approve a permission check.&lt;/p&gt;

&lt;p&gt;The May 2026 updates fix that, along with a few other gaps I cared about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Things That Matter
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Mobile access (preview, May 14)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Codex is now available in preview on the ChatGPT mobile app for iOS and Android. Your phone connects to the Codex session running on your Mac — you review diffs, approve commands, switch models, and follow terminal output from wherever you are. The code stays on the host machine.&lt;/p&gt;

&lt;p&gt;Setup: latest ChatGPT app + Codex for Mac → scan QR code to pair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Hooks (GA, May 14)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can now inject custom scripts at six points in the Codex loop. The one I found most useful is &lt;code&gt;PreToolUse&lt;/code&gt;, which can scan Bash commands for credential patterns before they execute.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[[hooks.PreToolUse]]&lt;/span&gt;
&lt;span class="py"&gt;matcher&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^Bash$"&lt;/span&gt;

&lt;span class="nn"&gt;[[hooks.PreToolUse.hooks]]&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"command"&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"python3 ~/.codex/hooks/scan_credentials.py"&lt;/span&gt;
&lt;span class="py"&gt;timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hook receives JSON on stdin with the command about to run, and can return &lt;code&gt;{"decision": "block", "reason": "..."}&lt;/code&gt; to stop it.&lt;/p&gt;
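
&lt;p&gt;A minimal sketch of what such a scanner might look like. The exact shape of the stdin payload is an assumption here (the command is read from &lt;code&gt;tool_input.command&lt;/code&gt;), and the two patterns are illustrative only, so check the hooks docs for the real field names before relying on it.&lt;/p&gt;

```python
import json
import re
import sys

# Illustrative patterns only; a real scanner would need a much broader set.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*=\s*\S+"),
]


def decide(payload):
    """Return a block decision if the command looks like it holds a credential, else None."""
    # Assumption: the command is at payload["tool_input"]["command"].
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in CREDENTIAL_PATTERNS:
        if pattern.search(command):
            return {"decision": "block", "reason": "possible credential in command"}
    return None


def main():
    """Read the hook payload from stdin and print a decision only when blocking."""
    decision = decide(json.load(sys.stdin))
    if decision is not None:
        json.dump(decision, sys.stdout)


# When saving this as the hook script, finish the file with a call to main().
```

&lt;p&gt;Keeping the matching logic in &lt;code&gt;decide&lt;/code&gt; separate from the stdin handling makes it possible to unit-test the patterns without running Codex.&lt;/p&gt;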

&lt;p&gt;Other hook events: &lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;PermissionRequest&lt;/code&gt;, &lt;code&gt;UserPromptSubmit&lt;/code&gt;, &lt;code&gt;Stop&lt;/code&gt;. Enterprise teams can push managed hooks via &lt;code&gt;requirements.toml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Caveat: per the docs, hooks do not yet intercept all shell calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Access tokens (Business/Enterprise, May 5)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you run Codex in CI/CD, you no longer need to fake an interactive login. Access tokens carry workspace identity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CODEX_ACCESS_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;token&amp;gt;"&lt;/span&gt;
codex &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="s2"&gt;"run tests and report coverage"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Business and Enterprise only. Tokens expire (configurable) and show up in governance logs under the creating user.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also Shipped
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Remote SSH (GA)&lt;/strong&gt;: connect Codex to managed dev environments through a relay — no public SSH port needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HIPAA compliance&lt;/strong&gt;: for eligible Enterprise workspaces in local environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Multi-Agent Problem Remains
&lt;/h2&gt;

&lt;p&gt;These features make Codex better. But if you also use Claude Code and Gemini CLI, you still have three API keys, three dashboards, and no fallback between them.&lt;/p&gt;

&lt;p&gt;All three CLIs support custom endpoints:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Codex: ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[api]&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.evolink.ai/v1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.evolink.ai"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Gemini CLI: ~/.gemini/.env&lt;/span&gt;
&lt;span class="nv"&gt;GOOGLE_GEMINI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.evolink.ai/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One gateway → unified cost, automatic fallback on 429/5xx, model switching via routing rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://evolink.ai/blog/one-endpoint-coding-clis?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=codex_mobile_hooks&amp;amp;utm_content=codex-mobile-devto" rel="noopener noreferrer"&gt;Setup guide for routing all three CLIs →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/changelog" rel="noopener noreferrer"&gt;Codex Changelog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/hooks" rel="noopener noreferrer"&gt;Hooks Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/enterprise/access-tokens" rel="noopener noreferrer"&gt;Access Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>api</category>
    </item>
    <item>
      <title>I Tried TencentDB Agent Memory — Here's What the Token Reduction Looks Like</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Fri, 15 May 2026 07:47:26 +0000</pubDate>
      <link>https://forem.com/evan-dong/i-tried-tencentdb-agent-memory-heres-what-the-token-reduction-looks-like-2pm4</link>
      <guid>https://forem.com/evan-dong/i-tried-tencentdb-agent-memory-heres-what-the-token-reduction-looks-like-2pm4</guid>
      <description>&lt;h1&gt;
  
  
  I Tried TencentDB Agent Memory — Here's What the Token Reduction Looks Like
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7650bi3yd6scsa0uhm6d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7650bi3yd6scsa0uhm6d.jpg" alt="TencentDB Agent Memory four-tier architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The context window problem in long-running agents is familiar: by turn 20, you are paying for tool logs the agent does not need anymore. Truncation loses detail. Summarization compresses but also forgets.&lt;/p&gt;

&lt;p&gt;Tencent Cloud open-sourced TencentDB Agent Memory (MIT license, May 2026), and it takes a different approach: offload the verbose stuff to local files, keep a Mermaid task graph in context, let the agent drill back in when it needs specifics.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Four memory layers, each traceable back to raw data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L0 Conversation&lt;/strong&gt;: raw dialogue + tool logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1 Atom&lt;/strong&gt;: structured facts extracted every N conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2 Scenario&lt;/strong&gt;: aggregated solution patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L3 Persona&lt;/strong&gt;: user behavior profiles built over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The short-term trick: verbose tool output gets offloaded to &lt;code&gt;refs/*.md&lt;/code&gt; files. In context, only a lightweight Mermaid graph remains. When the agent needs a specific output, it retrieves by &lt;code&gt;node_id&lt;/code&gt;.&lt;/p&gt;
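
&lt;p&gt;To show the idea rather than the project's actual implementation, here is a toy version of the offload pattern. The class name, the &lt;code&gt;n1&lt;/code&gt;, &lt;code&gt;n2&lt;/code&gt; ID scheme, and the plain linear graph are all invented for this sketch; only the &lt;code&gt;refs/*.md&lt;/code&gt; layout and lookup by &lt;code&gt;node_id&lt;/code&gt; come from the description above.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


class OffloadStore:
    """Toy illustration: park verbose tool output on disk, keep a small Mermaid graph."""

    def __init__(self, root):
        self.refs = Path(root) / "refs"
        self.refs.mkdir(parents=True, exist_ok=True)
        self.nodes = []  # (node_id, short label) pairs in execution order

    def offload(self, label, output):
        """Write the full output to refs/NODE_ID.md and return the node ID."""
        node_id = f"n{len(self.nodes) + 1}"
        (self.refs / f"{node_id}.md").write_text(output, encoding="utf-8")
        self.nodes.append((node_id, label))
        return node_id

    def retrieve(self, node_id):
        """Load the full output back when the agent needs the specifics."""
        return (self.refs / f"{node_id}.md").read_text(encoding="utf-8")

    def graph(self):
        """Render the compact summary that would stay in the context window."""
        lines = ["graph TD"]
        for node_id, label in self.nodes:
            lines.append(f'    {node_id}["{label}"]')
        for (a, _), (b, _) in zip(self.nodes, self.nodes[1:]):
            lines.append(f"    {a} --- {b}")
        return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    store = OffloadStore(tmp)
    first = store.offload("run tests", "...hundreds of lines of test output...")
    store.offload("read coverage", "...full coverage report...")
    print(store.graph())
    print(store.retrieve(first))
```

&lt;p&gt;The graph costs a few dozen tokens however long the logs are, which is where the token savings would come from.&lt;/p&gt;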

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;According to the project's benchmarks (long-horizon sessions, not isolated turns):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Success Change&lt;/th&gt;
&lt;th&gt;Token Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WideSearch&lt;/td&gt;
&lt;td&gt;33% → 50%&lt;/td&gt;
&lt;td&gt;−61.38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-bench&lt;/td&gt;
&lt;td&gt;58.4% → 64.2%&lt;/td&gt;
&lt;td&gt;−33.09%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PersonaMem&lt;/td&gt;
&lt;td&gt;48% → 76% accuracy&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The biggest gains are on WideSearch (+17 points), which makes sense: that is where context accumulates fastest. The SWE-bench improvement is real but modest: +5.8 points, or about 9.9% relative to the 58.4% baseline.&lt;/p&gt;

&lt;p&gt;Important caveat: these are self-reported by the project team, not independently verified.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Setup (OpenClaw)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; @tencentdb-agent-memory/memory-tencentdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;~/.openclaw/openclaw.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"memory-tencentdb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"offload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. SQLite + sqlite-vec is the default, so no external DB is needed. The &lt;code&gt;offload.enabled: true&lt;/code&gt; setting is what activates the Mermaid compression; without it you only get long-term memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Layers of Cost Optimization
&lt;/h2&gt;

&lt;p&gt;Memory cuts tokens per call. But you are still paying the provider's per-token rate, and if the provider has an outage, the agent stalls.&lt;/p&gt;

&lt;p&gt;If you route agent LLM calls through a gateway, you get a second optimization layer: model routing (pick the cheapest capable model per task), automatic fallback on 429/5xx, and a unified cost dashboard.&lt;/p&gt;

&lt;p&gt;For Hermes, this means setting &lt;code&gt;MODEL_BASE_URL&lt;/code&gt; to a gateway endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; hermes-memory &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MODEL_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.evolink.ai/v1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MODEL_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key &lt;span class="se"&gt;\&lt;/span&gt;
  hermes-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fewer tokens × lower cost per token = compounding savings.&lt;/p&gt;
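
&lt;p&gt;To make the compounding concrete, here is a small Python calculation. The 61.38% token reduction is the project-reported WideSearch figure from the table above; the 20% cheaper per-token rate is an invented number used purely for illustration, not a quoted price.&lt;/p&gt;

```python
def combined_cost_factor(token_reduction, price_reduction):
    """Fraction of the original bill left after both reductions apply."""
    return (1 - token_reduction) * (1 - price_reduction)


# 61.38% fewer tokens is the project-reported WideSearch figure.
# The 20% cheaper per-token rate is a made-up number for illustration.
factor = combined_cost_factor(0.6138, 0.20)
savings_percent = round((1 - factor) * 100, 1)
print(savings_percent)
```

&lt;p&gt;Under those assumptions the bill drops by about 69.1%. The two reductions multiply rather than add, which is why the combined saving is less than 61.38% + 20%.&lt;/p&gt;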

&lt;p&gt;&lt;a href="https://evolink.ai/blog/unified-api-multi-model-ai-apps?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=agent_memory_guide&amp;amp;utm_content=agent-memory-devto" rel="noopener noreferrer"&gt;More on unified API routing for multi-model apps →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Only OpenClaw and Hermes are supported today&lt;/li&gt;
&lt;li&gt;Offloading is off by default&lt;/li&gt;
&lt;li&gt;SQLite is single-agent; concurrent access needs the Tencent Cloud Vector DB backend&lt;/li&gt;
&lt;li&gt;Benchmarks are project-reported&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Tencent/TencentDB-Agent-Memory" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Tencent/TencentDB-Agent-Memory/blob/main/README_CN.md" rel="noopener noreferrer"&gt;中文 README&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Codex vs Claude Code vs Cursor: What Changed in May 2026 and What Can Be Routed</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Thu, 14 May 2026 12:50:13 +0000</pubDate>
      <link>https://forem.com/evan-dong/codex-vs-claude-code-vs-cursor-what-changed-in-may-2026-and-what-can-be-routed-2997</link>
      <guid>https://forem.com/evan-dong/codex-vs-claude-code-vs-cursor-what-changed-in-may-2026-and-what-can-be-routed-2997</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4yl3eu36rwu7etp6p6m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4yl3eu36rwu7etp6p6m.png" alt="Cursor cloud agent development environments" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A common setup now is Codex CLI for scaffolding, Claude Code for refactoring, and Cursor for IDE or cloud-agent workflows. This month all three shipped infra updates — on different dates, solving different problems.&lt;/p&gt;

&lt;p&gt;Here is what changed, what can be wired through one endpoint, and what has to remain separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Updates
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Codex CLI — Windows Sandbox (May 13)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Codex now has real OS-level sandboxing on Windows. It uses dedicated user accounts (&lt;code&gt;CodexSandboxOffline&lt;/code&gt;/&lt;code&gt;CodexSandboxOnline&lt;/code&gt;), per-account firewall rules, and helper binaries for privilege boundaries. Linux already had seccomp/bubblewrap; Windows finally caught up.&lt;/p&gt;

&lt;p&gt;Previously, Windows sandbox attempts used synthetic SIDs and proxy-based network blocking — programs could bypass them by implementing their own networking stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code — Doubled Five-Hour Rate Limits (May 6)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anthropic doubled Claude Code's five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans. It also removed the peak-hours limit reduction for Pro and Max.&lt;/p&gt;

&lt;p&gt;This was announced May 6, a week before the other updates. If you were splitting refactoring tasks to stay under the five-hour cap, you have roughly 2x headroom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor — Cloud Agent Environments (May 13)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud environments with multi-repo support, Dockerfile config with build secrets, layer caching (70% faster on cache hits), version history, rollback, scoped egress/secrets, audit logging.&lt;/p&gt;

&lt;p&gt;Dockerfile auto-config is in private beta, rolling out to Enterprise teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Updated&lt;/td&gt;
&lt;td&gt;May 13&lt;/td&gt;
&lt;td&gt;May 6&lt;/td&gt;
&lt;td&gt;May 13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What&lt;/td&gt;
&lt;td&gt;Windows sandbox&lt;/td&gt;
&lt;td&gt;2x five-hour limits&lt;/td&gt;
&lt;td&gt;Cloud dev environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runs&lt;/td&gt;
&lt;td&gt;Local sandbox&lt;/td&gt;
&lt;td&gt;Local terminal&lt;/td&gt;
&lt;td&gt;Cloud + IDE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-repo&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Isolation&lt;/td&gt;
&lt;td&gt;OS-level&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Cloud containers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Scaffolding, Windows&lt;/td&gt;
&lt;td&gt;Refactoring, terminal&lt;/td&gt;
&lt;td&gt;End-to-end delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Problem When You Use More Than One
&lt;/h2&gt;

&lt;p&gt;Running more than one tool means multiple provider keys, billing dashboards, and rate-limit surfaces. When Claude Code hits its five-hour cap, nothing falls back automatically by default. To compare spend across routable CLIs, you have to export data from separate consoles.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Gateway Setup
&lt;/h2&gt;

&lt;p&gt;Point the routable CLIs at one endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Codex: ~/.codex/config.toml&lt;/span&gt;
&lt;span class="nn"&gt;[api]&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.evolink.ai/v1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://api.evolink.ai"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-evolink-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Gemini CLI: ~/.gemini/.env&lt;/span&gt;
&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-evolink-key
&lt;span class="nv"&gt;GOOGLE_GEMINI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://api.evolink.ai/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you get: unified cost dashboard, automatic fallback on 429/5xx, model switching via routing rules.&lt;/p&gt;
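
&lt;p&gt;The fallback behavior can be sketched in a few lines of Python. This is a simulation under stated assumptions, not any gateway's real code: &lt;code&gt;UpstreamError&lt;/code&gt;, &lt;code&gt;route&lt;/code&gt;, and the two fake providers are hypothetical names. The idea is to try providers in order and move on only when the failure is a 429 or a 5xx.&lt;/p&gt;

```python
class UpstreamError(Exception):
    """Simulated HTTP failure from a provider."""

    def __init__(self, status):
        super().__init__(f"upstream returned {status}")
        self.status = status


def should_fall_back(status):
    # Rate limits (429) and server errors (5xx) may succeed elsewhere
    return status == 429 or status in range(500, 600)


def route(prompt, providers):
    """Try providers in order; move on only for 429/5xx failures."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except UpstreamError as err:
            if not should_fall_back(err.status):
                raise  # e.g. 401: a fallback would not fix bad credentials
            last_error = err
    raise RuntimeError("all providers failed") from last_error


def rate_limited(prompt):
    raise UpstreamError(429)


def healthy(prompt):
    return f"echo: {prompt}"


used, reply = route("ping", [("primary", rate_limited), ("backup", healthy)])
print(used, reply)
```

&lt;p&gt;Errors such as 401 are deliberately re-raised instead of triggering a fallback, since a different provider would not fix a bad key.&lt;/p&gt;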

&lt;p&gt;Note: Cursor cloud environments use their own backend — not routable through third-party gateways yet. Local Cursor IDE completions can be routed.&lt;/p&gt;

&lt;p&gt;Full walkthrough: &lt;a href="https://evolink.ai/blog/one-endpoint-coding-clis?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=coding_agents_infra&amp;amp;utm_content=coding-agents-devto" rel="noopener noreferrer"&gt;One Gateway for 3 Coding CLIs&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch Out For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Config precedence:&lt;/strong&gt; env vars usually override config files, but if you set &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; globally and also have a project &lt;code&gt;.claude/settings.json&lt;/code&gt;, which one wins depends on the tool version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth errors:&lt;/strong&gt; 401 from gateway usually means old provider key is still being sent. Restart terminal after env var changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream limits still apply:&lt;/strong&gt; gateway adds fallback, but Anthropic five-hour cap and OpenAI daily quotas are still upstream.&lt;/li&gt;
&lt;/ul&gt;
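
&lt;p&gt;When debugging precedence problems, it helps to write down the resolution order you expect and compare it with what the tool actually does. Below is a Python sketch of the "env var first" order. The &lt;code&gt;resolve_base_url&lt;/code&gt; function and the &lt;code&gt;base_url&lt;/code&gt; settings key are hypothetical, and real tools may resolve things differently depending on version.&lt;/p&gt;

```python
def resolve_base_url(env, project_settings, default):
    """Pick a base URL: env var first, then project file, then default.

    Returns the URL together with the source that supplied it.
    """
    if env.get("ANTHROPIC_BASE_URL"):
        return env["ANTHROPIC_BASE_URL"], "env"
    # "base_url" is a hypothetical key used only for this sketch
    if project_settings.get("base_url"):
        return project_settings["base_url"], "project"
    return default, "default"


env = {"ANTHROPIC_BASE_URL": "https://api.evolink.ai"}
project = {"base_url": "https://api.anthropic.com"}

url, source = resolve_base_url(env, project, "https://api.anthropic.com")
print(url, source)

# With the env var unset, the project file wins
url_without_env, source_without_env = resolve_base_url({}, project, "unused")
print(url_without_env, source_without_env)
```

&lt;p&gt;Returning the source alongside the value makes it obvious which layer supplied the endpoint, which is usually the first question when a request goes to the wrong place.&lt;/p&gt;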

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://openai.com/index/building-codex-windows-sandbox" rel="noopener noreferrer"&gt;OpenAI: Codex Windows Sandbox&lt;/a&gt; (May 13)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.cursor.com/blog/cloud-agent-development-environments" rel="noopener noreferrer"&gt;Cursor: Cloud Agent Environments&lt;/a&gt; (May 13)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/news/higher-limits-spacex" rel="noopener noreferrer"&gt;Anthropic: Higher Limits&lt;/a&gt; (May 6)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>api</category>
    </item>
    <item>
      <title>I Tried the Claude Code Skills Repo That Got 77K Stars — Here Is What Works and What Does Not</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Wed, 13 May 2026 12:13:27 +0000</pubDate>
      <link>https://forem.com/evan-dong/i-tried-the-claude-code-skills-repo-that-got-77k-stars-here-is-what-works-and-what-does-not-57a4</link>
      <guid>https://forem.com/evan-dong/i-tried-the-claude-code-skills-repo-that-got-77k-stars-here-is-what-works-and-what-does-not-57a4</guid>
      <description>&lt;p&gt;I kept running into the same problem with coding agents: I would describe a task, the agent would build something, and it was not what I meant. Not broken — just off.&lt;/p&gt;

&lt;p&gt;The fix turned out to be surprisingly low-tech. Matt Pocock published a repo of "skills" — small instruction files that go in your &lt;code&gt;.claude&lt;/code&gt; directory and change how the agent approaches work. The repo exploded: 77,000+ stars, 6,700+ forks, #1 on GitHub Trending.&lt;/p&gt;

&lt;p&gt;I installed it. Here is what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup (About 60 Seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills@latest add mattpocock/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick the skills you want. Select &lt;code&gt;/setup-matt-pocock-skills&lt;/code&gt; — it is the bootstrap.&lt;/p&gt;

&lt;p&gt;Inside your agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/setup-matt-pocock-skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It asks for your issue tracker (GitHub / Linear / local files), triage labels, and docs folder. Done.&lt;/p&gt;

&lt;p&gt;Works with Claude Code, Codex, Cursor, or anything that reads &lt;code&gt;.claude/&lt;/code&gt; directories.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 4 Skills I Actually Use
&lt;/h2&gt;

&lt;p&gt;There are 28 skills. I use 4 regularly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/grill-with-docs&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the best one. Before you start coding, the agent asks you detailed questions about what you are building. Edge cases, constraints, why you are doing it this way.&lt;/p&gt;

&lt;p&gt;The output is a &lt;code&gt;CONTEXT.md&lt;/code&gt; file — a shared vocabulary that the agent reads in every future session. One of my projects had a 15-word phrase that got replaced with "materialization cascade." Every session after that was shorter and more accurate.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/tdd&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Forces test-driven development: tests first, implementation second, verification third. Works great on isolated functions. Gets annoying on complex UI where the tests are hard to specify upfront.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/diagnose&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Structured debugging instead of the agent guessing. Most useful when the error message does not point to the real problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;/caveman&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Extreme concision. The agent says as little as possible and just executes. Perfect for experienced devs who already know what they want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything Else (Quick Map)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Alignment:&lt;/strong&gt; &lt;code&gt;/grill-me&lt;/code&gt; (non-code version), &lt;code&gt;/to-prd&lt;/code&gt;, &lt;code&gt;/to-issues&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Code quality:&lt;/strong&gt; &lt;code&gt;/prototype&lt;/code&gt;, &lt;code&gt;/triage&lt;/code&gt;, &lt;code&gt;/zoom-out&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; &lt;code&gt;/handoff&lt;/code&gt;, &lt;code&gt;/write-a-skill&lt;/code&gt;, &lt;code&gt;/review&lt;/code&gt; (WIP), &lt;code&gt;/setup-pre-commit&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limits
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skills are instructions, not plugins.&lt;/strong&gt; They do not make the model smarter — they make the conversation more structured.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CONTEXT.md&lt;/code&gt; drifts.&lt;/strong&gt; You need to update it as the project evolves, or re-run the grill.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-agnostic = lowest common denominator.&lt;/strong&gt; Skills work everywhere but cannot use agent-specific features like Claude Code's &lt;code&gt;/goal&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;28 skills is too many to learn at once.&lt;/strong&gt; Start with the four above.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When It Gets Bigger
&lt;/h2&gt;

&lt;p&gt;Once a team runs multiple agents on the same codebase, the next problem is not prompt quality but routing: who uses which model, how much each workflow costs, and where the API keys live.&lt;/p&gt;

&lt;p&gt;Skills handle the agent behavior side. For the model routing side, connecting Claude Code CLI to a gateway makes the access path concrete. The &lt;a href="https://docs.evolink.ai/en/integration-guide/claude-code-cli?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=claude_skills&amp;amp;utm_content=claude-skills-devto" rel="noopener noreferrer"&gt;EvoLink Claude Code CLI guide&lt;/a&gt; documents that setup if your team is at that stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use This
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Daily coding agent user who wants structured quality control&lt;/li&gt;
&lt;li&gt;✅ Team with domain-specific terminology that confuses agents&lt;/li&gt;
&lt;li&gt;✅ Multi-developer project using coding agents on the same repo&lt;/li&gt;
&lt;li&gt;❌ One-off scripts or throwaway code&lt;/li&gt;
&lt;li&gt;❌ Projects too small for shared language docs&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://github.com/mattpocock/skills" rel="noopener noreferrer"&gt;mattpocock/skills on GitHub&lt;/a&gt; | &lt;a href="https://skills.sh/mattpocock/skills" rel="noopener noreferrer"&gt;skills.sh installer&lt;/a&gt; | &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code docs&lt;/a&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>api</category>
    </item>
    <item>
      <title>Stop Asking Claude Code for Markdown Specs. Ask for HTML Artifacts.</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Sat, 09 May 2026 09:30:17 +0000</pubDate>
      <link>https://forem.com/evan-dong/stop-asking-claude-code-for-markdown-specs-ask-for-html-artifacts-16ke</link>
      <guid>https://forem.com/evan-dong/stop-asking-claude-code-for-markdown-specs-ask-for-html-artifacts-16ke</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHHz_ftzaIAAwkQs%3Fformat%3Djpg%26name%3Dmedium" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHHz_ftzaIAAwkQs%3Fformat%3Djpg%26name%3Dmedium" alt="Using Claude Code: The Unreasonable Effectiveness of HTML cover" width="1200" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code is very good at writing Markdown. That does not mean Markdown should be the default output for every task.&lt;/p&gt;

&lt;p&gt;Thariq from the Claude Code team recently described a workflow where he increasingly asks Claude Code for HTML instead of Markdown. The reason is practical: long Markdown specs are easy to generate but hard to read. HTML can turn the same information into a navigable, visual, and sometimes interactive artifact.&lt;/p&gt;

&lt;h2&gt;
  
  
  When HTML Beats Markdown
&lt;/h2&gt;

&lt;p&gt;Use HTML when the output is meant to be consumed by people, not maintained line by line in Git.&lt;/p&gt;

&lt;p&gt;Good fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PR walkthroughs&lt;/li&gt;
&lt;li&gt;design option comparisons&lt;/li&gt;
&lt;li&gt;architecture explainers&lt;/li&gt;
&lt;li&gt;onboarding docs&lt;/li&gt;
&lt;li&gt;debugging reports&lt;/li&gt;
&lt;li&gt;one-off planning tools&lt;/li&gt;
&lt;li&gt;draggable prioritization boards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep Markdown for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;READMEs&lt;/li&gt;
&lt;li&gt;changelogs&lt;/li&gt;
&lt;li&gt;durable docs&lt;/li&gt;
&lt;li&gt;API references&lt;/li&gt;
&lt;li&gt;anything that needs clean Git diff review&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example: PR Review Artifact
&lt;/h2&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize this PR in Markdown.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a single self-contained HTML PR walkthrough.
Render the important diff areas with inline annotations.
Color-code findings by severity.
Add a manual verification checklist at the bottom.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives reviewers something closer to a focused review interface than a wall of bullets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Implementation Options
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate 5 implementation approaches as one HTML file.
Use a comparison grid.
For each approach show:
- complexity
- migration risk
- test impact
- recommended use case
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is much easier to scan than five Markdown sections stacked vertically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-Offs
&lt;/h2&gt;

&lt;p&gt;HTML is not always better.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Markdown&lt;/th&gt;
&lt;th&gt;HTML&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Git diffs&lt;/td&gt;
&lt;td&gt;Great&lt;/td&gt;
&lt;td&gt;Noisy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-term docs&lt;/td&gt;
&lt;td&gt;Great&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual hierarchy&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interactivity&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharing in browser&lt;/td&gt;
&lt;td&gt;Requires renderer&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The rule I use: Markdown is the source. HTML is the reading surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Prompt
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a self-contained HTML explainer for this feature.
Audience: an engineer who has not seen this code before.
Include a visual summary, annotated code snippets, risks, and a next-step checklist.
Do not add external dependencies.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The real insight is not "HTML is better than Markdown." It is that AI-generated output does not have to be plain text. If the model can generate a useful interface, ask for the interface.&lt;/p&gt;




&lt;p&gt;For teams building Claude Code workflows across multiple models, &lt;a href="https://evolink.ai?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=claude_html_output&amp;amp;utm_content=claude-code-html-over-markdown" rel="noopener noreferrer"&gt;EvoLink&lt;/a&gt; provides unified API access to Claude and other frontier models.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>image</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>OpenAI's New Realtime Voice Models Can Think, Translate, and Transcribe — Here's What Developers Need to Know</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Fri, 08 May 2026 13:36:06 +0000</pubDate>
      <link>https://forem.com/evan-dong/openais-new-realtime-voice-models-can-think-translate-and-transcribe-heres-what-developers-5hab</link>
      <guid>https://forem.com/evan-dong/openais-new-realtime-voice-models-can-think-translate-and-transcribe-heres-what-developers-5hab</guid>
      <description>&lt;p&gt;OpenAI just shipped three realtime voice models through their API. One reasons at GPT-5 level during live calls. One translates 70+ languages in real time. One does streaming transcription. All available today.&lt;/p&gt;

&lt;p&gt;Let me break down what matters for developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Models
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GPT-Realtime-2&lt;/strong&gt; handles voice conversations with GPT-5-level reasoning. The key difference from previous voice models: it can call tools mid-conversation without going silent. It narrates what it's doing while executing — OpenAI calls this "preamble."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-Realtime-Translate&lt;/strong&gt; does real-time voice translation. 70+ input languages, 13 output languages. End-to-end audio processing (no intermediate text step), which preserves tone and emotion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-Realtime-Whisper&lt;/strong&gt; is streaming speech-to-text. Words appear as the speaker talks. Built for live captions and meeting transcription.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration Options
&lt;/h2&gt;

&lt;p&gt;All three use the Realtime API with three connection methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WebRTC&lt;/strong&gt; — browser-based, lowest latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket&lt;/strong&gt; — server-side, more control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SIP&lt;/strong&gt; — telephony integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GPT-Realtime-2: Voice Agents That Actually Work
&lt;/h2&gt;

&lt;p&gt;If you've built voice agents before, you know the pain: tool calls create dead air. The user asks something that requires a database lookup, and the agent goes silent for 2-3 seconds. Feels broken.&lt;/p&gt;

&lt;p&gt;GPT-Realtime-2 solves this with preamble — it talks through its actions while executing them. "Let me check your calendar... I see you have a meeting with Alex Kim in 12 minutes." The tool call happens in parallel with the speech.&lt;/p&gt;
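
&lt;p&gt;The "speak while the tool runs" pattern can be simulated with plain &lt;code&gt;asyncio&lt;/code&gt;. This is only a sketch of the concurrency idea, not the Realtime API: &lt;code&gt;speak&lt;/code&gt; and &lt;code&gt;lookup_calendar&lt;/code&gt; are hypothetical stand-ins, and with GPT-Realtime-2 the model handles the preamble itself.&lt;/p&gt;

```python
import asyncio

events = []


async def speak(text):
    # Stand-in for streaming audio to the caller
    events.append(f"say: {text}")
    await asyncio.sleep(0.01)


async def lookup_calendar():
    # Stand-in for a slow tool call such as a database lookup
    await asyncio.sleep(0.05)
    events.append("tool: done")
    return "meeting with Alex Kim in 12 minutes"


async def answer():
    # Start the tool call first, then talk while it runs
    tool_task = asyncio.create_task(lookup_calendar())
    await speak("Let me check your calendar...")
    result = await tool_task
    await speak(f"I see you have a {result}.")
    return result


result = asyncio.run(answer())
print(events)
```

&lt;p&gt;The caller hears the preamble before the tool finishes, so the lookup latency is covered by speech instead of silence.&lt;/p&gt;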

&lt;p&gt;Other developer-relevant specs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;128K context window (up from 32K)&lt;/li&gt;
&lt;li&gt;Handles interruptions without losing context&lt;/li&gt;
&lt;li&gt;Better instruction following for system prompts&lt;/li&gt;
&lt;li&gt;Text tokens: $4/$16 per million (input/output)&lt;/li&gt;
&lt;li&gt;Audio tokens: $32/$64 per million&lt;/li&gt;
&lt;/ul&gt;
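
&lt;p&gt;A quick way to turn those per-million prices into a per-session estimate. The prices are the ones listed above; the token counts are invented for illustration, so substitute your own measurements.&lt;/p&gt;

```python
# Prices per million tokens as listed above: (input, output)
PRICES = {
    "text": (4.00, 16.00),
    "audio": (32.00, 64.00),
}


def session_cost(kind, input_tokens, output_tokens):
    """Dollar cost for one session of the given token kind."""
    input_price, output_price = PRICES[kind]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


# Token counts below are invented for illustration
audio = session_cost("audio", 50_000, 20_000)
text = session_cost("text", 10_000, 2_000)
total = round(audio + text, 4)
print(total)
```

&lt;p&gt;With these made-up counts the session costs about $2.95, and almost all of it is audio, so audio token volume is the number to watch.&lt;/p&gt;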

&lt;h2&gt;
  
  
  GPT-Realtime-Translate: The $0.034/min Disruption
&lt;/h2&gt;

&lt;p&gt;The translation model is priced at $0.034 per minute. For context, a human simultaneous interpreter costs $25-44 per minute.&lt;/p&gt;
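
&lt;p&gt;The gap is easier to judge with the arithmetic written out. This Python snippet uses only the two per-minute figures quoted above; the one-hour meeting length is an arbitrary example.&lt;/p&gt;

```python
MODEL_RATE = 0.034           # dollars per minute, from the article
HUMAN_RATE_RANGE = (25, 44)  # dollars per minute, from the article


def meeting_cost(minutes, rate_per_minute):
    """Dollar cost of a meeting at a flat per-minute rate."""
    return round(minutes * rate_per_minute, 2)


model_hour = meeting_cost(60, MODEL_RATE)
human_hour = [meeting_cost(60, rate) for rate in HUMAN_RATE_RANGE]
ratios = [round(rate / MODEL_RATE) for rate in HUMAN_RATE_RANGE]
print(model_hour, human_hour, ratios)
```

&lt;p&gt;A one-hour meeting costs about $2.04 with the model versus $1,500 to $2,640 at the quoted human rates, a ratio of roughly 735x to 1,294x.&lt;/p&gt;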

&lt;p&gt;Technical details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes raw audio end-to-end (not cascaded speech-to-text-to-speech)&lt;/li&gt;
&lt;li&gt;Preserves speaker emotion and tone&lt;/li&gt;
&lt;li&gt;Works best with brief pauses between thoughts (labeled "turn-based" in docs)&lt;/li&gt;
&lt;li&gt;Occasional hallucinations still occur&lt;/li&gt;
&lt;li&gt;Supports language switching mid-stream&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The end-to-end approach is what makes the quality difference. Traditional pipelines lose vocal characteristics at every stage. This model skips text entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-Realtime-Whisper: Streaming Transcription
&lt;/h2&gt;

&lt;p&gt;If you need real-time captions or meeting transcription, this is the model. Low-latency streaming output as the speaker talks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Build
&lt;/h2&gt;

&lt;p&gt;The three models together cover the full voice infrastructure stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Customer support agents&lt;/strong&gt; that can reason, look up accounts, and process requests — all by voice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time translation layers&lt;/strong&gt; for international meetings at roughly 1/1000th the cost of human interpreters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live captioning systems&lt;/strong&gt; for streaming, conferences, or accessibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual voice assistants&lt;/strong&gt; that handle code-switching naturally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telephony bots&lt;/strong&gt; via SIP integration that feel like talking to a person&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/" rel="noopener noreferrer"&gt;OpenAI Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/guides/realtime" rel="noopener noreferrer"&gt;Realtime API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/models/gpt-realtime" rel="noopener noreferrer"&gt;Model Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/cookbook/examples/voice_solutions/one_way_translation_using_realtime_api" rel="noopener noreferrer"&gt;Translation Cookbook&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>image</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Anthropic's Agents Now Self-Improve Between Sessions. Here's How Dreaming Works.</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Thu, 07 May 2026 12:12:59 +0000</pubDate>
      <link>https://forem.com/evan-dong/anthropics-agents-now-self-improve-between-sessions-heres-how-dreaming-works-48l8</link>
      <guid>https://forem.com/evan-dong/anthropics-agents-now-self-improve-between-sessions-heres-how-dreaming-works-48l8</guid>
      <description>&lt;p&gt;On May 6th, Anthropic shipped three new capabilities for Managed Agents. Two of them — Outcomes and multi-agent orchestration — are solid infrastructure upgrades. The third one, Dreaming, is the one worth stopping to think about.&lt;/p&gt;

&lt;p&gt;Dreaming is a scheduled background process that runs between sessions. The agent reviews its own past conversation transcripts, identifies recurring patterns, and writes learnings into its memory stores. No human prompt required. No explicit instruction to "remember this."&lt;/p&gt;

&lt;p&gt;If you've been building with Claude agents, you already know how memory works: you tell the agent something, it stores it, it uses it next time. Passive. Explicit. You're the one deciding what gets remembered.&lt;/p&gt;

&lt;p&gt;Dreaming flips that. The agent decides.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;The process runs on a schedule between sessions. The agent scans past transcripts looking for signal: mistakes it repeated, approaches that worked, edge cases it missed. It then curates its own memory stores based on what it finds. The original session data stays untouched — Dreaming writes to memory, not back to history.&lt;/p&gt;

&lt;p&gt;There are two autonomy modes you can configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic&lt;/strong&gt;: the agent identifies patterns and writes them to memory directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human review&lt;/strong&gt;: the agent proposes memory updates, you approve before they take effect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The human review mode is the safer starting point for production systems. You get the cross-session pattern recognition without giving the agent unilateral write access to its own memory.&lt;/p&gt;
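
&lt;p&gt;To make the two modes concrete, here is a toy Python sketch of the general idea. It is not Anthropic's implementation or API: the &lt;code&gt;dream&lt;/code&gt; function, the mistake labels, and the repeat threshold are all hypothetical. Transcripts are only read, and memory is the only thing that gets written.&lt;/p&gt;

```python
from collections import Counter


def dream(transcripts, memory, mode="review", min_repeats=3):
    """Scan past sessions for repeated mistakes and update memory.

    transcripts: list of sessions, each a list of mistake labels (read only)
    memory: list of learned notes (the only thing that gets written)
    Returns the proposals still waiting for human approval.
    """
    # Count each label once per session so the number means "sessions"
    counts = Counter(
        label for session in transcripts for label in set(session)
    )
    proposals = [
        f"Recurring mistake: {label} ({n} sessions)"
        for label, n in sorted(counts.items())
        if n >= min_repeats and not any(label in note for note in memory)
    ]
    if mode == "automatic":
        memory.extend(proposals)
        return []
    return proposals  # review mode: the caller approves before writing


history = [
    ["misrouted refund"],
    ["misrouted refund", "slow reply"],
    ["misrouted refund"],
]
memory = []

pending = dream(history, memory, mode="review")
memory_after_review = list(memory)

dream(history, memory, mode="automatic")
print(pending, memory_after_review, memory)
```

&lt;p&gt;In review mode the proposal is returned and memory stays empty until someone approves it. In automatic mode the same note is written directly.&lt;/p&gt;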

&lt;p&gt;Dreaming is currently in research preview, not yet generally available (GA).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters: The Cross-Session Blind Spot
&lt;/h2&gt;

&lt;p&gt;Here's the problem Dreaming solves. Individual sessions can't see cross-session patterns. A support agent that misclassifies a certain type of ticket won't notice it's made the same error 12 times this month. Each session starts fresh. The pattern is invisible.&lt;/p&gt;

&lt;p&gt;Dreaming surfaces exactly that kind of signal. It's the difference between an agent that resets every session and one that accumulates operational experience over time.&lt;/p&gt;

&lt;p&gt;The practical implication: an agent that's been running for three months has three months of self-curated experience. A freshly deployed agent starts from zero. Over time, these become fundamentally different systems — not because of different prompts, but because of different histories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outcomes: The Signal Dreaming Needs
&lt;/h2&gt;

&lt;p&gt;Dreaming needs to know what "doing well" means. That's what Outcomes provides.&lt;/p&gt;

&lt;p&gt;You define a success rubric. A separate Claude instance (isolated from the agent's reasoning, running in its own context window) evaluates the output against your criteria. If the output falls short, the grader identifies what needs to change, and the agent iterates until it meets the bar.&lt;/p&gt;

&lt;p&gt;Numbers from Anthropic's internal testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task success rates improved up to &lt;strong&gt;10 percentage points&lt;/strong&gt; over standard prompting&lt;/li&gt;
&lt;li&gt;Structured file generation: &lt;strong&gt;+8.4%&lt;/strong&gt; on .docx, &lt;strong&gt;+10.1%&lt;/strong&gt; on .pptx&lt;/li&gt;
&lt;li&gt;Works for subjective quality — editorial voice, writing style, brand consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The isolation model matters here. The grader runs in a separate context window, which means it can't be influenced by the agent's own reasoning. It's evaluating output, not process.&lt;/p&gt;
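
&lt;p&gt;To make the grade-and-revise loop concrete, here is a minimal sketch in plain Python. It is not the Managed Agents API: &lt;code&gt;iterate_until_pass&lt;/code&gt;, &lt;code&gt;toy_agent&lt;/code&gt;, &lt;code&gt;toy_grader&lt;/code&gt;, and the &lt;code&gt;max_rounds&lt;/code&gt; cap are all hypothetical stand-ins. The one idea it borrows from the description above is that the grader only ever sees the output, never the agent's reasoning.&lt;/p&gt;

```python
from typing import Callable, List, Tuple

Grade = Tuple[bool, str]  # (passed, feedback note)


def iterate_until_pass(
    produce: Callable[[List[str]], str],
    grade: Callable[[str], Grade],
    max_rounds: int = 3,  # hypothetical safety cap, not a documented limit
) -> Tuple[str, int]:
    """Produce an output, grade it in isolation, feed the note back, repeat."""
    feedback: List[str] = []
    output = ""
    for round_no in range(1, max_rounds + 1):
        output = produce(feedback)
        passed, note = grade(output)  # the grader receives the output only
        if passed:
            return output, round_no
        feedback.append(note)
    return output, max_rounds


# Toy stand-ins so the sketch runs without any external service.
def toy_agent(feedback: List[str]) -> str:
    if feedback:
        return "Report with summary"
    return "Report"


def toy_grader(output: str) -> Grade:
    if "summary" in output:
        return True, ""
    return False, "Add a summary section."


print(iterate_until_pass(toy_agent, toy_grader))
```

&lt;p&gt;The toy agent fails the rubric on the first round and passes on the second once the grader's note comes back. In a real setup both callables would be model calls, and you would probably want to log every failed note.&lt;/p&gt;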

&lt;p&gt;Connect the two: Outcomes identifies failures. Dreaming remembers them. One is the exam. The other is the error notebook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Agent Orchestration: Now in Public Beta
&lt;/h2&gt;

&lt;p&gt;Multi-agent orchestration moved from preview to public beta. A coordinator agent decomposes tasks and delegates to up to 20 specialist subagents running in parallel. Each subagent gets its own context window. They share a common filesystem.&lt;/p&gt;

&lt;p&gt;Key details for builders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full trace visibility in Claude Console&lt;/li&gt;
&lt;li&gt;Coordinator can send follow-up messages mid-workflow&lt;/li&gt;
&lt;li&gt;Subagents retain context between exchanges&lt;/li&gt;
&lt;li&gt;Orchestration depth limited to one level — no sub-sub-agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The depth limit is worth noting. If your architecture needs nested orchestration, this isn't the right fit yet.&lt;/p&gt;

&lt;p&gt;Real-world results from early adopters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Harvey (legal AI): task completion rates up approximately &lt;strong&gt;6x&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Wisedocs (document verification): review speed improved &lt;strong&gt;50%&lt;/strong&gt; while maintaining quality&lt;/li&gt;
&lt;li&gt;Netflix: parallel batch analysis across hundreds of build logs&lt;/li&gt;
&lt;li&gt;Spiral by Every: Haiku coordinator + Opus writing subagents + Outcomes grader scoring against editorial principles&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Webhooks and Pricing
&lt;/h2&gt;

&lt;p&gt;Webhooks are in public beta. Agents push notifications to your system when tasks complete. For long-running jobs — some sessions run for hours — this is essential. You don't want to poll.&lt;/p&gt;

&lt;p&gt;Pricing: standard Claude API token rates plus &lt;strong&gt;$0.08 per active session hour&lt;/strong&gt;. Idle time is free. A 30-minute task costs 4 cents in infrastructure fees on top of tokens. Dreaming, Outcomes, and Webhooks don't add separate charges.&lt;/p&gt;
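
&lt;p&gt;The arithmetic behind that 4-cent figure, as a small sketch. The rate is the one quoted above. The function names are mine, and I am assuming the fee is prorated linearly by active minutes, which is what the 30-minute example implies.&lt;/p&gt;

```python
SESSION_RATE_USD_PER_HOUR = 0.08  # figure quoted in this article


def infrastructure_fee(active_minutes: float) -> float:
    """Session-hour fee for the active part of a run; idle time is not counted."""
    # Assumption: linear proration by the minute.
    return SESSION_RATE_USD_PER_HOUR * (active_minutes / 60)


def total_cost(active_minutes: float, token_cost_usd: float) -> float:
    """Infrastructure fee plus whatever the tokens cost at standard API rates."""
    return infrastructure_fee(active_minutes) + token_cost_usd


print(round(infrastructure_fee(30), 4))  # 0.04
```

&lt;p&gt;Token cost is passed in rather than computed because it depends on the model and the workload. For most jobs it will dwarf the session fee.&lt;/p&gt;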

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dreaming&lt;/td&gt;
&lt;td&gt;Research preview&lt;/td&gt;
&lt;td&gt;Agents review past sessions, extract patterns, curate memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outcomes&lt;/td&gt;
&lt;td&gt;Public beta&lt;/td&gt;
&lt;td&gt;Automated output grading against developer-defined rubrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent orchestration&lt;/td&gt;
&lt;td&gt;Public beta&lt;/td&gt;
&lt;td&gt;Coordinator + up to 20 parallel subagents, shared filesystem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhooks&lt;/td&gt;
&lt;td&gt;Public beta&lt;/td&gt;
&lt;td&gt;Push notifications when agent tasks complete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Live&lt;/td&gt;
&lt;td&gt;$0.08/active session hour + standard token costs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  One Limitation Worth Knowing
&lt;/h2&gt;

&lt;p&gt;Managed Agents runs Claude models exclusively. The orchestration, Dreaming, Outcomes grading — all Claude. If your architecture needs to route between models (cost optimization, specialized capabilities, latency requirements), that's a layer Managed Agents doesn't address.&lt;/p&gt;

&lt;p&gt;If you're building multi-model agent systems that need persistent context across providers, &lt;a href="https://docs.evolink.ai/en/integration-guide/claude-desktop" rel="noopener noreferrer"&gt;EvoLink&lt;/a&gt; provides a unified gateway routing across Claude, DeepSeek, GPT, and others from a single API endpoint.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Author: Jessie, COO at EvoLink&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://claude.com/blog/new-in-claude-managed-agents" rel="noopener noreferrer"&gt;Anthropic: New in Claude Managed Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/managed-agents" rel="noopener noreferrer"&gt;Anthropic Engineering: Decoupling the Brain from the Hands&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>image</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How I Stopped Burning Through My Claude Code Quota by Noon</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Wed, 06 May 2026 09:58:47 +0000</pubDate>
      <link>https://forem.com/evan-dong/how-i-stopped-burning-through-my-claude-code-quota-by-noon-1fp6</link>
      <guid>https://forem.com/evan-dong/how-i-stopped-burning-through-my-claude-code-quota-by-noon-1fp6</guid>
      <description>&lt;p&gt;&lt;em&gt;By Jessie, COO at EvoLink&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You open Claude Code at 9am. By noon, you're rate-limited. Your colleague does twice the work and still has quota left at 5pm. Same Max subscription. What's going on?&lt;/p&gt;

&lt;p&gt;I ran into this exact situation and went digging. Turns out Anthropic published an engineering blog post, "Lessons from building Claude Code: Prompt Caching is Everything," that explains the whole thing. The short version: your daily habits are probably destroying your cache hit rate, and that can cost you 10-20x more per message than necessary.&lt;/p&gt;

&lt;p&gt;Here's what I learned and what I changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Mechanic: Prefix Caching
&lt;/h2&gt;

&lt;p&gt;Every request Claude Code sends to the model follows this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System prompt + Tool definitions → Project docs (CLAUDE.md) → Session context → Messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API caches this sequence from the front. On the next request, if the prefix matches what was cached before, it reuses the prior computation. A cache hit costs &lt;strong&gt;one-tenth&lt;/strong&gt; of normal price for those tokens.&lt;/p&gt;

&lt;p&gt;But if any single byte in the prefix changes, everything from that point onward is invalidated. Full price recalculation.&lt;/p&gt;

&lt;p&gt;The ordering is intentional. Anthropic's design principle: the less something changes, the earlier it goes. System prompt and tool definitions rarely change — they sit at the front. CLAUDE.md changes occasionally — middle. Messages change every turn — last. Each new turn just appends to the end. Everything before it stays cached.&lt;/p&gt;
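
&lt;p&gt;Here is a toy model of that prefix rule, just to show why a change near the front hurts so much more than one at the end. It is a simplification, not how the API is implemented: every block counts as one unit of cost, cache-write surcharges are ignored, and the function names are made up. The 0.1 multiplier is the "one-tenth" figure from above.&lt;/p&gt;

```python
from typing import List

CACHE_READ_MULTIPLIER = 0.1  # "one-tenth of normal price", per the article


def shared_prefix_len(cached: List[str], request: List[str]) -> int:
    """Count the leading blocks that are identical in both sequences."""
    n = 0
    for old, new in zip(cached, request):
        if old != new:
            break  # everything from the first mismatch onward is a miss
        n += 1
    return n


def relative_cost(cached: List[str], request: List[str]) -> float:
    """Request cost in full-price block units (equal-sized blocks assumed)."""
    hits = shared_prefix_len(cached, request)
    misses = len(request) - hits
    return hits * CACHE_READ_MULTIPLIER + misses


previous = ["system+tools", "CLAUDE.md", "session", "msg1"]
appended = previous + ["msg2"]  # a normal next turn
tools_changed = ["system+tools(v2)", "CLAUDE.md", "session", "msg1", "msg2"]

print(relative_cost(previous, appended))       # 4 cached blocks + 1 new block
print(relative_cost(previous, tools_changed))  # nothing reusable, 5 full-price blocks
```

&lt;p&gt;Appending a message leaves four blocks on the cheap path. Touching the first block sends all five back to full price, even though four of them did not change.&lt;/p&gt;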




&lt;h2&gt;
  
  
  Four Things That Kill Your Cache
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Switching Models Mid-Conversation
&lt;/h3&gt;

&lt;p&gt;This one hurts the most. You're mid-session with Opus, a simple task comes up, you run &lt;code&gt;/model&lt;/code&gt; to switch to Haiku, handle it, switch back.&lt;/p&gt;

&lt;p&gt;Cache is bound to the model. One switch = all accumulated cache invalidated, rebuilt from scratch. The rebuild cost often exceeds what letting Opus answer the simple question would have cost.&lt;/p&gt;

&lt;p&gt;Anthropic's internal approach: keep one model for the main conversation. When a smaller model is needed, use a sub-agent — independent context and cache, does its work, passes the result back without touching the main session's cache chain.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Changing Tool Configuration Mid-Session
&lt;/h3&gt;

&lt;p&gt;Adding an MCP tool, removing one, or updating parameters — tool definitions are part of the cached prefix. Any change breaks the chain.&lt;/p&gt;

&lt;p&gt;This is why Claude Code keeps tool definitions in place even when unused. The cost of extra definition tokens is negligible compared to a full cache invalidation.&lt;/p&gt;

&lt;p&gt;Plan Mode follows the same logic: instead of removing execution tools when entering planning mode, Claude Code exposes &lt;code&gt;EnterPlanMode&lt;/code&gt;/&lt;code&gt;ExitPlanMode&lt;/code&gt; as special tools that are always part of the set. The tool set never changes, so the cache stays valid.&lt;/p&gt;

&lt;p&gt;For users with many MCP tools, Claude Code uses lazy loading: start with lightweight stubs (tool name + one-line description), pull full schemas only when the model actually needs to call a tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Opening New Sessions Constantly
&lt;/h3&gt;

&lt;p&gt;Every fresh &lt;code&gt;claude&lt;/code&gt; invocation starts cache from zero. If your habit is "ask two questions, quit, reopen" — you never accumulate cache benefit.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Switching Between Accounts
&lt;/h3&gt;

&lt;p&gt;Cache is isolated per account. Rotating through account pools resets the cache each time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Do Instead
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Keep conversations long.&lt;/strong&gt; Longer conversation = thicker cache = cheaper messages toward the end. Stop opening new sessions unnecessarily.&lt;/p&gt;

&lt;p&gt;You might worry about context window overflow. Don't. Claude Code has built-in compaction — automatic history compression when context gets too long. Anthropic designed Cache-Safe Forking: the compaction request reuses the exact same system prompt and tool definitions, sharing the same cache chain. The only new cost is the compression instruction itself.&lt;/p&gt;

&lt;p&gt;Long conversations don't get more expensive per token. They get cheaper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't switch models mid-conversation.&lt;/strong&gt; If you need a different model, open a separate conversation for that task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configure MCP tools before the session starts.&lt;/strong&gt; Don't add or remove mid-session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;--resume&lt;/code&gt; to continue previous sessions.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;--resume&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reopens a previous session (use &lt;code&gt;claude --continue&lt;/code&gt; to jump straight to the most recent one). The cache chain picks up where it left off. No rebuild. This single flag is probably the most underrated cost-saving habit in Claude Code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Cache Impact&lt;/th&gt;
&lt;th&gt;Cost Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Switch model mid-conversation&lt;/td&gt;
&lt;td&gt;Full invalidation&lt;/td&gt;
&lt;td&gt;Up to 20x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add/remove MCP tools&lt;/td&gt;
&lt;td&gt;Full invalidation&lt;/td&gt;
&lt;td&gt;10-20x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open new session&lt;/td&gt;
&lt;td&gt;Start from zero&lt;/td&gt;
&lt;td&gt;First turns at full price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Switch accounts&lt;/td&gt;
&lt;td&gt;Full invalidation&lt;/td&gt;
&lt;td&gt;10-20x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long continuous conversation&lt;/td&gt;
&lt;td&gt;Accumulates&lt;/td&gt;
&lt;td&gt;Gets cheaper over time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use &lt;code&gt;--resume&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Continues chain&lt;/td&gt;
&lt;td&gt;Near-free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  One More Detail Worth Knowing
&lt;/h2&gt;

&lt;p&gt;Claude Code never modifies the system prompt to update state information (current time, file changes). Instead, it injects updates using &lt;code&gt;&amp;lt;system-reminder&amp;gt;&lt;/code&gt; tags inside messages, because modifying the prompt would break the cache. The prompt is treated as immutable infrastructure. Messages are the fluid information layer.&lt;/p&gt;

&lt;p&gt;That's the level of obsession Anthropic has about this. They monitor cache hit rate with the same severity as server uptime. A drop is treated as an incident.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Model-Switching Problem
&lt;/h2&gt;

&lt;p&gt;"Never switch models" is painful advice in practice. Sonnet for everyday coding, Opus for architecture decisions, Haiku for quick questions — that's a normal workflow.&lt;/p&gt;

&lt;p&gt;Anthropic's answer is "use sub-agents," but most users can't orchestrate sub-agents themselves. If you're running Claude Code through a gateway like &lt;a href="https://docs.evolink.ai/en/integration-guide/claude-desktop" rel="noopener noreferrer"&gt;EvoLink&lt;/a&gt;, model routing can happen at the infrastructure level without breaking your session's cache chain. Worth knowing that option exists.&lt;/p&gt;




&lt;p&gt;Caching is not an optimization technique. It is the foundation of the entire system. Now you know what Anthropic knows.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/claude-code-prompt-caching" rel="noopener noreferrer"&gt;Anthropic Engineering: Lessons from building Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Jessie is COO at EvoLink, a Claude API gateway for teams and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>image</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Codex v0.128.0: /goal Keeps Working Until It's Done -- Even Across Sessions</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Sun, 03 May 2026 08:58:24 +0000</pubDate>
      <link>https://forem.com/evan-dong/codex-v01280-goal-keeps-working-until-its-done-even-across-sessions-5d85</link>
      <guid>https://forem.com/evan-dong/codex-v01280-goal-keeps-working-until-its-done-even-across-sessions-5d85</guid>
      <description>&lt;p&gt;Every AI coding assistant forgets what it was doing the moment you close the terminal. Codex just fixed that.&lt;/p&gt;

&lt;p&gt;OpenAI shipped v0.128.0 on April 30th with two features that matter more than they sound: &lt;code&gt;/goal&lt;/code&gt; for persistent cross-session objectives, and &lt;code&gt;/pet&lt;/code&gt; for ambient agent status feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Session Amnesia Problem
&lt;/h2&gt;

&lt;p&gt;You ask your AI assistant to refactor a module. It gets halfway through. You close the terminal, grab coffee, come back -- and it has zero memory of what it was doing.&lt;/p&gt;

&lt;p&gt;You re-explain the task. It starts over. You lose 15 minutes of context every single time.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;intent persistence&lt;/strong&gt; problem. Not context window size -- the model simply forgets your &lt;em&gt;objective&lt;/em&gt; when the session ends.&lt;/p&gt;

&lt;h2&gt;
  
  
  /goal: Define It Once, Codex Keeps Going
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;/goal&lt;/code&gt; lets you set a persistent objective that survives across sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/goal create "Increase test coverage in src/auth/ from 62% to 90%"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Close the terminal. Reboot. Come back tomorrow. The goal is still there.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/goal create&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Define a persistent objective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/goal pause&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Suspend the goal, preserve progress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/goal resume&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pick up where you left off&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/goal clear&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mark done or abandon&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Under the hood, goal state is managed through app-server APIs with runtime continuation. When you &lt;code&gt;/goal resume&lt;/code&gt;, Codex restores the execution context -- not just the goal text.&lt;/p&gt;

&lt;p&gt;This shifts AI coding from &lt;strong&gt;request-response&lt;/strong&gt; to &lt;strong&gt;goal-driven agent&lt;/strong&gt;: you define the destination, the tool figures out how to get there across as many sessions as it takes.&lt;/p&gt;

&lt;h2&gt;
  
  
  /pet: Agent Observability, But Cute
&lt;/h2&gt;

&lt;p&gt;Type &lt;code&gt;/pet&lt;/code&gt; and a small animated creature appears in your Codex interface. It reflects what Codex is doing in the background:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running a task? The pet is active.&lt;/li&gt;
&lt;li&gt;Tests passed? It celebrates.&lt;/li&gt;
&lt;li&gt;Something stuck? It reacts.&lt;/li&gt;
&lt;li&gt;Idle? It sleeps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;9to5Mac called them "little Dynamic Island-ish messengers." Sam Altman said: "This isn't the most important thing we've done, but it's more useful than it looks."&lt;/p&gt;

&lt;p&gt;You can also &lt;code&gt;/hatch&lt;/code&gt; a custom pet -- Codex generates one based on your project context.&lt;/p&gt;

&lt;p&gt;Silly? Sure. But &lt;strong&gt;agent observability&lt;/strong&gt; during long-running tasks is a real problem, and this solves it without requiring you to tail logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Signals
&lt;/h2&gt;

&lt;p&gt;When Cursor, Claude Code, and Codex generate roughly similar code, what differentiates them?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Old&lt;/th&gt;
&lt;th&gt;New&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Task scope&lt;/td&gt;
&lt;td&gt;Single-turn&lt;/td&gt;
&lt;td&gt;Multi-session goal tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent visibility&lt;/td&gt;
&lt;td&gt;Terminal output&lt;/td&gt;
&lt;td&gt;Ambient status indicators&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session model&lt;/td&gt;
&lt;td&gt;Stateless&lt;/td&gt;
&lt;td&gt;Stateful across restarts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once core functionality reaches parity, &lt;strong&gt;experience becomes the differentiator&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  v0.128.0 Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Virtual pet&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/pet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Animated agent status companion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom pet&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/hatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AI-generated project-specific pet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Goal system&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/goal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Persistent cross-session objectives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-update&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex update&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Update from terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Side chat&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/side&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Parallel conversation panel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin marketplace&lt;/td&gt;
&lt;td&gt;marketplace&lt;/td&gt;
&lt;td&gt;One-click plugin install&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Practical Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;/goal&lt;/code&gt; for multi-day refactors, coverage targets, migration checklists. Not for one-off fixes.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;/pet&lt;/code&gt; as ambient monitoring during long agent runs.&lt;/li&gt;
&lt;li&gt;If you are juggling multiple AI tools (Codex, Claude Code, Gemini), the fragmentation tax is real. &lt;a href="https://evolink.ai?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=codex-pet&amp;amp;utm_content=codex_pet" rel="noopener noreferrer"&gt;EvoLink&lt;/a&gt; unifies 30+ models behind one API gateway with smart routing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/codex/releases" rel="noopener noreferrer"&gt;Codex v0.128.0 Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://9to5mac.com/2026/05/01/i-think-i-just-vibe-coded-lil-finder-guy-onto-my-mac/" rel="noopener noreferrer"&gt;9to5Mac: Vibe Coding Lil Finder Guy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://testingcatalog.com/openai-updates-codex-and-prepares-remote-control-feature/" rel="noopener noreferrer"&gt;TestingCatalog: Codex Update&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>image</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude Opus 4.7: What Actually Changed and Whether You Should Migrate</title>
      <dc:creator>Evan-dong</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:01:52 +0000</pubDate>
      <link>https://forem.com/evan-dong/claude-opus-47-what-actually-changed-and-whether-you-should-migrate-27e6</link>
      <guid>https://forem.com/evan-dong/claude-opus-47-what-actually-changed-and-whether-you-should-migrate-27e6</guid>
      <description>&lt;p&gt;If you follow AI model releases, you have already seen the headlines about Claude Opus 4.7. Most of them focus on benchmark numbers.&lt;/p&gt;

&lt;p&gt;This article focuses on something more useful: what changed in practice, what breaks during migration, and which workflows benefit most.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Short Version
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.7 is Anthropic's strongest generally available model for agentic coding and structured enterprise work as of April 2026. It is not a universal upgrade. It introduces breaking API changes that require testing before migration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Opus 4.7 Is Strongest
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agentic Coding
&lt;/h3&gt;

&lt;p&gt;This is the headline improvement. Anthropic describes Opus 4.7 as a notable step up over Opus 4.6 for multi-step software engineering tasks. The difference shows most on work that requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reading a codebase across multiple files&lt;/li&gt;
&lt;li&gt;forming a plan and using tools&lt;/li&gt;
&lt;li&gt;verifying outputs before finalizing&lt;/li&gt;
&lt;li&gt;revising when initial attempts fail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your LLM usage is mostly one-shot snippets or ad hoc brainstorming, the upgrade matters less.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-Resolution Vision
&lt;/h3&gt;

&lt;p&gt;Opus 4.7 raises the image ceiling from 1568px / 1.15MP to 2576px / 3.75MP with simpler 1:1 coordinate mapping. This matters for screenshot QA, UI bug investigation, dense chart interpretation, and document understanding workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task Budgets
&lt;/h3&gt;

&lt;p&gt;A new &lt;code&gt;task_budget&lt;/code&gt; parameter (beta) lets you give Claude an approximate token budget for the full agentic loop, including thinking, tool calls, and output. The model can prioritize work and wind down gracefully instead of hitting a wall mid-task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extended Thinking Control
&lt;/h3&gt;

&lt;p&gt;A new &lt;code&gt;xhigh&lt;/code&gt; effort level sits between &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;, giving finer control over reasoning depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks During Migration
&lt;/h2&gt;

&lt;p&gt;This is the part most review posts underplay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sampling parameters removed.&lt;/strong&gt; Setting &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, or &lt;code&gt;top_k&lt;/code&gt; to any non-default value returns a 400 error. If your production code depends on those controls, this is a migration task, not a footnote.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extended thinking budgets removed.&lt;/strong&gt; Adaptive thinking is now the supported path, and it stays off unless you opt in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thinking output hidden by default.&lt;/strong&gt; Thinking content is omitted unless you explicitly choose a display mode like &lt;code&gt;summarized&lt;/code&gt;. Apps that surface reasoning traces will see UX changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tokenizer changed.&lt;/strong&gt; The new tokenizer can produce roughly 1.0x to 1.35x as many tokens for the same text, depending on content. Old &lt;code&gt;max_tokens&lt;/code&gt; assumptions and compaction logic may behave differently.&lt;/p&gt;
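
&lt;p&gt;For the sampling-parameter change, one blunt migration step is to drop those keys before the request goes out. The sketch below assumes your code builds requests as plain dictionaries. The helper name and the model ID are placeholders, and whether silently dropping a tuned &lt;code&gt;temperature&lt;/code&gt; is acceptable is something only your own evals can answer.&lt;/p&gt;

```python
from typing import Any, Dict

# Parameter names come from the article; the helper itself is hypothetical.
REMOVED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def strip_sampling_params(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request without sampling parameters; the input is not mutated."""
    return {k: v for k, v in request.items() if k not in REMOVED_SAMPLING_PARAMS}


legacy = {
    "model": "example-model-id",  # placeholder, not a real model name
    "max_tokens": 1024,
    "temperature": 0.2,
    "top_p": 0.9,
}

print(strip_sampling_params(legacy))
```

&lt;p&gt;Returning a copy keeps the original request around, which makes it easy to log what was removed while you replay traffic against the new model.&lt;/p&gt;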

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$15 / 1M tokens&lt;/td&gt;
&lt;td&gt;$75 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching write&lt;/td&gt;
&lt;td&gt;$18.75 / 1M tokens&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching read&lt;/td&gt;
&lt;td&gt;$1.50 / 1M tokens&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch API&lt;/td&gt;
&lt;td&gt;$7.50 / 1M tokens&lt;/td&gt;
&lt;td&gt;$37.50 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headline price is simple. The real cost story is not. Because the tokenizer changed, two teams can quote the same pricing and end up with different effective costs. Replay real prompts and measure before committing.&lt;/p&gt;
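
&lt;p&gt;A quick way to bracket that effect before you replay real prompts. The prices are the figures from the table above and the multiplier range is the 1.0x to 1.35x quoted earlier. The workload size is invented, and applying the same multiplier to input and output tokens is my simplifying assumption.&lt;/p&gt;

```python
def effective_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    token_multiplier: float = 1.0,
) -> float:
    """Cost of one workload after scaling old token counts by token_multiplier."""
    # Assumption: the tokenizer change inflates input and output equally.
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    total = scaled_in * input_price_per_mtok + scaled_out * output_price_per_mtok
    return total / 1_000_000


# Made-up workload measured with the old tokenizer: 2M input, 0.5M output tokens.
low = effective_cost_usd(2_000_000, 500_000, 15.0, 75.0, token_multiplier=1.0)
high = effective_cost_usd(2_000_000, 500_000, 15.0, 75.0, token_multiplier=1.35)

print(f"same list price, effective cost between ${low:.2f} and ${high:.2f}")
```

&lt;p&gt;This only gives you the envelope. Where your workload lands inside it depends on your content, which is why measuring real prompts still matters.&lt;/p&gt;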

&lt;h2&gt;
  
  
  Who Should Upgrade
&lt;/h2&gt;

&lt;p&gt;Opus 4.7 is a strong fit if you are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;building coding agents that inspect, plan, and verify across files&lt;/li&gt;
&lt;li&gt;running enterprise workflows with documents, charts, or screenshots&lt;/li&gt;
&lt;li&gt;building long-horizon agents where follow-through matters&lt;/li&gt;
&lt;li&gt;willing to tune effort, caching, and token budgets&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who Should Test First
&lt;/h2&gt;

&lt;p&gt;Slow down if you are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sensitive to token cost variance&lt;/li&gt;
&lt;li&gt;dependent on sampling parameter controls&lt;/li&gt;
&lt;li&gt;building experiences where conversational style matters more than execution&lt;/li&gt;
&lt;li&gt;expecting a drop-in swap from Opus 4.6&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Access
&lt;/h2&gt;

&lt;p&gt;Available through Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and Claude consumer plans (Pro, Max, Team, Enterprise). Also rolling out in GitHub Copilot.&lt;/p&gt;

&lt;p&gt;For teams evaluating multiple models in production, a unified API gateway like &lt;a href="https://evolink.ai?utm_source=devto&amp;amp;utm_medium=community&amp;amp;utm_campaign=opus47&amp;amp;utm_content=opus47-review" rel="noopener noreferrer"&gt;EvoLink&lt;/a&gt; simplifies routing and billing across providers without vendor lock-in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.7 is one of the best generally available choices for agentic coding in April 2026. Adopt it as a measured workflow decision, not as a blanket default. Test your migration path before switching production traffic.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Based on Anthropic's official launch materials and API documentation published April 16, 2026.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>image</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
