<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: KingGyu</title>
    <description>The latest articles on Forem by KingGyu (@kinggyusuh).</description>
    <link>https://forem.com/kinggyusuh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3889036%2Fdad3362d-903c-4883-84fc-57d05a13abe9.png</url>
      <title>Forem: KingGyu</title>
      <link>https://forem.com/kinggyusuh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kinggyusuh"/>
    <language>en</language>
    <item>
      <title>I open-sourced Codex Spark: traceable UI delegation for Codex</title>
      <dc:creator>KingGyu</dc:creator>
      <pubDate>Wed, 06 May 2026 16:49:29 +0000</pubDate>
      <link>https://forem.com/kinggyusuh/i-open-sourced-codex-spark-traceable-ui-delegation-for-codex-1l6g</link>
      <guid>https://forem.com/kinggyusuh/i-open-sourced-codex-spark-traceable-ui-delegation-for-codex-1l6g</guid>
      <description>&lt;p&gt;I open-sourced &lt;strong&gt;Codex Spark&lt;/strong&gt;, a Codex plugin for delegating concrete Computer Use and Browser Use tasks to GPT-5.3 Codex Spark subagents.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/KingGyuSuh/awesome-codex-spark" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/awesome-codex-spark&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;As Codex sessions get better at long-horizon reasoning, the bottleneck is not always "can the model click the button?"&lt;/p&gt;

&lt;p&gt;Often the better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Should the most reasoning-heavy model spend its context and tokens on mechanical UI work?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the parent session is doing architecture, code review, release verification, or product reasoning, I want it focused there. A visible-world task like opening a page, reading UI state, pasting approved content, or filling one approved form should be delegated as bounded execution.&lt;/p&gt;

&lt;h2&gt;The Pattern&lt;/h2&gt;

&lt;p&gt;Codex Spark uses one skill: &lt;code&gt;$codex-spark-delegate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The parent session remains responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding the user request;&lt;/li&gt;
&lt;li&gt;choosing exactly one surface: Computer Use or Browser Use;&lt;/li&gt;
&lt;li&gt;confirming exact side effects;&lt;/li&gt;
&lt;li&gt;setting the target, content, limits, and verification criteria;&lt;/li&gt;
&lt;li&gt;reading the returned trace and deciding recovery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Spark child is not a planner. It is an executor.&lt;/p&gt;
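&lt;p&gt;To make that division concrete, here is one way the parent-side decisions above could be bundled into a delegation brief. The field names and values are my illustration, not the plugin's actual schema:&lt;/p&gt;

```python
# Illustrative only: one way the parent's decisions above could be captured
# as a delegation brief. Field names and values are mine, not the plugin's
# actual schema.
brief = {
    "surface": "browser_use",   # exactly one: computer_use or browser_use
    "target": "https://example.com/admin/new-post",
    "content": "approved draft text, pasted verbatim",
    "limits": {"max_steps": 20, "side_effects": "one form submit only"},
    "verification": "the post appears in the published list",
}

# The child executes this brief; it does not reinterpret or extend it.
assert brief["surface"] in {"computer_use", "browser_use"}
```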

&lt;h2&gt;The Trace Is The Interface&lt;/h2&gt;

&lt;p&gt;The child must return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;status;&lt;/li&gt;
&lt;li&gt;trace id;&lt;/li&gt;
&lt;li&gt;tool surface;&lt;/li&gt;
&lt;li&gt;target;&lt;/li&gt;
&lt;li&gt;model config;&lt;/li&gt;
&lt;li&gt;steps;&lt;/li&gt;
&lt;li&gt;observations;&lt;/li&gt;
&lt;li&gt;verification;&lt;/li&gt;
&lt;li&gt;artifacts;&lt;/li&gt;
&lt;li&gt;blockers;&lt;/li&gt;
&lt;li&gt;next step.&lt;/li&gt;
&lt;/ul&gt;
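&lt;p&gt;As a sketch only (the field names follow the list above; the dict shape and values are hypothetical, not the plugin's actual schema), a returned trace might look like:&lt;/p&gt;

```python
# Hypothetical sketch of a Spark child trace, mirroring the fields listed
# above. The field names follow the post; the dict shape and values are
# illustrative, not the plugin's actual schema.
trace = {
    "status": "success",             # e.g. success / failed / blocked
    "trace_id": "spark-20260506-0001",
    "tool_surface": "browser_use",   # exactly one surface per delegation
    "target": "https://example.com/settings",
    "model_config": {"model": "gpt-5.3-codex-spark"},
    "steps": ["open page", "fill form", "submit"],
    "observations": ["form accepted input", "confirmation banner visible"],
    "verification": {"expected": "banner shown", "observed": "banner shown"},
    "artifacts": ["screenshots/after-submit.png"],
    "blockers": [],
    "next_step": None,
}

# A partial failure shows up as evidence, not a vague "done": the parent
# compares expected vs observed, reads blockers, and decides recovery.
assert trace["verification"]["expected"] == trace["verification"]["observed"]
```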

&lt;p&gt;That trace matters because UI work fails in partial ways. A form might submit but not visibly persist. A rich-text editor might accept pasted text but corrupt non-ASCII characters. A browser tool might be unavailable. The parent needs evidence, not a vague "done."&lt;/p&gt;

&lt;h2&gt;What It Does Not Do&lt;/h2&gt;

&lt;p&gt;Codex Spark is intentionally narrow.&lt;/p&gt;

&lt;p&gt;It does not ship X, Reddit, Gmail, or other domain-specific executors. Those belong in separate plugins. It also does not silently replace Browser Use with HTTP scraping or another automation surface. If the requested surface is unavailable, the child reports &lt;code&gt;blocked&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;Why I Built It&lt;/h2&gt;

&lt;p&gt;The useful split is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reasoning-heavy parent model, for example GPT-5.5 xhigh, handles judgment;&lt;/li&gt;
&lt;li&gt;Codex Spark handles bounded visible-world execution;&lt;/li&gt;
&lt;li&gt;the trace is the join point.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That lets the strongest reasoning stay focused on design, code, and verification while Spark handles the mechanical UI/browser work.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/KingGyuSuh/awesome-codex-spark" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/awesome-codex-spark&lt;/a&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>opensource</category>
      <category>ai</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Autarch: AI Strategy Evolution, Deterministic Trade Execution</title>
      <dc:creator>KingGyu</dc:creator>
      <pubDate>Sat, 02 May 2026 18:48:12 +0000</pubDate>
      <link>https://forem.com/kinggyusuh/autarch-ai-strategy-evolution-deterministic-trade-execution-3p2f</link>
      <guid>https://forem.com/kinggyusuh/autarch-ai-strategy-evolution-deterministic-trade-execution-3p2f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Disclaimer: this is research and architecture software, not financial advice. The bundled strategies are not a profitability claim. Cryptocurrency trading involves significant risk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;p&gt;Autarch is an open-source Bybit USDT perpetual trading workbench built around one boundary:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"LLM trading" does not have to mean an LLM presses Buy or Sell. It can mean an LLM evolves future strategy while deterministic code owns live execution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude/Codex agents generate, review, backtest, and rank strategy candidates. A Python asyncio runner executes selected strategy files with no LLM calls in the live loop. The handoff is visible through strategy manifests, signal code, leaderboards, active/next pointers, cached data, and append-only evidence logs.&lt;/p&gt;

&lt;p&gt;GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/KingGyuSuh/autarch" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/autarch&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Why I Built It&lt;/h2&gt;

&lt;p&gt;Most "AI trading bot" framing collapses two very different jobs into one box.&lt;/p&gt;

&lt;p&gt;In older discretionary trading, a trader might personally decide and execute each order. In modern quant trading, that is usually not the shape of the work. Humans design policies, define constraints, validate assumptions, monitor behavior, and let execution systems place trades under those rules.&lt;/p&gt;

&lt;p&gt;That distinction matters for LLMs.&lt;/p&gt;

&lt;p&gt;If an LLM participates in a trading system, the interesting question is not "can the model press the Buy button?" The more useful question is: can the model help evolve future policy without sitting inside the live execution path?&lt;/p&gt;

&lt;p&gt;Research work benefits from generative systems. They can inspect evidence, form hypotheses, critique candidate strategies, compare backtests, and revise code.&lt;/p&gt;

&lt;p&gt;Live execution has a different job. It should be explicit, bounded, inspectable, and boring.&lt;/p&gt;

&lt;p&gt;Autarch is my attempt to preserve both truths in one architecture.&lt;/p&gt;

&lt;p&gt;The project started from a simple design rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let AI improve the strategy, but do not let generative uncertainty directly own irreversible execution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That rule shaped the whole repository. Autarch is not "an LLM that trades for you." It is a workbench where AI agents can propose, review, and backtest strategies, while live execution stays deterministic, inspectable, and bounded by explicit risk controls.&lt;/p&gt;

&lt;p&gt;System paper:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/KingGyuSuh/autarch/blob/main/docs/AUTARCH.md" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/autarch/blob/main/docs/AUTARCH.md&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Generative models are useful in research loops. They can search through hypotheses, explain tradeoffs, inspect logs, write candidate code, compare results, and critique their own work.&lt;/p&gt;

&lt;p&gt;Live execution has different needs. It benefits from narrow responsibility, explicit state, deterministic behavior, and clear authority boundaries.&lt;/p&gt;

&lt;p&gt;Those two kinds of work should not be forced into the same runtime path.&lt;/p&gt;

&lt;p&gt;In a trading system, that distinction matters. A model can be useful for strategy evolution without being allowed to improvise in the live order path.&lt;/p&gt;

&lt;h2&gt;The Autarch Split&lt;/h2&gt;

&lt;p&gt;Autarch is organized into two planes with an evidence boundary between them.&lt;/p&gt;

&lt;h3&gt;Evolution Plane&lt;/h3&gt;

&lt;p&gt;The Evolution Plane is where Claude/Codex harness agents work.&lt;/p&gt;

&lt;p&gt;In the current implementation, the harness runs producer/reviewer pairs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;trade-strategy&lt;/code&gt; creates or revises strategy candidates.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;backtest&lt;/code&gt; evaluates the strategy pool against cached market data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;strategy-run&lt;/code&gt; compares the top leaderboard candidate against the currently active strategy and writes a proposed next strategy pointer when appropriate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This side is allowed to be creative and iterative because it does not place live orders. It produces artifacts that can be inspected.&lt;/p&gt;

&lt;h3&gt;Evidence Boundary&lt;/h3&gt;

&lt;p&gt;The handoff is deliberately plain:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strategy/pool/&amp;lt;id&amp;gt;/manifest.toml
strategy/pool/&amp;lt;id&amp;gt;/signal.py
strategy/leaderboard.toml
strategy-script/active/&amp;lt;pair&amp;gt;.toml
strategy-script/next/&amp;lt;pair&amp;gt;.toml
config/trade.toml
data/*.jsonl
raw-data/*.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These files answer the questions that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What strategy exists?&lt;/li&gt;
&lt;li&gt;Which strategy is active?&lt;/li&gt;
&lt;li&gt;Which strategy is proposed next?&lt;/li&gt;
&lt;li&gt;Why was it ranked highly?&lt;/li&gt;
&lt;li&gt;What evidence has the runner recorded?&lt;/li&gt;
&lt;li&gt;What risk posture is currently configured?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The boundary could become a database, a queue, a signed manifest, or a dashboard later. The important part is not the medium. The important part is that the handoff is visible and accountable.&lt;/p&gt;

&lt;h3&gt;Execution Plane&lt;/h3&gt;

&lt;p&gt;The Execution Plane is deterministic Python.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;strategy-script/runner.py&lt;/code&gt; runs one asyncio coroutine per configured pair. Each coroutine:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks current positions.&lt;/li&gt;
&lt;li&gt;Waits for native TP/SL closure if a position is already open.&lt;/li&gt;
&lt;li&gt;Records closure evidence.&lt;/li&gt;
&lt;li&gt;Applies a pending &lt;code&gt;next/&amp;lt;pair&amp;gt;.toml&lt;/code&gt; strategy pointer only at the boundary.&lt;/li&gt;
&lt;li&gt;Loads the active strategy manifest and &lt;code&gt;signal.py&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Fetches Bybit klines.&lt;/li&gt;
&lt;li&gt;Evaluates &lt;code&gt;entry_signal(candles, params, context)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Routes any entry through &lt;code&gt;bybit-script/place_order.py&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The runner does not call an LLM.&lt;/p&gt;

&lt;p&gt;After entry, the position is managed by Bybit native TP/SL. The runner polls, records evidence, and continues.&lt;/p&gt;
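&lt;p&gt;The per-pair loop above can be sketched roughly as follows. Every helper passed in here is a stand-in invented for this example; the real loop lives in &lt;code&gt;strategy-script/runner.py&lt;/code&gt; and differs in detail:&lt;/p&gt;

```python
import asyncio

# Illustrative sketch of the per-pair loop described above. Every helper
# is a stand-in invented for this example; the real runner in
# strategy-script/runner.py differs in detail. Nothing in the loop calls
# a model.
async def run_pair(pair, fetch_position, fetch_klines, load_strategy,
                   apply_next_pointer, place_order, record, iterations=1):
    for _ in range(iterations):
        position = await fetch_position(pair)            # 1. check positions
        if position is not None:
            # 2-3. wait for native TP/SL closure, record closure evidence
            record(pair, {"event": "position_open", "position": position})
            await asyncio.sleep(0)                       # polling placeholder
            continue
        apply_next_pointer(pair)                         # 4. swap strategy only at the boundary
        manifest, entry_signal = load_strategy(pair)     # 5. load manifest + signal.py
        candles = await fetch_klines(pair, manifest["timeframe"],
                                     manifest["kline_limit"])       # 6. fetch klines
        decision = entry_signal(candles, manifest["params"], None)  # 7. evaluate
        if decision is not None:
            await place_order(pair, decision)            # 8. route through the order gate
            record(pair, {"event": "entry", "decision": decision})

# Minimal stub wiring so the sketch can be exercised end to end.
events = []

async def _ret(value):
    return value

async def demo():
    manifest = {"timeframe": "5", "kline_limit": 3, "params": {}}
    await run_pair(
        "BTCUSDT",
        fetch_position=lambda pair: _ret(None),
        fetch_klines=lambda pair, tf, limit: _ret([100.0, 101.0, 102.0]),
        load_strategy=lambda pair: (manifest,
                                    lambda c, p, ctx=None: {"side": "Buy",
                                                            "rationale": "stub"}),
        apply_next_pointer=lambda pair: None,
        place_order=lambda pair, decision: _ret(None),
        record=lambda pair, event: events.append(event),
    )

asyncio.run(demo())
```

&lt;p&gt;The point of the shape is that every await is a bounded, named step, and the strategy swap happens only at the position boundary.&lt;/p&gt;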

&lt;h2&gt;Strategy Format&lt;/h2&gt;

&lt;p&gt;Each strategy has a manifest:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ema_cross_v1"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"..."&lt;/span&gt;
&lt;span class="py"&gt;pairs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"BTCUSDT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ETHUSDT"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;leverage&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="py"&gt;tp_pct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.012&lt;/span&gt;
&lt;span class="py"&gt;sl_pct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.008&lt;/span&gt;
&lt;span class="py"&gt;timeframe&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"5"&lt;/span&gt;
&lt;span class="py"&gt;kline_limit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;

&lt;span class="nn"&gt;[params]&lt;/span&gt;
&lt;span class="c"&gt;# strategy-specific parameters&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a deterministic signal function:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;entry_signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Return None for no entry.
&lt;/span&gt;    &lt;span class="c1"&gt;# Return {"side": "Buy" or "Sell", "rationale": "..."} for an entry.
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The signal code is constrained. It should be deterministic for identical inputs. It should not perform network calls, external IO, or time-dependent behavior. The live runner should evaluate strategy logic, not host a hidden research session.&lt;/p&gt;
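&lt;p&gt;To show what "deterministic for identical inputs" means in this shape, here is a toy EMA-cross signal. The indicator choice, parameter names, and the assumption that &lt;code&gt;candles&lt;/code&gt; is a list of close prices are mine for illustration, not the repo's actual &lt;code&gt;ema_cross_v1&lt;/code&gt; code:&lt;/p&gt;

```python
# Toy deterministic signal in the entry_signal(candles, params, context)
# shape described above. The EMA-cross logic, parameter names, and the
# assumption that `candles` is a list of close prices (oldest first) are
# illustrative, not the actual ema_cross_v1 implementation. No network
# calls, no external IO, no clock reads: identical inputs, identical output.

def _ema(values, period):
    k = 2.0 / (period + 1)
    ema = values[0]
    for v in values[1:]:
        ema = v * k + ema * (1 - k)
    return ema

def entry_signal(candles, params, context=None):
    fast = params.get("fast", 9)
    slow = params.get("slow", 21)
    if len(candles) < slow + 1:
        return None  # not enough history; no entry
    prev_fast = _ema(candles[:-1], fast)
    prev_slow = _ema(candles[:-1], slow)
    cur_fast = _ema(candles, fast)
    cur_slow = _ema(candles, slow)
    if prev_fast <= prev_slow and cur_fast > cur_slow:
        return {"side": "Buy", "rationale": "fast EMA crossed above slow EMA"}
    if prev_fast >= prev_slow and cur_fast < cur_slow:
        return {"side": "Sell", "rationale": "fast EMA crossed below slow EMA"}
    return None
```

&lt;p&gt;Same candles, same params, same answer; that is the whole contract the live runner relies on.&lt;/p&gt;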

&lt;h2&gt;Risk Gates&lt;/h2&gt;

&lt;p&gt;The project keeps safety posture in &lt;code&gt;config/trade.toml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The default configuration includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;armed = false&lt;/code&gt;, so live order placement is rejected until explicitly enabled&lt;/li&gt;
&lt;li&gt;mandatory TP/SL&lt;/li&gt;
&lt;li&gt;leverage caps&lt;/li&gt;
&lt;li&gt;minimum TP and SL distance floors&lt;/li&gt;
&lt;li&gt;minimum reward/risk ratio&lt;/li&gt;
&lt;li&gt;fixed margin fraction&lt;/li&gt;
&lt;li&gt;global maximum concurrent positions&lt;/li&gt;
&lt;li&gt;active pair list&lt;/li&gt;
&lt;/ul&gt;
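&lt;p&gt;As a hedged sketch of how such gates might read in TOML (the key names are my guesses at the shape, not the actual &lt;code&gt;config/trade.toml&lt;/code&gt; schema):&lt;/p&gt;

```toml
# Hypothetical sketch only; key names are my guesses at the shape, not the
# repository's actual config/trade.toml schema.
armed = false                  # live order placement rejected until flipped
max_leverage = 5               # leverage cap
min_tp_pct = 0.005             # TP distance floor
min_sl_pct = 0.003             # SL distance floor
min_reward_risk = 1.2          # minimum reward/risk ratio
margin_fraction = 0.02         # fixed margin fraction per entry
max_concurrent_positions = 3   # global cap
pairs = ["BTCUSDT", "ETHUSDT"] # active pair list
```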

&lt;p&gt;The harness never calls &lt;code&gt;place_order.py&lt;/code&gt;. Only the execution runner places entries, and only through the configured order gate.&lt;/p&gt;

&lt;p&gt;This does not make trading safe. It makes the authority boundary explicit.&lt;/p&gt;

&lt;h2&gt;Why This Architecture Matters&lt;/h2&gt;

&lt;p&gt;The point of Autarch is not that one strategy, exchange, or scoring formula is correct.&lt;/p&gt;

&lt;p&gt;The point is the shape of the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Let the creative layer mutate future policy.&lt;/li&gt;
&lt;li&gt;Make the policy handoff inspectable.&lt;/li&gt;
&lt;li&gt;Keep the action layer narrow and deterministic.&lt;/li&gt;
&lt;li&gt;Record evidence so the next evolution cycle can learn from what happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That pattern applies beyond trading. Any agentic system that separates "thinking about future behavior" from "taking irreversible action" can benefit from a similar boundary.&lt;/p&gt;

&lt;h2&gt;What I Want Feedback On&lt;/h2&gt;

&lt;p&gt;I am especially interested in critique around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether the Evolution Plane / Execution Plane split is clear enough&lt;/li&gt;
&lt;li&gt;whether file-based handoffs are a good first boundary&lt;/li&gt;
&lt;li&gt;whether strategy adoption should require stronger review or signatures&lt;/li&gt;
&lt;li&gt;how to score a changing strategy pool without overfitting recent data&lt;/li&gt;
&lt;li&gt;where human approval should sit in the loop&lt;/li&gt;
&lt;li&gt;what should be made more formally verifiable&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;p&gt;GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/KingGyuSuh/autarch" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/autarch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Architecture note:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/KingGyuSuh/autarch/blob/main/docs/AUTARCH.md" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/autarch/blob/main/docs/AUTARCH.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Autarch is research software. It is not financial advice, not a profitability claim, and not something anyone should run blindly. The useful idea is the boundary: evolve freely, hand off explicitly, execute accountably.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cryptocurrency</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Bridging Codex’s image_gen tool into Claude Code as /codex-image:* skills</title>
      <dc:creator>KingGyu</dc:creator>
      <pubDate>Sat, 02 May 2026 18:15:36 +0000</pubDate>
      <link>https://forem.com/kinggyusuh/bridging-codexs-imagegen-tool-into-claude-code-as-codex-image-skills-5d72</link>
      <guid>https://forem.com/kinggyusuh/bridging-codexs-imagegen-tool-into-claude-code-as-codex-image-skills-5d72</guid>
      <description>&lt;p&gt;Claude Code has no first-party image generation. Codex CLI does — it ships a headless &lt;code&gt;image_gen&lt;/code&gt; tool (gpt-image-2) that runs against whatever auth you already have: ChatGPT subscription (Free tier included), or your existing OpenAI API key. So no extra &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; to manage.&lt;/p&gt;

&lt;p&gt;I built a thin Claude Code plugin that bridges the two. Three slash commands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/codex-image:generate "5 logo variations of a brass compass on white, save under images/logos/"
/codex-image:edit input.png "Replace background with a clean white studio backdrop"
/codex-image:status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The full slash-command argument is passed verbatim to Codex's &lt;code&gt;imagegen&lt;/code&gt; skill. Output paths, sizes, quality, transparency, and multi-image count are all expressed in natural language inside the prompt. There are no &lt;code&gt;--out&lt;/code&gt; / &lt;code&gt;--size&lt;/code&gt; / &lt;code&gt;--quality&lt;/code&gt; flags to memorize; &lt;code&gt;imagegen&lt;/code&gt; handles them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;: each &lt;code&gt;SKILL.md&lt;/code&gt; is a one-line &lt;code&gt;node script.mjs &amp;lt;subcmd&amp;gt; "$ARGUMENTS"&lt;/code&gt; invocation. The Node wrapper (~375 lines) does only argument splitting and &lt;code&gt;codex exec&lt;/code&gt; spawning with a ~6-line minimal instruction prefix. Image-generation intelligence lives entirely in Codex's bundled &lt;code&gt;imagegen&lt;/code&gt; skill; this plugin is a pure dispatcher. One non-obvious finding documented along the way: the bash in &lt;code&gt;SKILL.md&lt;/code&gt; isn't always executed verbatim by the model (it pre-evaluates &lt;code&gt;$(...)&lt;/code&gt; substitutions in its head), so all parsing must live in the Node script. Details are in &lt;code&gt;docs/ARCHITECTURE.md&lt;/code&gt; if you're building plugins yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off worth knowing&lt;/strong&gt;: agent tokens count against your Codex usage limit. A typical single-image low-quality turn is around 30k agent tokens on top of the image-gen cost itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;: &lt;a href="https://github.com/KingGyuSuh/codex-image-in-cc" rel="noopener noreferrer"&gt;https://github.com/KingGyuSuh/codex-image-in-cc&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude plugin marketplace add KingGyuSuh/codex-image-in-cc
claude plugin install codex-image@codex-image-in-cc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Apache-2.0. Orthogonal to and complementary with &lt;code&gt;openai/codex-plugin-cc&lt;/code&gt; (code review / task delegation under the &lt;code&gt;/codex:&lt;/code&gt; namespace); install both.&lt;/p&gt;

&lt;p&gt;Happy to take feedback or contributions. The architecture decisions are documented openly so you can disagree concretely.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>cli</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Open-sourcing my personal AI Agent Harness for Production (harness-loom)</title>
      <dc:creator>KingGyu</dc:creator>
      <pubDate>Mon, 20 Apr 2026 13:11:03 +0000</pubDate>
      <link>https://forem.com/kinggyusuh/open-sourcing-my-personal-ai-agent-harness-for-production-harness-loom-3mob</link>
      <guid>https://forem.com/kinggyusuh/open-sourcing-my-personal-ai-agent-harness-for-production-harness-loom-3mob</guid>
      <description>&lt;p&gt;I’ve been poking at a bunch of AI agent frameworks and coding tools this past year. For personal projects, I often just use Hermes Agent or something similar because it's fast and saves tokens.&lt;/p&gt;

&lt;p&gt;But honestly? When I actually have to ship something for &lt;strong&gt;production&lt;/strong&gt;, I can't just use those raw agent setups. Between security compliance, instability, and the sheer complexity of real-world codebases, it’s just too risky.&lt;/p&gt;

&lt;p&gt;For production, I keep going back to CLI tools like &lt;strong&gt;Claude Code, Codex, or Gemini CLI&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Why? Because in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Perfect &amp;gt; Fast:&lt;/strong&gt; I'd rather it take longer but be absolutely correct and secure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traceability &amp;amp; Long Plans:&lt;/strong&gt; I need to track the exact progress of long-running plans without having to babysit them or intervene constantly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent Quality:&lt;/strong&gt; No matter which team member kicks off the task, the output quality and adherence to our repo's standards need to be exactly the same.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And I realized the way to achieve this isn't by finding a magical new model. It's by &lt;strong&gt;tuning the harness&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These CLIs (Claude, Codex, Gemini) already give you a pretty solid baseline harness for free (planners, hooks, auto mode, skills). But that baseline has no idea what &lt;em&gt;my&lt;/em&gt; specific repo cares about. It doesn't know my team's review rules, what "Done" looks like for us, or what artifacts we need to persist.&lt;/p&gt;

&lt;p&gt;So, I started focusing on &lt;strong&gt;Harness Fine-Tuning&lt;/strong&gt;: writing my team's specific review rules, producer/reviewer pairs, and task shapes into actual version-controlled files, rather than trying to re-explain them in a prompt every single session.&lt;/p&gt;

&lt;p&gt;I've finally open-sourced my personal harness setup: &lt;strong&gt;&lt;a href="https://github.com/KingGyuSuh/harness-loom" rel="noopener noreferrer"&gt;harness-loom&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s not another agent framework. It sits &lt;em&gt;on top&lt;/em&gt; of whatever harness your CLI already ships and lets you shape it to fit your production repo. You define your rules in one canonical place (&lt;code&gt;.harness/loom/&lt;/code&gt;), and it derives the specific configs for Claude, Codex, or Gemini.&lt;/p&gt;

&lt;p&gt;I’m still in the process of porting over all the specific features from my private setup into the open-source repo, but the core factory is there and ready to use. I'll be updating it frequently!&lt;/p&gt;

&lt;p&gt;If you are trying to use AI assistants for serious production work and want them to act more like a predictable system rather than a one-off chat, I'd love for you to poke at it.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/KingGyuSuh/harness-loom" rel="noopener noreferrer"&gt;harness-loom&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Has anyone else felt the need to shift from "prompt engineering" to "harness engineering" for production work?&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
