<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jaewon Jang</title>
    <description>The latest articles on Forem by Jaewon Jang (@jaewon_jang_d63fddcf69ac2).</description>
    <link>https://forem.com/jaewon_jang_d63fddcf69ac2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3855130%2F7aab33c6-d7e2-4ece-8084-7d55a696de88.png</url>
      <title>Forem: Jaewon Jang</title>
      <link>https://forem.com/jaewon_jang_d63fddcf69ac2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jaewon_jang_d63fddcf69ac2"/>
    <language>en</language>
    <item>
      <title>LLM agents don't degrade gradually — they cliff-edge. I built HarnessOS to survive it</title>
      <dc:creator>Jaewon Jang</dc:creator>
      <pubDate>Wed, 01 Apr 2026 15:04:10 +0000</pubDate>
      <link>https://forem.com/jaewon_jang_d63fddcf69ac2/harnessos-scaffoldmiddleware-for-infinite-autonomous-tasks-built-on-harness-engineering-3pf1</link>
      <guid>https://forem.com/jaewon_jang_d63fddcf69ac2/harnessos-scaffoldmiddleware-for-infinite-autonomous-tasks-built-on-harness-engineering-3pf1</guid>
      <description>&lt;p&gt;There's a concept gaining traction in AI systems engineering: &lt;strong&gt;Harness Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not the testing tool. The idea: raw LLM capability is like raw power — high voltage,&lt;br&gt;
hard to control, dangerous to run indefinitely. Harness Engineering is the discipline of&lt;br&gt;
building the control structures that make that power &lt;em&gt;usable at scale&lt;/em&gt;.&lt;br&gt;
Context managers. Evaluation loops. Failure classifiers. Goal trackers. Memory tiers.&lt;/p&gt;

&lt;p&gt;I think it's going to be one of the defining disciplines of serious AI systems work.&lt;br&gt;
And I've been building a platform around it.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;HarnessOS&lt;/strong&gt; is a scaffold/middleware system for running infinite autonomous tasks.&lt;/p&gt;

&lt;p&gt;The key word is &lt;em&gt;infinite&lt;/em&gt;. Not one task. Not one session. An agent that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs continuously, across context window rotations&lt;/li&gt;
&lt;li&gt;Evolves its own goals when it succeeds at the current one&lt;/li&gt;
&lt;li&gt;Persists state across sessions without losing context&lt;/li&gt;
&lt;li&gt;Classifies its own failures and routes them appropriately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HarnessOS
├── CTX                      ← context precision layer
│   └── LLM-free retrieval, 5.2% token budget, R@5=1.0 dependency recall
├── omc-live                 ← finite outer loop
│   └── 2-Wave strategy + self-evolving goals + episode memory
├── omc-live-infinite        ← infinite outer loop
│   └── context rotation, world model, no iteration cap
├── HalluMaze                ← hallucination management (in development)
└── [future layers]
    ├── Evaluation Layer
    ├── Safety Layer
    └── Memory Tier System
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Problem with Current Agent Frameworks
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks are built for tasks that complete in one session.&lt;/p&gt;

&lt;p&gt;Spin up → run → done.&lt;/p&gt;

&lt;p&gt;That's fine for demos. It breaks for real autonomous work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context exhaustion&lt;/strong&gt;: At ~70% context capacity, agents start losing earlier decisions.&lt;br&gt;
Not gracefully. They cliff-edge — sudden degradation, not gradual fade.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No goal evolution&lt;/strong&gt;: An agent that succeeds at "write tests" has no mechanism to&lt;br&gt;
ask "what's the next improvement?" It just stops.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Failure is terminal&lt;/strong&gt;: Most frameworks catch exceptions. Few &lt;em&gt;classify&lt;/em&gt; them —&lt;br&gt;
transient vs persistent vs fundamental goal mismatch.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;HarnessOS is built specifically to address all three.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Measured (The Empirical Foundation)
&lt;/h2&gt;

&lt;p&gt;Before building anything, I ran controlled experiments on questions I couldn't find&lt;br&gt;
good empirical answers to anywhere else.&lt;/p&gt;
&lt;h3&gt;
  
  
  Q1: How should autonomous agents reason about problems?
&lt;/h3&gt;

&lt;p&gt;Compared &lt;strong&gt;hypothesis-driven debugging&lt;/strong&gt; (observe → hypothesize → verify)&lt;br&gt;
against &lt;strong&gt;engineering-only&lt;/strong&gt; (pattern match → retry) on 12 bug scenarios.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bug type&lt;/th&gt;
&lt;th&gt;Engineering&lt;/th&gt;
&lt;th&gt;Hypothesis&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Causal&lt;/td&gt;
&lt;td&gt;1.75 attempts&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-43%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assumption&lt;/td&gt;
&lt;td&gt;2.0 attempts&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-50%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;First-hypothesis accuracy: &lt;strong&gt;100%&lt;/strong&gt;. This is now the default reasoning strategy in omc-live.&lt;/p&gt;
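&lt;p&gt;The Delta column is the relative change in mean attempts from the engineering-only run to the hypothesis-driven run. A minimal sketch of that arithmetic follows; &lt;code&gt;relative_change&lt;/code&gt; is an illustrative name, not something taken from the HarnessOS code base.&lt;/p&gt;

```python
def relative_change(baseline: float, treatment: float) -> float:
    """Return the relative change from baseline to treatment, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline * 100.0


# Mean attempts from the table: (engineering-only, hypothesis-driven).
rows = {"Simple": (1.0, 1.0), "Causal": (1.75, 1.0), "Assumption": (2.0, 1.0)}
deltas = {bug: round(relative_change(eng, hyp)) for bug, (eng, hyp) in rows.items()}
print(deltas)
```

&lt;p&gt;For the Causal row this is (1.0 - 1.75) / 1.75, or about -42.9%, which rounds to the -43% shown above.&lt;/p&gt;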
&lt;h3&gt;
  
  
  Q2: Where do context limits actually hit?
&lt;/h3&gt;

&lt;p&gt;I measured the Lost-in-the-Middle effect across 1K, 10K, 50K, and 100K token contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key finding: degradation is threshold-based, not gradual.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents don't slowly forget. They cliff-edge at a specific token length and fail silently.&lt;br&gt;
This changed how &lt;code&gt;omc-live-infinite&lt;/code&gt; handles context — it monitors budget and triggers&lt;br&gt;
a safe rotation handoff at 70%, before the cliff.&lt;/p&gt;
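&lt;p&gt;The budget check described above can be sketched roughly as follows. This is an illustration only: the class &lt;code&gt;ContextBudget&lt;/code&gt; and its fields are hypothetical names, and the only value taken from the post is the 70% threshold.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Tracks token usage against a context window (hypothetical helper)."""

    window_tokens: int
    rotate_at: float = 0.70  # rotation threshold quoted in the post
    used_tokens: int = 0

    def record(self, tokens: int) -> None:
        """Add the tokens consumed by one turn; negative counts are ignored."""
        self.used_tokens += max(tokens, 0)

    def usage(self) -> float:
        """Fraction of the window already consumed."""
        return self.used_tokens / self.window_tokens

    def should_rotate(self) -> bool:
        """True once usage reaches the threshold, before the window is full."""
        return self.usage() >= self.rotate_at


budget = ContextBudget(window_tokens=100_000)
for turn_tokens in (30_000, 25_000, 20_000):
    budget.record(turn_tokens)
    print(f"usage={budget.usage():.2f} rotate={budget.should_rotate()}")
```

&lt;p&gt;Checking the threshold after every turn, rather than waiting for an overflow error, is what lets the handoff happen while the earlier context is still intact.&lt;/p&gt;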
&lt;h3&gt;
  
  
  Q3: Where do autonomous agents actually fail?
&lt;/h3&gt;

&lt;p&gt;I ran OpenHands on 20-step coding tasks. The failures clustered into three groups:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Wrong task decomposition (incorrect sub-goals from the start)&lt;/li&gt;
&lt;li&gt;Role non-compliance (agent exceeds defined scope)&lt;/li&gt;
&lt;li&gt;Boundary violations (unexpected state mutations)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Predictable = preventable. The omc-failure-router classifies failures into these&lt;br&gt;
categories and routes them appropriately instead of falling back to a generic retry.&lt;/p&gt;
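&lt;p&gt;As a rough illustration of classify-then-route, here is a small sketch built on the three clusters above. The keyword rules and the recovery action names (&lt;code&gt;replan&lt;/code&gt;, &lt;code&gt;restate_role&lt;/code&gt;, &lt;code&gt;rollback&lt;/code&gt;) are assumptions made for the example; they are not the omc-failure-router implementation.&lt;/p&gt;

```python
from enum import Enum


class FailureKind(Enum):
    WRONG_DECOMPOSITION = "wrong_decomposition"
    ROLE_NON_COMPLIANCE = "role_non_compliance"
    BOUNDARY_VIOLATION = "boundary_violation"
    UNKNOWN = "unknown"


# Hypothetical keyword rules; a real classifier would need richer signals.
KEYWORDS = {
    FailureKind.WRONG_DECOMPOSITION: ("sub-goal", "subgoal", "plan"),
    FailureKind.ROLE_NON_COMPLIANCE: ("out of scope", "not permitted"),
    FailureKind.BOUNDARY_VIOLATION: ("unexpected write", "state mutation"),
}

# Hypothetical recovery actions, one per category.
ROUTES = {
    FailureKind.WRONG_DECOMPOSITION: "replan",
    FailureKind.ROLE_NON_COMPLIANCE: "restate_role",
    FailureKind.BOUNDARY_VIOLATION: "rollback",
    FailureKind.UNKNOWN: "retry",
}


def classify(message: str) -> FailureKind:
    """Return the first category whose keywords appear in the message."""
    text = message.lower()
    for kind, words in KEYWORDS.items():
        if any(word in text for word in words):
            return kind
    return FailureKind.UNKNOWN


def route(message: str) -> str:
    """Map a failure message to a recovery action."""
    return ROUTES[classify(message)]


print(route("Agent edited files out of scope"))
```

&lt;p&gt;Only failures that match no category fall through to a plain retry.&lt;/p&gt;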


&lt;h2&gt;
  
  
  The Architecture in Practice
&lt;/h2&gt;
&lt;h3&gt;
  
  
  omc-live: Finite Self-Evolving Loop
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wave 1: Strategy consultation (specialist agents, runs once)
   ↓
Wave 2: Execution loop
   ↓
Judgment: Goal achieved?
   ├── NO  → update goal tree, retry
   └── YES → Score (5 dimensions)
                ├── delta ≥ epsilon → EVOLVE goal, continue
                └── plateau × 3    → CONVERGED, stop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When the system succeeds, it scores the output, finds the weakest dimension,&lt;br&gt;
generates an elevated goal, and continues — until quality plateaus.&lt;/p&gt;
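&lt;p&gt;The score, evolve, and converge steps can be sketched as a plain loop. Everything here is an assumption made for illustration: the function name, the callbacks, summing the dimensions into one total, and the &lt;code&gt;max_rounds&lt;/code&gt; safety cap. Wave 1 and the retry branch are left out.&lt;/p&gt;

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


def run_finite_loop(
    execute: Callable[[str], str],
    score: Callable[[str], Scores],
    evolve: Callable[[str, str], str],
    goal: str,
    epsilon: float = 0.01,
    patience: int = 3,
    max_rounds: int = 50,
) -> Tuple[List[str], bool]:
    """Return the goals that were tried and whether the loop converged."""
    goals = [goal]
    previous_total = None
    plateaus = 0
    for _ in range(max_rounds):
        scores = score(execute(goal))
        total = sum(scores.values())
        # The first round has nothing to compare against, so it counts as progress.
        improved = previous_total is None or total - previous_total >= epsilon
        plateaus = 0 if improved else plateaus + 1
        if plateaus >= patience:
            return goals, True  # quality plateaued `patience` times in a row
        previous_total = total
        weakest = min(scores, key=scores.get)  # lowest-scoring dimension
        goal = evolve(goal, weakest)
        goals.append(goal)
    return goals, False
```

&lt;p&gt;Counting consecutive plateaus, rather than stopping at the first flat round, gives a noisy scorer a few chances before the loop declares convergence.&lt;/p&gt;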
&lt;h3&gt;
  
  
  omc-live-infinite: No Iteration Cap
&lt;/h3&gt;

&lt;p&gt;New mechanisms beyond the finite version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context rotation&lt;/strong&gt;: at 70% budget → save state → fresh session → resume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;World model&lt;/strong&gt;: epistemic state layer that persists across rotations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Co-evolution feedback&lt;/strong&gt;: strategy outcomes feed back into Wave 1 planning&lt;/li&gt;
&lt;/ul&gt;
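&lt;p&gt;To make the save, fresh session, resume handoff concrete, here is a minimal checkpoint sketch. It assumes the state fits in a JSON file; the &lt;code&gt;SessionState&lt;/code&gt; fields and both function names are hypothetical and do not describe the actual HarnessOS storage format.&lt;/p&gt;

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SessionState:
    """State carried across a context rotation (hypothetical shape)."""

    goal: str
    completed_steps: List[str] = field(default_factory=list)
    world_model: Dict[str, str] = field(default_factory=dict)


def save_state(state: SessionState, path: Path) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")
    tmp.replace(path)


def load_state(path: Path) -> SessionState:
    """Rebuild the state in a fresh session."""
    return SessionState(**json.loads(path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as folder:
    checkpoint = Path(folder) / "state.json"
    before = SessionState("raise coverage", ["write tests"], {"ci": "green"})
    save_state(before, checkpoint)
    after = load_state(checkpoint)
    print(after == before)
```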

&lt;p&gt;This enables agents that work on complex goals for hours, not seconds.&lt;/p&gt;
&lt;h3&gt;
  
  
  CTX: Precision Context Loading
&lt;/h3&gt;

&lt;p&gt;Query classification → retrieval strategy selection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EXPLICIT_SYMBOL → direct lookup&lt;/li&gt;
&lt;li&gt;SEMANTIC_FUNCTIONALITY → embedding search&lt;/li&gt;
&lt;li&gt;STRUCTURAL_RELATIONSHIP → dependency graph&lt;/li&gt;
&lt;li&gt;RECENT_CHANGE → git recency&lt;/li&gt;
&lt;/ul&gt;
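&lt;p&gt;The mapping above could be driven by a rule-based classifier along these lines. The four category names and their strategies come from the list; the regular expressions are assumptions made for the example and are almost certainly cruder than what CTX uses.&lt;/p&gt;

```python
import re
from enum import Enum


class QueryKind(Enum):
    # Each value is the retrieval strategy paired with the category.
    EXPLICIT_SYMBOL = "direct lookup"
    SEMANTIC_FUNCTIONALITY = "embedding search"
    STRUCTURAL_RELATIONSHIP = "dependency graph"
    RECENT_CHANGE = "git recency"


# Hypothetical heuristics, checked in order; the first match wins.
RULES = [
    (QueryKind.RECENT_CHANGE,
     re.compile(r"\b(recent|recently|latest|last commit|changed)\b", re.I)),
    (QueryKind.STRUCTURAL_RELATIONSHIP,
     re.compile(r"\b(imports?|depends on|calls?|callers?|used by)\b", re.I)),
    # Looks like an identifier: foo(), snake_case, or CamelCase.
    (QueryKind.EXPLICIT_SYMBOL,
     re.compile(r"\b\w+\(\)|\b[a-z]+_[a-z_]+\b|\b[A-Z][a-z]+[A-Z]\w*\b")),
]


def classify_query(query: str) -> QueryKind:
    """Pick a category with regex rules only, so no LLM call is needed."""
    for kind, pattern in RULES:
        if pattern.search(query):
            return kind
    return QueryKind.SEMANTIC_FUNCTIONALITY  # fallback for free-form questions


for question in ("where is parse_config defined", "how does retry backoff work"):
    kind = classify_query(question)
    print(f"{question!r}: {kind.name} via {kind.value}")
```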

&lt;p&gt;Result: 5.2% average token budget, R@5=1.0. No LLM calls for retrieval.&lt;/p&gt;
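&lt;p&gt;For readers unfamiliar with the metric, R@5 is recall at 5. The sketch below uses the standard definition, the share of relevant items found in the top five results; I am assuming that is the definition behind the number above. The file names are made up.&lt;/p&gt;

```python
from typing import Iterable, Sequence


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top k retrieved items."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must not be empty")
    hits = relevant_set.intersection(ranked[:k])
    return len(hits) / len(relevant_set)


ranked_files = ["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"]
print(recall_at_k(ranked_files, {"b.py", "e.py"}))  # both are in the top 5
print(recall_at_k(ranked_files, {"b.py", "f.py"}))  # f.py is ranked sixth
```

&lt;p&gt;A score of 1.0 means every relevant item landed in the top five for each evaluated query.&lt;/p&gt;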


&lt;h2&gt;
  
  
  Why "Harness Engineering" Is the Right Frame
&lt;/h2&gt;

&lt;p&gt;A harness doesn't constrain power — it &lt;em&gt;channels&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;LLMs have enormous capability. Without control structure, that capability is:&lt;br&gt;
context-unaware, goal-unstable, failure-opaque, session-local.&lt;/p&gt;

&lt;p&gt;HarnessOS adds the control structure. Not to limit the model — to make it usable&lt;br&gt;
for work that spans hours, not seconds.&lt;/p&gt;


&lt;h2&gt;
  
  
  Current State &amp;amp; Quick Start
&lt;/h2&gt;

&lt;p&gt;214 tests, 100% coverage. CTX and omc-live/infinite are stable and used daily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/jaytoone/HarnessOS
python3 analyze.py &lt;span class="nt"&gt;--run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No pip install. No required API keys for base experiments.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/jaytoone/HarnessOS" rel="noopener noreferrer"&gt;https://github.com/jaytoone/HarnessOS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building autonomous agents and thinking about long-run reliability — happy to compare notes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>HarnessOS: scaffold/middleware for infinite autonomous tasks — built on Harness Engineering</title>
      <dc:creator>Jaewon Jang</dc:creator>
      <pubDate>Wed, 01 Apr 2026 08:27:45 +0000</pubDate>
      <link>https://forem.com/jaewon_jang_d63fddcf69ac2/harnessos-scaffoldmiddleware-for-infinite-autonomous-tasks-built-on-harness-engineering-50n0</link>
      <guid>https://forem.com/jaewon_jang_d63fddcf69ac2/harnessos-scaffoldmiddleware-for-infinite-autonomous-tasks-built-on-harness-engineering-50n0</guid>
      <description>&lt;p&gt;There's a concept gaining traction in AI systems engineering: &lt;strong&gt;Harness Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not the testing tool. The idea: raw LLM capability is like raw power — high voltage, hard to control, dangerous to run indefinitely. Harness Engineering is the discipline of building the control structures that make that power &lt;em&gt;usable at scale&lt;/em&gt;.&lt;br&gt;
Context managers. Evaluation loops. Failure classifiers. Goal trackers. Memory tiers.&lt;/p&gt;

&lt;p&gt;I think it's going to be one of the defining disciplines of serious AI systems work.&lt;br&gt;
And I've been building a platform around it.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;HarnessOS&lt;/strong&gt; is a scaffold/middleware system for running infinite autonomous tasks.&lt;/p&gt;

&lt;p&gt;The key word is &lt;em&gt;infinite&lt;/em&gt;. Not one task. Not one session. An agent that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs continuously, across context window rotations&lt;/li&gt;
&lt;li&gt;Evolves its own goals when it succeeds at the current one&lt;/li&gt;
&lt;li&gt;Persists state across sessions without losing context&lt;/li&gt;
&lt;li&gt;Classifies its own failures and routes them appropriately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HarnessOS
├── CTX                      ← context precision layer
│   └── LLM-free retrieval, 5.2% token budget, R@5=1.0 dependency recall
├── omc-live                 ← finite outer loop
│   └── 2-Wave strategy + self-evolving goals + episode memory
├── omc-live-infinite        ← infinite outer loop
│   └── context rotation, world model, no iteration cap
├── HalluMaze                ← hallucination management (in development)
└── [future layers]
    ├── Evaluation Layer
    ├── Safety Layer
    └── Memory Tier System
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Problem with Current Agent Frameworks
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks are built for tasks that complete in one session.&lt;/p&gt;

&lt;p&gt;Spin up → run → done.&lt;/p&gt;

&lt;p&gt;That's fine for demos. It breaks for real autonomous work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context exhaustion&lt;/strong&gt;: At ~70% context capacity, agents start losing earlier decisions. Not gracefully. They cliff-edge — sudden degradation, not gradual fade.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No goal evolution&lt;/strong&gt;: An agent that succeeds at "write tests" has no mechanism to ask "what's the next improvement?" It just stops.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Failure is terminal&lt;/strong&gt;: Most frameworks catch exceptions. Few &lt;em&gt;classify&lt;/em&gt; them — transient vs persistent vs fundamental goal mismatch.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;HarnessOS is built specifically to address all three.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Measured (The Empirical Foundation)
&lt;/h2&gt;

&lt;p&gt;Before building anything, I ran controlled experiments on questions I couldn't find good empirical answers to anywhere else.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q1: How should autonomous agents reason about problems?
&lt;/h3&gt;

&lt;p&gt;Compared &lt;strong&gt;hypothesis-driven debugging&lt;/strong&gt; (observe → hypothesize → verify) against &lt;strong&gt;engineering-only&lt;/strong&gt; (pattern match → retry) on 12 bug scenarios.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bug type&lt;/th&gt;
&lt;th&gt;Engineering&lt;/th&gt;
&lt;th&gt;Hypothesis&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Causal&lt;/td&gt;
&lt;td&gt;1.75 attempts&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-43%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assumption&lt;/td&gt;
&lt;td&gt;2.0 attempts&lt;/td&gt;
&lt;td&gt;1.0 attempts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-50%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;First-hypothesis accuracy: &lt;strong&gt;100%&lt;/strong&gt;. This is now the default reasoning strategy in omc-live.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q2: Where do context limits actually hit?
&lt;/h3&gt;

&lt;p&gt;I measured the Lost-in-the-Middle effect across 1K, 10K, 50K, and 100K token contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key finding: degradation is threshold-based, not gradual.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents don't slowly forget. They cliff-edge at a specific token length and fail silently.&lt;br&gt;
This changed how &lt;code&gt;omc-live-infinite&lt;/code&gt; handles context — it monitors budget and triggers a safe rotation handoff at 70%, before the cliff.&lt;/p&gt;
&lt;h3&gt;
  
  
  Q3: Where do autonomous agents actually fail?
&lt;/h3&gt;

&lt;p&gt;I ran OpenHands on 20-step coding tasks. The failures clustered into three groups:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Wrong task decomposition (incorrect sub-goals from the start)&lt;/li&gt;
&lt;li&gt;Role non-compliance (agent exceeds defined scope)&lt;/li&gt;
&lt;li&gt;Boundary violations (unexpected state mutations)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Predictable = preventable. The omc-failure-router classifies failures into these categories and routes them appropriately instead of falling back to a generic retry.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Architecture in Practice
&lt;/h2&gt;
&lt;h3&gt;
  
  
  omc-live: Finite Self-Evolving Loop
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wave 1: Strategy consultation (specialist agents, runs once)
   ↓
Wave 2: Execution loop
   ↓
Judgment: Goal achieved?
   ├── NO  → update goal tree, retry
   └── YES → Score (5 dimensions)
                ├── delta ≥ epsilon → EVOLVE goal, continue
                └── plateau × 3    → CONVERGED, stop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When the system succeeds, it scores the output, finds the weakest dimension, generates an elevated goal, and continues — until quality plateaus.&lt;/p&gt;
&lt;h3&gt;
  
  
  omc-live-infinite: No Iteration Cap
&lt;/h3&gt;

&lt;p&gt;New mechanisms beyond the finite version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context rotation&lt;/strong&gt;: at 70% budget → save state → fresh session → resume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;World model&lt;/strong&gt;: epistemic state layer that persists across rotations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Co-evolution feedback&lt;/strong&gt;: strategy outcomes feed back into Wave 1 planning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables agents that work on complex goals for hours, not seconds.&lt;/p&gt;
&lt;h3&gt;
  
  
  CTX: Precision Context Loading
&lt;/h3&gt;

&lt;p&gt;Query classification → retrieval strategy selection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EXPLICIT_SYMBOL → direct lookup&lt;/li&gt;
&lt;li&gt;SEMANTIC_FUNCTIONALITY → embedding search&lt;/li&gt;
&lt;li&gt;STRUCTURAL_RELATIONSHIP → dependency graph&lt;/li&gt;
&lt;li&gt;RECENT_CHANGE → git recency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: 5.2% average token budget, R@5=1.0. No LLM calls for retrieval.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why "Harness Engineering" Is the Right Frame
&lt;/h2&gt;

&lt;p&gt;A harness doesn't constrain power — it &lt;em&gt;channels&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;LLMs have enormous capability. Without control structure, that capability is: context-unaware, goal-unstable, failure-opaque, session-local.&lt;/p&gt;

&lt;p&gt;HarnessOS adds the control structure. Not to limit the model — to make it usable for work that spans hours, not seconds.&lt;/p&gt;


&lt;h2&gt;
  
  
  Current State &amp;amp; Quick Start
&lt;/h2&gt;

&lt;p&gt;214 tests, 100% coverage. CTX and omc-live/infinite are stable and used daily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/jaytoone/HarnessOS
python3 analyze.py &lt;span class="nt"&gt;--run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No pip install. No required API keys for base experiments.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/jaytoone/HarnessOS" rel="noopener noreferrer"&gt;https://github.com/jaytoone/HarnessOS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building autonomous agents and thinking about long-run reliability — happy to compare notes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
