<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: daac-solo</title>
    <description>The latest articles on Forem by daac-solo (@daacsolo).</description>
    <link>https://forem.com/daacsolo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3848494%2Fdb6efb92-1fc8-4586-9c9d-5d257fac259f.png</url>
      <title>Forem: daac-solo</title>
      <link>https://forem.com/daacsolo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/daacsolo"/>
    <language>en</language>
    <item>
      <title>Trust, Philosophy, Context, Harness — How My AI Usage Evolved</title>
      <dc:creator>daac-solo</dc:creator>
      <pubDate>Sun, 29 Mar 2026 01:44:12 +0000</pubDate>
      <link>https://forem.com/daacsolo/trust-philosophy-context-harness-four-levels-of-working-with-ai-4md5</link>
      <guid>https://forem.com/daacsolo/trust-philosophy-context-harness-four-levels-of-working-with-ai-4md5</guid>
      <description>&lt;p&gt;&lt;em&gt;What I learned building 6 autonomous agents, a knowledge engine, and a multi-model workflow.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I started out using AI as a chat assistant. Today I have 6 autonomous agents that monitor my systems every morning before I wake up.&lt;/p&gt;

&lt;p&gt;Looking back, the journey had four distinct phases — each building on the last. This isn't a universal framework; it's just how things evolved for me. But I think the progression is natural enough that it might resonate.&lt;/p&gt;

&lt;p&gt;Everything below is something I actually built and use daily.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 1: Trust — "It Starts with Believing"
&lt;/h2&gt;

&lt;p&gt;The biggest barrier to using AI isn't technical. It's psychological.&lt;/p&gt;

&lt;p&gt;Most people try AI once, get a mediocre answer, and conclude "AI isn't useful for my work." They're using it like a search engine — ask a question, get an answer, done.&lt;/p&gt;

&lt;p&gt;The breakthrough is a mental shift: &lt;strong&gt;treat AI as a collaborator, not an oracle.&lt;/strong&gt; It has unlimited patience, broad knowledge, and tireless iteration. You have judgment, domain expertise, and taste. Together you're stronger than either alone.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in practice
&lt;/h3&gt;

&lt;p&gt;I don't say "write me a SQL query for conversion rate." I say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Our conversion metric spiked 2pp in the treatment arm. I need to understand — is that from more people entering the funnel, or the same people converting more often? Let's figure this out."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI asks clarifying questions. Proposes two hypotheses. We iterate on diagnostic queries together. Three rounds later, we find a root cause I wouldn't have found alone.&lt;/p&gt;

&lt;p&gt;That's not "using a tool." That's collaborating.&lt;/p&gt;

&lt;h3&gt;
  
  
  The takeaway for beginners
&lt;/h3&gt;

&lt;p&gt;Pick a real problem you're stuck on. Not a toy task — something where you'll notice if the answer is wrong. The trust builds when you see it actually help with something hard. Nobody trusts a stranger by reading their resume. You trust them by working through a tough problem together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 2: Prompt Engineering — "It's All Philosophy"
&lt;/h2&gt;

&lt;p&gt;Here's the dirty secret about prompt engineering: the people who are good at it aren't using magic words. They're thinking clearly before they ask.&lt;/p&gt;

&lt;p&gt;The prompt engineering community has produced thousands of templates — "chain of thought," "few-shot," "role-playing." These work, but they don't scale. When the situation changes, the template doesn't adapt.&lt;/p&gt;

&lt;p&gt;What does adapt? &lt;strong&gt;Thinking frameworks.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Two philosophical tools
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Socratic questioning — clarify before you ask&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before writing any prompt, answer these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What problem am I actually solving?" (not "what prompt should I write?")&lt;/li&gt;
&lt;li&gt;"What would have to be true for this approach to work?"&lt;/li&gt;
&lt;li&gt;"What's the strongest counterargument?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sounds obvious. But most bad prompts come from unclear thinking, not bad phrasing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. First principles + Occam's razor — solve the real problem simply&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start from the goal, not from convention ("how was this done before?" is usually the wrong question)&lt;/li&gt;
&lt;li&gt;Prefer the simplest solution that fully solves the problem&lt;/li&gt;
&lt;li&gt;Don't add complexity for hypothetical future needs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How I encoded this
&lt;/h3&gt;

&lt;p&gt;I wrote these into my AI assistant's system instructions — a file called &lt;code&gt;CLAUDE.md&lt;/code&gt; that loads on every session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Thinking Defaults&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; First principles: ask "what problem are we solving?" before
  "how was this done before?"
&lt;span class="p"&gt;-&lt;/span&gt; Occam's razor: prefer the simplest solution that fully solves
  the problem
&lt;span class="p"&gt;-&lt;/span&gt; Socratic questioning: probe deeper before accepting the frame.
  Ask "why does this matter?", "what would have to be true?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't prompts. They're &lt;strong&gt;thinking instructions&lt;/strong&gt; — they apply to every task, every domain. I wrote them once and they've shaped hundreds of interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  The takeaway
&lt;/h3&gt;

&lt;p&gt;Learn 3 questions instead of 300 prompts. Philosophy scales; templates don't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 3: Context Engineering — "Make the AI Know What You Know"
&lt;/h2&gt;

&lt;p&gt;A single prompt is a one-shot. Context engineering is about &lt;strong&gt;systematically giving AI the right information at the right time&lt;/strong&gt; — across sessions, across tasks, across models.&lt;/p&gt;

&lt;p&gt;This is where the real leverage is. Karpathy was right — the new skill in AI is context engineering, not prompt engineering.&lt;/p&gt;

&lt;p&gt;I built three pillars:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 1: Progressive Disclosure
&lt;/h3&gt;

&lt;p&gt;Don't dump everything into the prompt. Layer it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Always loaded:  CLAUDE.md (thinking rules, voice, hard constraints)
                    |
Task-relevant:  8 domain skill files (experiment analysis,
                SQL gotchas, stakeholder communication...)
                    |
Just-in-time:   auto-search hook — surfaces relevant prior
                findings on every message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The auto-search hook is the key innovation. I built a semantic search engine over 350+ indexed findings from past work. It runs automatically on every message I send — searches the index, surfaces the top 3 relevant results as additional context. ~1 second latency. I never ask for it. It just works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The hook script (simplified)&lt;/span&gt;
&lt;span class="nv"&gt;PROMPT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;extract_from_stdin&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;RESULTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;ke search &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 3&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;has_relevant_results &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESULTS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;additionalContext&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="nv"&gt;$RESULTS&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
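&lt;p&gt;For completeness, a less-simplified sketch of the same hook. &lt;code&gt;ke&lt;/code&gt; is my private CLI, and the &lt;code&gt;prompt&lt;/code&gt; / &lt;code&gt;additionalContext&lt;/code&gt; field names are my assumptions about the hook's JSON contract, so check your tool's hook schema before reusing this:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Auto-search hook sketch. Reads the hook payload from stdin, searches
# the knowledge index, and emits extra context as JSON. `ke` is a
# private CLI; the JSON field names are assumptions, not a public schema.
set -uo pipefail

payload_prompt() {
  # pull the user's message out of the JSON payload on stdin
  jq -r '.prompt // empty' 2>/dev/null
}

main() {
  local prompt results
  prompt=$(payload_prompt)
  [ -n "$prompt" ] || return 0      # nothing to search for

  # top 3 semantic matches; fail quietly so the hook never blocks a message
  results=$(ke search "$prompt" -n 3 2>/dev/null) || return 0
  [ -n "$results" ] || return 0

  # let jq do the JSON escaping, whatever characters the findings contain
  jq -nc --arg ctx "$results" '{additionalContext: $ctx}'
}

# demo invocation with a fake payload; the hook runner supplies real stdin
echo '{"prompt":"databricks lateral view explode"}' | main
```

&lt;p&gt;Failing quietly everywhere is deliberate: a broken hook should never block a message from going through.&lt;/p&gt;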



&lt;p&gt;&lt;strong&gt;The insight:&lt;/strong&gt; AI doesn't need everything. It needs the right things at the right time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 2: Memory Management
&lt;/h3&gt;

&lt;p&gt;Knowledge flows through three durability levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw observation  --&amp;gt;  Indexed finding  --&amp;gt;  Skill file entry
  (one session)      (searchable)          (permanent)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After every substantive task — debugging sessions, data analysis, root cause investigations — the AI auto-captures findings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ke add &lt;span class="s2"&gt;"sql-gotchas"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"LATERAL VIEW EXPLODE can't be followed by LEFT JOIN in same FROM"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="s2"&gt;"databricks,sql,parse-error"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"debugging session 2026-03-15"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Findings that prove useful across multiple sessions get promoted to permanent skill files. The system evolves organically through real work, not batch maintenance.&lt;/p&gt;
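&lt;p&gt;The promotion step needs no machinery. A sketch of the shape, with a stub standing in for the &lt;code&gt;ke&lt;/code&gt; query (the real interface is my private CLI, and the skill-file path is illustrative):&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: promote recurring findings into a permanent skill file.
# `recurring_findings` is a stub for a `ke` query (private CLI); the
# skill-file path is illustrative.
set -euo pipefail

SKILL_FILE=$(mktemp)    # in real use: ~/.claude/skills/sql-gotchas.md

recurring_findings() {  # stand-in for: ke list sql --min-hits 3
  printf '%s\n' \
    "LATERAL VIEW EXPLODE cannot be followed by LEFT JOIN in the same FROM" \
    "QUALIFY filters on window-function results, after they are computed"
}

recurring_findings | while IFS= read -r finding; do
  # append only findings the skill file does not already contain
  grep -qsF -- "$finding" "$SKILL_FILE" || printf -- '- %s\n' "$finding" >> "$SKILL_FILE"
done

cat "$SKILL_FILE"
```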

&lt;h3&gt;
  
  
  Pillar 3: Multi-Model Workflows
&lt;/h3&gt;

&lt;p&gt;Different models have different strengths. I built a pipeline that uses three:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;/tri-ai-workflow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;Claude plans  --&amp;gt;  Codex codes  --&amp;gt;  Gemini reviews&lt;/span&gt;
  &lt;span class="s"&gt;(architecture)    (implementation)   (adversarial review)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a completion gate that forces the right quality check based on task type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;/done triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s"&gt;SQL task     --&amp;gt; 9-point data sanity checklist&lt;/span&gt;
  &lt;span class="s"&gt;Experiment   --&amp;gt; 7-point analysis checklist&lt;/span&gt;
  &lt;span class="s"&gt;ETL pipeline --&amp;gt; 8-point review checklist&lt;/span&gt;
  &lt;span class="s"&gt;Code         --&amp;gt; Gemini review score &amp;gt;= &lt;/span&gt;&lt;span class="m"&gt;90&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No task gets declared "done" without passing its gate.&lt;/p&gt;
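&lt;p&gt;Behind the slash command, the gate is just a dispatch on task type. A minimal sketch; how the task type gets detected is elided here:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch of the /done completion gate: each task type maps to the
# quality check it must pass. Task-type detection is elided; here it
# simply arrives as an argument.
set -euo pipefail

gate_for() {
  case "$1" in
    sql)        echo "9-point data sanity checklist" ;;
    experiment) echo "7-point analysis checklist" ;;
    etl)        echo "8-point review checklist" ;;
    code)       echo "Gemini review, minimum score 90" ;;
    *)          echo "no gate defined for: $1"; return 1 ;;
  esac
}

# a task is only "done" once its gate passes
gate_for "${1:-sql}"
```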

&lt;h3&gt;
  
  
  The takeaway
&lt;/h3&gt;

&lt;p&gt;Context engineering is workflow design. It's your expertise, encoded so AI can participate at your level.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 4: Harness Engineering — "Build Stable Systems, Not Clever Prompts"
&lt;/h2&gt;

&lt;p&gt;This is where I am now. AI moves from "tool I use" to "system that runs." The hard problems are no longer prompting — they're &lt;strong&gt;reliability, observability, and failure handling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I call this "harness engineering" — the engineering discipline of building the scaffolding that makes AI agents work consistently in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I built in 3 days
&lt;/h3&gt;

&lt;p&gt;Six autonomous agents that run every morning:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;What it monitors&lt;/th&gt;
&lt;th&gt;Schedule&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data Freshness&lt;/td&gt;
&lt;td&gt;upstream table staleness (45+ tables)&lt;/td&gt;
&lt;td&gt;6:30 AM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business Metrics&lt;/td&gt;
&lt;td&gt;daily revenue/conversion vs 7d/28d baselines&lt;/td&gt;
&lt;td&gt;7:00 AM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Growth Tracker&lt;/td&gt;
&lt;td&gt;daily migration metrics&lt;/td&gt;
&lt;td&gt;7:15 AM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experiment Health&lt;/td&gt;
&lt;td&gt;enrollment drift, arm balance, platform coverage&lt;/td&gt;
&lt;td&gt;7:30 AM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portfolio Monitor&lt;/td&gt;
&lt;td&gt;weekly revenue/impression share WoW + incident detection&lt;/td&gt;
&lt;td&gt;6:00 AM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System Monitor&lt;/td&gt;
&lt;td&gt;ranking input anomalies + impression tracking across 6 surfaces&lt;/td&gt;
&lt;td&gt;9:00 AM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each agent: reads from a data warehouse, reasons about what it finds, and posts a summary to Slack. If something's anomalous, it flags it with context on why.&lt;/p&gt;
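&lt;p&gt;The schedule column maps one-to-one onto ordinary crontab entries (the wrapper path below is illustrative):&lt;/p&gt;

```cron
# one line per agent, staggered so data freshness lands before the
# agents that depend on fresh tables (wrapper path is illustrative)
0  6 * * * $HOME/agents/run.sh portfolio_monitor
30 6 * * * $HOME/agents/run.sh data_freshness
0  7 * * * $HOME/agents/run.sh business_metrics
15 7 * * * $HOME/agents/run.sh growth_tracker
30 7 * * * $HOME/agents/run.sh experiment_health
0  9 * * * $HOME/agents/run.sh system_monitor
```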

&lt;p&gt;This sounds fancy, but the architecture is boring on purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  The architecture is deliberately boring
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;prompt file + cron + &lt;span class="sb"&gt;`&lt;/span&gt;claude &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Slack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No LangGraph. No multi-agent orchestration framework. No agent-to-agent communication. Each agent is independent.&lt;/p&gt;
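&lt;p&gt;Concretely, one agent end to end might look like the sketch below. &lt;code&gt;claude -p&lt;/code&gt; is the real non-interactive flag; the env file, default prompt path, and the webhook-style Slack payload are assumptions to adapt (my agents post via the Slack API):&lt;/p&gt;

```shell
#!/usr/bin/env bash
# One agent = prompt file + cron + claude -p, posted to Slack.
# The env file, prompt path, and webhook payload are assumptions to
# adapt; `claude -p` (non-interactive print mode) is the real flag.
set -euo pipefail

main() {
  local prompt_file summary payload
  prompt_file="${1:-$HOME/agents/prompts/data_freshness.md}"

  # cron strips your environment (lesson 1 below), so the wrapper loads
  # credentials itself instead of assuming a login shell
  if [ -f "$HOME/.agent_env" ]; then . "$HOME/.agent_env"; fi

  # bail quietly on machines missing the pieces; this is a sketch
  command -v claude >/dev/null || return 0
  [ -f "$prompt_file" ] || return 0
  [ -n "${SLACK_WEBHOOK_URL:-}" ] || return 0

  summary=$(claude -p "$(cat "$prompt_file")")

  # jq handles JSON escaping of whatever the model produced
  payload=$(jq -nc --arg text "$summary" '{text: $text}')
  curl -sf -X POST -H 'Content-type: application/json' \
    -d "$payload" "$SLACK_WEBHOOK_URL"
}

main "$@"
```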

&lt;p&gt;I evaluated two frameworks — one with 32 specialized agent roles, another with full LangGraph orchestration — and rejected both. They solve problems I don't have yet. The simplest architecture that works is the right one (Occam's razor, all the way down).&lt;/p&gt;

&lt;h3&gt;
  
  
  Five lessons from building this
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Environment matters more than prompts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All agents failed on day 1. Not because the prompts were wrong — because the cron scheduler's config was missing environment variables for cloud authentication. The prompt was perfect; the harness was broken.&lt;/p&gt;

&lt;p&gt;This is the defining insight of harness engineering: &lt;strong&gt;most failures aren't prompt failures. They're infrastructure failures.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Domain knowledge prevents false positives.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My system monitor checks for anomalies in ranking inputs. But some values are intentionally static — they're hardcoded overrides, not anomalies. Without encoding these known values in the prompt, every run produced false alerts.&lt;/p&gt;

&lt;p&gt;The prompt needs to know what "normal" looks like. That's domain knowledge, not prompt engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Output format = platform constraints.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Markdown tables don't render in Slack API messages. I had to switch to monospace code blocks. The AI generates beautiful markdown — but the harness decides what actually works.&lt;/p&gt;
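&lt;p&gt;The workaround, roughly: emit a fixed-width table and wrap it in a Slack code block. The surfaces and numbers below are made up; &lt;code&gt;\140&lt;/code&gt; is the octal escape for a backtick, used so the Slack fence survives shell quoting:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Slack ignores markdown tables, but renders fixed-width text inside a
# code block. \140 is the octal escape for a backtick.
set -euo pipefail

# build a fixed-width table with printf instead of markdown pipes
# (surfaces and numbers here are invented for the example)
TABLE=$(printf '%-12s %8s\n' "surface" "imps" "home_feed" "1.2M" "search" "840K")

wrap_for_slack() {
  printf '\140\140\140\n%s\n\140\140\140\n' "$1"
}

wrap_for_slack "$TABLE"   # this string becomes the Slack message text
```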

&lt;p&gt;&lt;strong&gt;4. Ordering creates reliability without coupling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The data freshness agent runs at 6:30, ahead of the agents that consume warehouse data. If data is stale, downstream agents know to expect gaps. This is dependency ordering without inter-agent communication — simple, robust, no shared state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Start simple, iterate fast.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Version 1 of the system monitor showed only anomalies. Version 2 added a full feature summary table. Version 3 added impression tracking across all 6 surfaces with a per-surface breakdown. Each version shipped the same day, informed by the previous version's gaps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open questions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-run memory:&lt;/strong&gt; agents don't remember yesterday's findings. How do you give them session persistence without overcomplicating the architecture?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure alerting:&lt;/strong&gt; if an agent errors, it just... doesn't post. You notice the absence. There's no alerting on the alerter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inter-agent signals:&lt;/strong&gt; when one agent detects an anomaly, can another agent use that signal as context? Without building a message bus?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The takeaway
&lt;/h3&gt;

&lt;p&gt;Harness engineering is about making AI &lt;strong&gt;reliable&lt;/strong&gt;, not &lt;strong&gt;clever&lt;/strong&gt;. Environment setup, scheduling, false positive suppression, output formatting — that's where the real work is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Looking Back
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Harness Engineering  -- stable autonomous systems
Context Engineering  -- right info, right time, across sessions
Prompt Engineering   -- philosophical clarity before asking
Trust                -- believing you can solve problems together
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each phase built on the previous. I couldn't have invested in context engineering without trusting AI enough to build a workflow around it. And harness engineering wouldn't work without the context layer making outputs reliable.&lt;/p&gt;

&lt;p&gt;This was my path — yours might look different. But I think the layers stack naturally. You don't graduate from one to the next; you keep using all of them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm currently exploring the open questions in harness engineering — if you're building autonomous agents and have solved cross-run memory or failure recovery, I'd love to hear your approach.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contextengineering</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
