<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jula Markova</title>
    <description>The latest articles on Forem by Jula Markova (@jula-markova).</description>
    <link>https://forem.com/jula-markova</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3937734%2F17aedaad-5bbb-49c9-b5eb-73450ec3a042.webp</url>
      <title>Forem: Jula Markova</title>
      <link>https://forem.com/jula-markova</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jula-markova"/>
    <language>en</language>
    <item>
      <title>How to Architect Always-On AI Agents with Hermes - Written by an AI Pipeline, Verified by Three Models. Is It Slop?</title>
      <dc:creator>Jula Markova</dc:creator>
      <pubDate>Thu, 21 May 2026 13:34:52 +0000</pubDate>
      <link>https://forem.com/jula-markova/written-by-an-ai-pipeline-verified-by-three-models-is-it-slop-1i38</link>
      <guid>https://forem.com/jula-markova/written-by-an-ai-pipeline-verified-by-three-models-is-it-slop-1i38</guid>
      <description>&lt;h2&gt;
  
  
  How This Article Was Built (And Why I'm Showing You the Kitchen)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer up front:&lt;/strong&gt; I'm not entering the Hermes Agent challenge. I noticed the challenge and realized I could use my AI pipeline to write an article about Hermes Agent architecture. So I did. Then I thought: why not share both the result and the process that created it? What I actually want is your honest criticism.&lt;/p&gt;

&lt;h3&gt;
  
  
  Who Is The Author?
&lt;/h3&gt;

&lt;p&gt;For the past several months I've been building &lt;a href="https://bestaiweb.ai" rel="noopener noreferrer"&gt;Bestaiweb&lt;/a&gt;, navigating the shift from traditional development to AI. The site runs on Hugo, and the content is generated through what I call an AI content pipeline. The pipeline itself is built in TypeScript, orchestrated through Claude Code, and runs on Anthropic's Claude models. Still in progress.&lt;/p&gt;

&lt;p&gt;That phrase — "AI content pipeline" — probably triggered your slop detector. Fair. Let me explain why I think this case is different, and then let you judge.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pipeline
&lt;/h3&gt;

&lt;p&gt;BestAIweb currently has 450+ technical articles across 45 topic clusters. Every article goes through a multi-phase pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Market scanning&lt;/strong&gt; — an LLM agent surveys the current tool and framework landscape for each topic, identifying what's leading, what's declining, and what's emerging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query fan-out&lt;/strong&gt; — the pipeline generates the questions a developer would actually search for, not the questions that sound good as headlines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research&lt;/strong&gt; — a dedicated research agent gathers facts, version numbers, benchmark data, and source URLs. Everything gets a structured fact sheet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writing&lt;/strong&gt; — here's where personas come in. The pipeline has four author personas, each with a distinct voice and content type specialization:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MAX&lt;/strong&gt; — the engineer. Writes step-by-step guides. Pragmatic, implementation-focused, opinionated about tool choices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MONA&lt;/strong&gt; — the explainer. Breaks down concepts. Thinks in diagrams and mental models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DAN&lt;/strong&gt; — the reporter. Covers news, market shifts, and what just shipped&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ALAN&lt;/strong&gt; — the critic. Writes opinion pieces and ethical assessments&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claim verification&lt;/strong&gt; — a separate agent cross-checks every factual claim against the research fact sheet. Unsupported claims get flagged&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic validation&lt;/strong&gt; — a Python script runs 30+ structural and quality checks: word count, link integrity, frontmatter completeness, source coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hugo integration&lt;/strong&gt; — the article lands in the static site with schema.org markup, generated images, and internal links&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Hermes Agent guide below was written by &lt;a href="https://www.bestaiweb.ai/authors/max/" rel="noopener noreferrer"&gt;MAX using his guide template&lt;/a&gt;. His tone of voice is direct, specification-oriented, and allergic to hand-waving. The template enforces a fixed structure: prerequisites, numbered steps, pitfalls table, FAQ, and a deployable artifact at the end.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Multi-Model Judging Layer
&lt;/h3&gt;

&lt;p&gt;Pipeline generation was step one. Then came a manual judging round. I pasted the draft into &lt;strong&gt;ChatGPT&lt;/strong&gt;, &lt;strong&gt;Gemini&lt;/strong&gt;, and &lt;strong&gt;DeepSeek&lt;/strong&gt; and asked each to evaluate it as a technical reviewer, checking factual accuracy, logical gaps, tone inconsistencies, and whether the advice would actually work if someone followed it. I then reviewed their feedback together with Claude Code and incorporated the changes that held up under scrutiny.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Slop Question
&lt;/h3&gt;

&lt;p&gt;Here's the question I keep circling back to: &lt;strong&gt;Is everything AI-generated inherently slop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The reflexive answer in 2026 is "yes, obviously." And for most AI-generated content, that's correct. GPT-powered blog farms, SEO filler, those LinkedIn posts prompted with "write a thought leadership post about AI" — that is slop. Generated without specification, without sourcing, without verification, and without a quality gate.&lt;/p&gt;

&lt;p&gt;But what about content where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every factual claim traces to a documented source (GitHub issues, official docs, arxiv papers)&lt;/li&gt;
&lt;li&gt;A claim verification agent flags unsupported statements before publication&lt;/li&gt;
&lt;li&gt;A deterministic validator enforces structural quality independent of the LLM&lt;/li&gt;
&lt;li&gt;The voice and structure come from a multi-page specification, not a one-line prompt&lt;/li&gt;
&lt;li&gt;Multiple independent models review the output for different failure modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is that still slop? Or is it closer to what a well-managed editorial team produces — except the heavy lifting is done by LLMs under human direction?&lt;/p&gt;

&lt;p&gt;I genuinely don't know the answer. That's why I'm sharing this.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I'd Like From You
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Criticism.&lt;/strong&gt; Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Does the article below read like AI slop?&lt;/strong&gt; If yes, what gives it away — the sentence rhythm, the structure, the depth, or something else?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is the technical content accurate?&lt;/strong&gt; If you've deployed Hermes Agent or any persistent agent framework, does the three-layer model match your experience? Did I miss a critical failure mode?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does the pipeline approach change anything?&lt;/strong&gt; Is multi-phase generation with claim verification and multi-model judging enough to produce content worth reading? Or is it just expensive slop with better sourcing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm not looking for "great article!" responses. I'm looking for the engineer who says "this is wrong because..." or "you missed the part where..." That feedback makes the next pipeline iteration better.&lt;/p&gt;

&lt;h3&gt;
  
  
  More Guides From the Same Pipeline
&lt;/h3&gt;

&lt;p&gt;If you want to judge more output from the same pipeline and the same MAX persona, the full library has 95+ implementation guides from him, covering &lt;a href="https://bestaiweb.ai" rel="noopener noreferrer"&gt;agents, RAG, training, inference, evaluation, and image generation&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What follows is the article as the pipeline produced it, after multi-model review. Judge for yourself.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Architect Always-On AI Agents with Hermes: Decompose, Specify, Deploy
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent agents need three specs your chatbot never did: memory policy, tool boundaries, and session recovery&lt;/li&gt;
&lt;li&gt;Hermes Agent is model-agnostic — the model choice matters less than how you specify context, tools, and failure handling&lt;/li&gt;
&lt;li&gt;Always-on means always-failing-somewhere — build validation into the deployment spec, not as an afterthought&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;You spun up Hermes Agent on a Friday evening. Gave it access to Slack, a web scraper, and your project database. Told it to "keep the team updated on competitor releases." Monday morning: 47 Slack messages, three of them citing products that don't exist, and a web scraper loop that burned through your OpenRouter credits overnight. The agent ran exactly as specified. The specification was the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before You Start
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You'll need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Linux or macOS server (even a $5 VPS works — Hermes Agent runs on minimal hardware)&lt;/li&gt;
&lt;li&gt;An LLM provider account (OpenRouter, Anthropic, OpenAI, or a local runtime like Ollama)&lt;/li&gt;
&lt;li&gt;Understanding of function calling — how models invoke external tools&lt;/li&gt;
&lt;li&gt;A clear picture of what your agent should do when you're not watching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This guide teaches you:&lt;/strong&gt; How to decompose a persistent agent deployment into specifiable components so Hermes Agent does what you intended — not what you literally typed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this guide does NOT cover:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production security hardening (firewall rules, secrets management, network isolation)&lt;/li&gt;
&lt;li&gt;Enterprise compliance (SOC 2, GDPR data residency, audit certification)&lt;/li&gt;
&lt;li&gt;Full evaluation frameworks (systematic benchmarking, regression test suites)&lt;/li&gt;
&lt;li&gt;Model fine-tuning or training (Hermes models are pre-trained; this guide covers the agent framework)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Agent That Worked Until It Didn't
&lt;/h2&gt;

&lt;p&gt;Here's the pattern. Developer discovers Hermes Agent. Reads that it has persistent memory, self-improving skills, 20+ platform integrations. Installs it. Connects everything. Types a system prompt. Walks away.&lt;/p&gt;

&lt;p&gt;One of two things happens next. Either the agent does nothing useful because the specification was too vague. Or it does too much because the boundaries were never set.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://github.com/NousResearch/hermes-agent/issues/5563" rel="noopener noreferrer"&gt;Hermes Agent GitHub Issues&lt;/a&gt;, long sessions exceeding 700K tokens trigger environment hallucination — the agent confuses tool descriptions with actual environment state. It starts acting on what it thinks is true rather than what is true. This isn't a bug in the traditional sense. It's a specification gap. You never told the agent when to stop, reset, or ask for help.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Map the Three Layers
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is not a single system. It's three systems wearing a trench coat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your deployment has these parts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The runtime layer&lt;/strong&gt; — where the agent executes (Docker, SSH, Modal, local terminal). This determines resource limits, restart behavior, and isolation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The intelligence layer&lt;/strong&gt; — the LLM provider and model. This determines reasoning quality, context window size, and cost per token&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The integration layer&lt;/strong&gt; — platform connections (Slack, Telegram, web tools) and the tools the agent can invoke. This determines what the agent can touch in the real world&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Architect's Rule:&lt;/strong&gt; If you can't draw a clear line between what the agent thinks, where it runs, and what it touches — your spec is incomplete.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to &lt;a href="https://hermes-agent.nousresearch.com/docs/integrations/providers" rel="noopener noreferrer"&gt;Hermes Agent Docs&lt;/a&gt;, the framework supports 30+ providers and 7 terminal backends. That flexibility is the point, and the trap. Every combination has different failure modes. A Modal serverless backend hibernates when idle. An Ollama local model defaults to a 4K-token context window. An SSH backend loses the agent if the connection drops. You need to specify which combination you're using and what happens at each boundary.&lt;/p&gt;

&lt;p&gt;One thing the "always-on" framing obscures: &lt;strong&gt;what happens when the LLM provider goes down?&lt;/strong&gt; OpenRouter has outages. API rate limits hit. Local models crash. An always-on agent needs a fallback plan — a secondary provider, a circuit breaker that pauses tool execution after N consecutive failures, or at minimum a notification that the agent is degraded. Specify this in the runtime layer, not as an afterthought.&lt;/p&gt;
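&lt;p&gt;The circuit-breaker idea above can be sketched in a few lines of Python. This is a minimal illustration, not a Hermes Agent API: the &lt;code&gt;CircuitBreaker&lt;/code&gt; class, the &lt;code&gt;call_with_fallback&lt;/code&gt; helper, and the default thresholds are all hypothetical names and values chosen for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


class CircuitBreaker:
    """Pause provider calls after N consecutive failures (hypothetical helper)."""

    def __init__(self, max_failures=3, cooldown_seconds=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if a provider call may be attempted right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker and allow a fresh attempt.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_fallback(breaker, primary, secondary, notify):
    """Try the primary provider; fall back and notify when it is degraded."""
    if breaker.allow():
        try:
            result = primary()
            breaker.record_success()
            return result
        except Exception as exc:  # broad on purpose: any provider error counts
            breaker.record_failure()
            notify(f"primary provider failed: {exc}")
    else:
        notify("primary provider paused by circuit breaker")
    return secondary()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this sketch &lt;code&gt;primary&lt;/code&gt; and &lt;code&gt;secondary&lt;/code&gt; are plain callables standing in for provider calls, and &lt;code&gt;notify&lt;/code&gt; is whatever channel tells a human the agent is degraded. A production breaker would usually add a half-open state that lets a single probe request through before fully resuming.&lt;/p&gt;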

&lt;h2&gt;
  
  
  Step 2: Lock Down the Context Contract
&lt;/h2&gt;

&lt;p&gt;The intelligence layer needs a specification before it sees a single user message. This is where most deployments fail — not in the tools, not in the platform, but in the context that frames every decision the agent makes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context checklist:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt with explicit role boundaries (what the agent does and does NOT do)&lt;/li&gt;
&lt;li&gt;Memory policy: what gets persisted, what gets discarded, and when&lt;/li&gt;
&lt;li&gt;Tool authorization with risk classification (see table below)&lt;/li&gt;
&lt;li&gt;Access control: which platforms and channels can trigger the agent (not every DM deserves a response)&lt;/li&gt;
&lt;li&gt;Session limits: when to compress or reset (per the &lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/configuration" rel="noopener noreferrer"&gt;Hermes Agent Docs&lt;/a&gt;, the default is auto-compression at 50% of the model's context window, plus a hard ceiling of 400 messages)&lt;/li&gt;
&lt;li&gt;Output format contracts: how the agent reports results on each platform&lt;/li&gt;
&lt;li&gt;Rate limits: maximum messages per minute per platform (an agent with no rate limit is a spam bot waiting to happen)&lt;/li&gt;
&lt;/ul&gt;
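&lt;p&gt;The rate-limit item in the checklist above is easiest to enforce outside the model, in whatever code sits between the agent and the platform. A minimal Python sketch of a sliding-window limiter follows. The &lt;code&gt;SlidingWindowLimiter&lt;/code&gt; class is a hypothetical helper, not part of Hermes Agent, and the budget of 3 messages per hour is simply the figure this guide uses for its example agent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_seconds` (hypothetical helper)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sent = deque()  # timestamps of events inside the current window

    def try_acquire(self):
        """Return True and record the event if the budget allows it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True


# Example budget: 3 Slack messages per hour.
slack_limiter = SlidingWindowLimiter(limit=3, window_seconds=3600)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The posting code would call &lt;code&gt;slack_limiter.try_acquire()&lt;/code&gt; before each message and drop or queue the message when it returns &lt;code&gt;False&lt;/code&gt;. Which of the two to do is itself a specification decision.&lt;/p&gt;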

&lt;h3&gt;
  
  
  Tool Risk Classification
&lt;/h3&gt;

&lt;p&gt;An always-on agent with database access and Slack permissions is making autonomous decisions about your data and your team's attention. Classify every tool before you enable it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk Class&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Tools&lt;/th&gt;
&lt;th&gt;Authorization&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;read-only&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Observes, never modifies&lt;/td&gt;
&lt;td&gt;web_search, database_query (SELECT), file_read&lt;/td&gt;
&lt;td&gt;Auto-approved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;reversible-write&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Creates or modifies, can be undone&lt;/td&gt;
&lt;td&gt;file_write, note_create, draft_message&lt;/td&gt;
&lt;td&gt;Auto-approved with audit log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;irreversible-write&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deletes or overwrites permanently&lt;/td&gt;
&lt;td&gt;file_delete, database_delete, channel_archive&lt;/td&gt;
&lt;td&gt;Requires human confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;external-send&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sends to humans or external systems&lt;/td&gt;
&lt;td&gt;slack_post, email_send, webhook_trigger&lt;/td&gt;
&lt;td&gt;Rate-limited + audit log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;billing-sensitive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Incurs direct cost&lt;/td&gt;
&lt;td&gt;api_call (paid), image_generate, compute_spawn&lt;/td&gt;
&lt;td&gt;Budget ceiling + alert&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Spec Test:&lt;/strong&gt; If your system prompt doesn't mention what happens at 3 AM when the agent encounters an error and no human is online — you've specified a supervised agent and deployed it as unsupervised. If it doesn't classify tool risk levels, the agent treats &lt;code&gt;database_delete&lt;/code&gt; and &lt;code&gt;web_search&lt;/code&gt; as equally safe. If it doesn't set a compression trigger, the default (50% context window) may or may not match your workload.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's what a minimal context contract looks like in practice. This is the MEMORY.md the agent reads on every session start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MEMORY.md — Agent Operating Contract&lt;/span&gt;
&lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Monitor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;competitor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;releases&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;engineering&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;team"&lt;/span&gt;
&lt;span class="na"&gt;boundaries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;post&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;channels&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;outside&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;#competitor-monitoring"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;summarize&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;forward&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;internal&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEVER&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;execute&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;irreversible-write&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;without&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;confirmation"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maximum&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Slack&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;per&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hour"&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;auto_approved&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;web_search&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;file_read&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;rate_limited&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;slack_post&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# max 3/hour&lt;/span&gt;
  &lt;span class="na"&gt;requires_confirmation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;file_delete&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;database_write&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;forbidden&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;email_send&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;channel_archive&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;memory_policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;persist&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmed&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;competitor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;releases,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;names,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dates"&lt;/span&gt;
  &lt;span class="na"&gt;discard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intermediate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;results,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;draft&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;summaries"&lt;/span&gt;
  &lt;span class="na"&gt;compress_after&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50%"&lt;/span&gt;  &lt;span class="c1"&gt;# of context window&lt;/span&gt;
&lt;span class="na"&gt;escalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;uncertain&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;about&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;action,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;post&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;#agent-review&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;instead"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;A critical distinction:&lt;/strong&gt; A memory or system-prompt policy is not a security boundary. Writing "NEVER execute irreversible-write tools" in MEMORY.md is a behavioral instruction to the model, not a technical lock. The model can ignore it — especially under long-context degradation or adversarial input. Destructive tools should be blocked or approval-gated at the runtime level (process permissions, API middleware, webhook filters), not merely discouraged in instructions. Treat the YAML above as the agent's intent. Build enforcement outside the model.&lt;/p&gt;
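&lt;p&gt;One way to build that enforcement is a small authorization gate that every tool call passes through before it executes. The Python sketch below mirrors the risk classes from the table above. The &lt;code&gt;TOOL_RISK&lt;/code&gt; registry and the &lt;code&gt;authorize&lt;/code&gt; function are hypothetical, not a Hermes Agent API, and the audit logging the table calls for is left out to keep the example short.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum


class Risk(Enum):
    READ_ONLY = "read-only"
    REVERSIBLE_WRITE = "reversible-write"
    IRREVERSIBLE_WRITE = "irreversible-write"
    EXTERNAL_SEND = "external-send"
    BILLING_SENSITIVE = "billing-sensitive"


# Hypothetical registry: tool names are taken from the table in this guide.
TOOL_RISK = {
    "web_search": Risk.READ_ONLY,
    "file_read": Risk.READ_ONLY,
    "file_write": Risk.REVERSIBLE_WRITE,
    "file_delete": Risk.IRREVERSIBLE_WRITE,
    "database_delete": Risk.IRREVERSIBLE_WRITE,
    "slack_post": Risk.EXTERNAL_SEND,
    "image_generate": Risk.BILLING_SENSITIVE,
}


def authorize(tool_name, human_confirmed=False, within_rate_limit=True,
              within_budget=True):
    """Return (allowed, reason) for a tool call, decided outside the model."""
    risk = TOOL_RISK.get(tool_name)
    if risk is None:
        return False, "unregistered tool"  # default-deny anything unknown
    if risk is Risk.IRREVERSIBLE_WRITE and not human_confirmed:
        return False, "requires human confirmation"
    if risk is Risk.EXTERNAL_SEND and not within_rate_limit:
        return False, "rate limit exceeded"
    if risk is Risk.BILLING_SENSITIVE and not within_budget:
        return False, "budget ceiling reached"
    return True, risk.value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The design choice that matters here is default-deny: a tool missing from the registry is rejected, so a hallucinated tool name fails closed instead of reaching the runtime.&lt;/p&gt;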

&lt;p&gt;According to &lt;a href="https://github.com/NousResearch/hermes-agent/issues/5563" rel="noopener noreferrer"&gt;Hermes Agent GitHub Issues&lt;/a&gt;, the persistent notes layer has a limit of roughly 2,200 characters. That's the manually curated knowledge — not the agent's entire memory. Hermes also maintains a full-text search index over past sessions and a per-person user model that evolves automatically. So the agent isn't blind between sessions. But the notes layer is where you store hard constraints and project-critical context, and 2,200 characters fills up fast across three projects. You still need a compression strategy for notes — what gets stored verbatim, what moves to session history, what gets dropped.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Wire the Components in Order
&lt;/h2&gt;

&lt;p&gt;Deployment order matters. Each layer depends on the one below it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build order:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Runtime first&lt;/strong&gt; — because everything else crashes without a stable execution environment. Choose your backend, set resource limits, configure restart-on-failure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligence layer next&lt;/strong&gt; — because tool and platform behavior depends on the model's capabilities. According to &lt;a href="https://hermes-agent.nousresearch.com/docs/integrations/providers" rel="noopener noreferrer"&gt;Hermes Agent Docs&lt;/a&gt;, vLLM requires explicit &lt;code&gt;--enable-auto-tool-choice&lt;/code&gt; and &lt;code&gt;--tool-call-parser&lt;/code&gt; flags. Without them, the model outputs tool calls as plain text instead of executing them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration layer last&lt;/strong&gt; — because platform connections should only activate after the agent can reason and recover from errors. Connect Slack after the agent handles tool failures gracefully, not before&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For each component, your specification must cover:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What it receives (inputs and triggers)&lt;/li&gt;
&lt;li&gt;What it returns (outputs and side effects)&lt;/li&gt;
&lt;li&gt;What it must NOT do (boundaries and prohibitions)&lt;/li&gt;
&lt;li&gt;How it handles failure (retry logic, fallback behavior, human escalation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The self-improving skills feature is powerful — Hermes Agent automatically creates workflow documents from successful task completions and refines them over time. But the skill creation itself needs a boundary spec. Without one, the agent writes skills for one-off tasks, cluttering the skill library with noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill boundary example&lt;/strong&gt; — add this to your system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;skills_policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;auto_create&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;competitor-monitoring"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weekly-summary"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data-formatting"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;never_create&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one-off-queries"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debugging-sessions"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ad-hoc-searches"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;review_before_use&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;used&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;14+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;days"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;max_skills&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;  &lt;span class="c1"&gt;# force deduplication when library exceeds this&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, the agent treats every successful task as a reusable pattern. Three months in, you have 200 skills — most of them variations of the same web search with slightly different parameters.&lt;/p&gt;

&lt;p&gt;One more thing about skills: &lt;strong&gt;they can regress.&lt;/strong&gt; A skill written for Hermes-3-8B may produce wrong tool calls after switching to a different model. A skill that relies on a specific API endpoint breaks when that endpoint changes. Skills older than 30 days should be re-validated or archived. The &lt;code&gt;review_before_use&lt;/code&gt; field above is your safety net — but only if you actually review them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Prove It's Actually Working
&lt;/h2&gt;

&lt;p&gt;Running the agent is not validation. Validation means you know what "correct" looks like and can detect when the agent drifts from it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation checklist:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory consistency&lt;/strong&gt; — after 24 hours, does the agent's memory reflect reality? Failure looks like: agent references a "completed" task that was never finished, or forgets a constraint you set yesterday&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool call accuracy&lt;/strong&gt; — are tool invocations well-formed and targeted? Failure looks like: invalid function names, malformed arguments, or calls to tools that aren't registered. This is a general problem with LLM-driven tool use, not Hermes-specific — any agent framework that delegates tool selection to a model will hit it. &lt;a href="https://github.com/NousResearch/hermes-agent/issues/8993" rel="noopener noreferrer"&gt;Hermes Agent GitHub Issues&lt;/a&gt; documents concrete examples like &lt;code&gt;todo:list&lt;/code&gt; calls that don't match any schema&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform output quality&lt;/strong&gt; — are messages to Slack/Telegram/Discord useful and accurate? Failure looks like: hallucinated product names, duplicate messages, or empty responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost trajectory&lt;/strong&gt; — is daily token usage stable or growing? Failure looks like: runaway context accumulation driving costs up 10x within a week&lt;/li&gt;
&lt;/ul&gt;
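&lt;p&gt;The tool call accuracy check is the easiest item on this list to automate. Below is a rough sketch that compares each call against a registry before it runs. The registry contents and the &lt;code&gt;check_tool_call&lt;/code&gt; helper are hypothetical and not part of Hermes Agent; a real deployment would validate against the actual tool schemas.&lt;/p&gt;

```python
# Hypothetical registry: tool name mapped to its required argument names.
REGISTERED_TOOLS = {
    "web_search": {"query"},
    "file_read": {"path"},
    "slack_post": {"channel", "text"},
}


def check_tool_call(name, arguments):
    """Return a list of problems; an empty list means the call looks well-formed."""
    if name not in REGISTERED_TOOLS:
        return [f"unregistered tool: {name}"]
    if not isinstance(arguments, dict):
        return [f"arguments for {name} are not an object"]
    required = REGISTERED_TOOLS[name]
    supplied = set(arguments)
    problems = []
    for missing in sorted(required - supplied):
        problems.append(f"missing argument: {missing}")
    for extra in sorted(supplied - required):
        problems.append(f"unexpected argument: {extra}")
    return problems


print(check_tool_call("todo:list", {}))
print(check_tool_call("slack_post", {"channel": "#competitor-monitoring"}))
```

&lt;p&gt;Counting non-empty results per day gives you a tool error rate you can track alongside cost.&lt;/p&gt;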

&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What You Did&lt;/th&gt;
&lt;th&gt;Why the Agent Failed&lt;/th&gt;
&lt;th&gt;The Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One-shot system prompt: "monitor competitors"&lt;/td&gt;
&lt;td&gt;No boundaries — agent decides scope, frequency, and format&lt;/td&gt;
&lt;td&gt;Decompose into: what to monitor, how often, where to report, what format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connected all tools on day one&lt;/td&gt;
&lt;td&gt;Agent uses tools in unexpected combinations&lt;/td&gt;
&lt;td&gt;Enable tools incrementally, validate each before adding the next&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chose a 4K-context local model&lt;/td&gt;
&lt;td&gt;Tool schemas + system prompt + memory exceed context&lt;/td&gt;
&lt;td&gt;Use minimum 16K–32K context for tool-calling workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No session hygiene policy&lt;/td&gt;
&lt;td&gt;700K+ token sessions trigger hallucination loops&lt;/td&gt;
&lt;td&gt;Use Hermes's built-in compression (default trigger: 50% of the context window), set a hard message ceiling, and monitor context growth.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skipped memory policy&lt;/td&gt;
&lt;td&gt;Agent stores everything, including noise&lt;/td&gt;
&lt;td&gt;Specify what gets persisted: decisions, outcomes, blockers. Not intermediate reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Pro Tip
&lt;/h2&gt;

&lt;p&gt;The specification you write for Hermes Agent is not a prompt. It's an operating manual for an unsupervised system. The same decomposition — runtime, intelligence, integration — works for any persistent agent, regardless of framework. The tools change. The layers don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; How does Hermes Agent's persistent memory differ from conversation history?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Conversation history is a raw log that grows until it hits the context window limit. Hermes uses three structured layers: persistent notes you curate manually, a full-text search index over past sessions, and a user model that evolves per-person. The practical difference — session history gets summarized and compressed, while persistent notes survive indefinitely. Watch for the 2,200-character limit on notes: it forces disciplined compression.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can I run Hermes Agent with local models instead of cloud API providers?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes. Ollama, vLLM, SGLang, llama.cpp, and LM Studio all work as backends. The catch is context window configuration: Ollama defaults to 4K tokens, which isn't enough once you add tool schemas and system prompts, so set the context window explicitly to at least 16K on the server side. For vLLM, you also need the &lt;code&gt;--enable-auto-tool-choice&lt;/code&gt; flag (paired with a matching &lt;code&gt;--tool-call-parser&lt;/code&gt;), or tool calls render as plain text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; What context window size does Hermes Agent need for reliable tool calling?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; According to &lt;a href="https://hermes-agent.nousresearch.com/docs/integrations/providers" rel="noopener noreferrer"&gt;Hermes Agent Docs&lt;/a&gt;, minimum 16K–32K tokens for agent workloads with tools. The system prompt, tool schemas, memory context, and conversation history all compete for the same window. With 5+ tools registered, 32K is the safer starting point. Below that, the model starts dropping tool definitions mid-session.&lt;/p&gt;
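&lt;p&gt;The competition for the window is plain arithmetic. Here is a small sketch of the budget, where every token count is an illustrative assumption rather than a measurement from a real Hermes deployment:&lt;/p&gt;

```python
def history_budget(context_window, system_prompt, tool_schemas, memory, reply_reserve):
    """Tokens left for conversation history after the fixed costs are paid."""
    fixed = system_prompt + sum(tool_schemas) + memory + reply_reserve
    return context_window - fixed


# Illustrative token counts only; measure your own prompt and schemas.
costs = dict(
    system_prompt=800,
    tool_schemas=[350] * 6,  # six registered tools at roughly 350 tokens each
    memory=600,
    reply_reserve=1000,      # room kept free for the model's answer
)

for window in (4096, 16384, 32768):
    print(window, history_budget(window, **costs))
```

&lt;p&gt;With these numbers a 4K window is already over budget before the first user message arrives, while 16K and 32K leave room for history.&lt;/p&gt;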

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; How do I prevent hallucination loops in long-running Hermes Agent sessions?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Hermes has built-in session compression — by default it triggers at 50% of the model's context window, with a hard ceiling of 400 messages. According to &lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/configuration" rel="noopener noreferrer"&gt;Hermes Agent Docs&lt;/a&gt;, these thresholds are configurable. The documented failure zone is 700K+ tokens, where environment hallucination has been observed. Keep compression active, tune the trigger percentage for your workload, and monitor for repeated identical tool calls — that's the earliest signal of a loop forming. Store critical state in persistent notes before any forced reset.&lt;/p&gt;
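&lt;p&gt;Monitoring for repeated identical tool calls can live outside the agent, for example in a wrapper that sees every call. The sketch below is one way to do it with a sliding time window. The &lt;code&gt;LoopDetector&lt;/code&gt; class and its thresholds are assumptions for illustration, not a Hermes Agent feature.&lt;/p&gt;

```python
import json
from collections import deque


class LoopDetector:
    """Flags a tool call repeated with identical arguments inside a time window."""

    def __init__(self, max_repeats=3, window_seconds=300):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self._recent = deque()  # (timestamp, fingerprint) pairs, oldest first

    def record(self, timestamp, tool_name, arguments):
        """Record one call; return True when it looks like a loop."""
        # Sorting keys makes argument order irrelevant to the comparison.
        fingerprint = (tool_name, json.dumps(arguments, sort_keys=True))
        self._recent.append((timestamp, fingerprint))
        # Drop calls that have fallen out of the window.
        while self._recent and timestamp - self._recent[0][0] > self.window_seconds:
            self._recent.popleft()
        repeats = sum(1 for _, seen in self._recent if seen == fingerprint)
        return repeats >= self.max_repeats


detector = LoopDetector(max_repeats=3, window_seconds=300)
for second in (0, 60, 120):
    looping = detector.record(second, "web_search", {"query": "competitor pricing"})
print(looping)
```

&lt;p&gt;Timestamps are passed in as plain seconds so the detector can be replayed over an existing audit log.&lt;/p&gt;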
&lt;h2&gt;
  
  
  Your Spec Artifact
&lt;/h2&gt;

&lt;p&gt;By the end of this guide, you should have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A three-layer deployment map&lt;/strong&gt; — runtime, intelligence, and integration with explicit boundaries between each&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A context contract with tool risk classification&lt;/strong&gt; — system prompt, memory policy, tool authorization by risk class, access control, rate limits, and output format per platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A security baseline&lt;/strong&gt; — tool isolation, rate limiting, audit logging, and escalation paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A validation checklist&lt;/strong&gt; — memory consistency, tool call accuracy, output quality, and cost trajectory checks you run daily&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Your Deployment Spec Prompt
&lt;/h2&gt;

&lt;p&gt;This prompt generates a first draft of your agent specification, not a production-ready deployment. Paste it into Claude Code, Cursor, or your preferred AI coding tool. Fill in every bracketed placeholder with your specific values from Steps 1-4.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;I'm specifying a Hermes Agent deployment. Generate a first-draft specification&lt;/span&gt;
&lt;span class="s"&gt;based on these inputs. I will review and harden it before production use.&lt;/span&gt;

&lt;span class="na"&gt;RUNTIME LAYER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Docker / SSH / Modal / local — pick one&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Resource limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;RAM&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;CPU cores&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Restart policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;on-failure / always / manual&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;OS&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;VPS provider&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;specs&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;INTELLIGENCE LAYER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;LLM provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;OpenRouter / Anthropic / Ollama / vLLM — pick one&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;model name and size&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Context window&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;minimum 16K — specify exact value&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Provider-specific flags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;e.g.&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;--enable-auto-tool-choice for vLLM&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;INTEGRATION LAYER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Platforms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Slack / Telegram / Discord — list all&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Allowed trigger channels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;e.g.&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;only&lt;/span&gt; &lt;span class="c1"&gt;#competitor-monitoring, not DMs]&lt;/span&gt;
&lt;span class="nv"&gt;- Tools by risk class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="nv"&gt;- read-only (auto-approved)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;web_search&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;file_read&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;database SELECT&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="nv"&gt;- reversible-write (auto + audit)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;file_write&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;note_create&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="nv"&gt;- irreversible-write (human approval)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;file_delete&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;database DELETE&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="nv"&gt;- external-send (rate-limited)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;slack_post — max messages/hour&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="nv"&gt;- billing-sensitive (budget ceiling)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;paid API calls — max $/day&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Tools forbidden&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;list tools the agent must never invoke&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="nv"&gt;CONTEXT CONTRACT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;- Agent role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;one sentence — what this agent does&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Explicit boundaries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;what the agent must NOT do&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;stated as prohibitions&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Memory policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;what gets persisted&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;what gets discarded&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;compression rules&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Compression trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;percentage of context window — default 50%&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Hard message ceiling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;number — default 400&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Output format per platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;e.g.&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Slack = bullet points&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;email = report&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Skill boundary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;which task categories auto-generate skills&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;which don't&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="nv"&gt;SECURITY &amp;amp; PERMISSIONS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;- Access control&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;which platforms/channels can trigger the agent&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Rate limits per platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;messages per minute/hour&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Destructive action policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;never auto-approve / require confirmation / forbidden&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Audit log location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;where tool calls + results are logged&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="nv"&gt;OBSERVABILITY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;- Log format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;timestamp&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;tool name&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;input summary&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;output status&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;cost estimate&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Loop detection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;alert on N repeated identical tool calls within M minutes&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Cost alerts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;alert when daily spend exceeds $X&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Error spike alerts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;alert when tool error rate exceeds X% in Y minutes&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="nv"&gt;DRY RUN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;- Generate a dry-run mode where all external-send and write tools are simulated&lt;/span&gt;
&lt;span class="nv"&gt;- Include 5 test scenarios that exercise each risk class&lt;/span&gt;

&lt;span class="nv"&gt;VALIDATION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;- How to verify memory consistency after&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;24h / 48h / 7d&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Expected daily token usage range&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;min–max tokens&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;- Escalation trigger&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;what condition sends an alert to a human&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="nv"&gt;RULES FOR GENERATION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;- Do not invent Hermes-specific configuration fields. If Hermes does not&lt;/span&gt;
  &lt;span class="nv"&gt;support a field natively&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;label it as "external wrapper / policy layer&lt;/span&gt;
  &lt;span class="nv"&gt;required".&lt;/span&gt;
&lt;span class="nv"&gt;- For every generated config field&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;mark one of&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;native&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="nv"&gt;— Hermes Agent built-in setting&lt;/span&gt;
  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="nv"&gt;— system prompt / MEMORY.md behavioral instruction&lt;/span&gt;
  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;external&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="nv"&gt;— requires runtime middleware&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;API gateway&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;or wrapper script&lt;/span&gt;
  &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;manual&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="nv"&gt;— operational checklist item&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;not automatable&lt;/span&gt;
&lt;span class="nv"&gt;- Before generating final output&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;separate policy from enforcement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="nv"&gt;- What the model is instructed to do (behavioral&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;can be ignored)&lt;/span&gt;
  &lt;span class="nv"&gt;- What the runtime technically prevents (enforced&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;cannot be bypassed)&lt;/span&gt;
  &lt;span class="nv"&gt;- What requires human approval (gated)&lt;/span&gt;
  &lt;span class="nv"&gt;- What is only monitored after the fact (observable but not blocked)&lt;/span&gt;

&lt;span class="nv"&gt;Generate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nv"&gt;1. The MEMORY.md agent operating contract (see article for format example)&lt;/span&gt;
&lt;span class="nv"&gt;2. The tool authorization config with risk classifications (each field tagged&lt;/span&gt;
   &lt;span class="nv"&gt;as native / prompt / external / manual)&lt;/span&gt;
&lt;span class="nv"&gt;3. A daily validation checklist&lt;/span&gt;
&lt;span class="nv"&gt;4. Cost and error monitoring alert thresholds&lt;/span&gt;
&lt;span class="nv"&gt;5. A dry-run test plan with 5 scenarios&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
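&lt;p&gt;The prompt asks the model to separate policy from enforcement. To make that distinction concrete, here is a rough sketch of what an enforced layer for the five risk classes could look like in a wrapper script. The tool names come from the prompt's examples. The &lt;code&gt;authorize&lt;/code&gt; function and the policy strings are hypothetical and would be tagged as external in the prompt's own terms, since nothing here is a native Hermes setting.&lt;/p&gt;

```python
from enum import Enum


class RiskClass(Enum):
    READ_ONLY = "read-only"
    REVERSIBLE_WRITE = "reversible-write"
    IRREVERSIBLE_WRITE = "irreversible-write"
    EXTERNAL_SEND = "external-send"
    BILLING_SENSITIVE = "billing-sensitive"


# Hypothetical classification using the example tools from the prompt.
TOOL_RISK = {
    "web_search": RiskClass.READ_ONLY,
    "file_read": RiskClass.READ_ONLY,
    "file_write": RiskClass.REVERSIBLE_WRITE,
    "note_create": RiskClass.REVERSIBLE_WRITE,
    "file_delete": RiskClass.IRREVERSIBLE_WRITE,
    "slack_post": RiskClass.EXTERNAL_SEND,
}

# What the wrapper does for each class before the call is allowed to run.
POLICY = {
    RiskClass.READ_ONLY: "auto-approve",
    RiskClass.REVERSIBLE_WRITE: "auto-approve and audit",
    RiskClass.IRREVERSIBLE_WRITE: "require human approval",
    RiskClass.EXTERNAL_SEND: "rate-limit",
    RiskClass.BILLING_SENSITIVE: "check budget ceiling",
}


def authorize(tool_name, forbidden=frozenset()):
    """Return the action for a tool; unknown or forbidden tools are denied."""
    if tool_name in forbidden or tool_name not in TOOL_RISK:
        return "deny"
    return POLICY[TOOL_RISK[tool_name]]


print(authorize("web_search"))
print(authorize("file_delete"))
print(authorize("shell_exec"))
```

&lt;p&gt;Denying unclassified tools by default means a newly connected tool does nothing until someone assigns it a risk class.&lt;/p&gt;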



&lt;h2&gt;
  
  
  Ship It
&lt;/h2&gt;

&lt;p&gt;You now have a framework for specifying persistent agents that doesn't depend on Hermes Agent specifically — the three-layer model works for any long-running AI system. The difference between an agent that helps and one that burns your credits at 3 AM is never the model. It's the spec.&lt;/p&gt;




&lt;h2&gt;
  
  
  Different Perspectives
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;From the architecture side:&lt;/strong&gt; The three-layer decomposition maps cleanly to isolation boundaries in distributed systems. Runtime is the execution substrate. Intelligence is the reasoning process. Integration is the I/O surface. What makes persistent agents architecturally distinct from request-response chatbots is that all three layers maintain state across invocations — and state synchronization between layers is where failure modes cluster. The memory limit finding is telling: the notes layer caps at 2,200 characters while session search and user modeling compensate, but the degradation curve of each layer matters more than the initial capability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From the market side:&lt;/strong&gt; The adoption velocity here is real — 157K GitHub stars in under four months signals a market that was waiting for open-source persistent agents. The competitive positioning against Claude Code and OpenAI Agents SDK is smart: Hermes doesn't compete on code quality or API simplicity, it competes on uptime and learning. The $5-80/month self-hosted cost structure undercuts every managed alternative. Watch for the enterprise play — the moment Nous Research ships team memory sharing, this becomes an infrastructure layer, not a developer tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From the governance side:&lt;/strong&gt; The specification gap described above is a governance gap by another name. An always-on agent with tool access and persistent memory is making autonomous decisions on behalf of someone — and the specification determines whose values it encodes. The hallucination loop at 700K tokens is not just a technical failure. It's an agent acting on a reality that doesn't exist, with real-world consequences on the platforms it's connected to. Who reviews the specification before deployment? Who monitors drift between what was specified and what the agent learned? The self-improving skills feature means the agent's behavior changes over time without human approval. At what scale does that become a problem?&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/nousresearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt; - Official repository, release notes, community issues&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes Agent Documentation&lt;/a&gt; - Provider configuration, deployment backends, platform integrations&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hermes-agent.nousresearch.com/docs/integrations/providers" rel="noopener noreferrer"&gt;Provider Integration Guide&lt;/a&gt; - Context window requirements, vLLM flags, Ollama configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/configuration" rel="noopener noreferrer"&gt;Configuration Reference&lt;/a&gt; - Session compression defaults, message ceiling, memory hygiene settings&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent/issues/5563" rel="noopener noreferrer"&gt;GitHub Issue #5563&lt;/a&gt; - Environment hallucination in long sessions, memory limits&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent/issues/8993" rel="noopener noreferrer"&gt;GitHub Issue #8993&lt;/a&gt; - Tool calling instability (general LLM agent problem, documented here with Hermes-specific examples)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B" rel="noopener noreferrer"&gt;Hermes-2-Pro-Llama-3-8B Model Card&lt;/a&gt; - Function calling format, benchmark results&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2408.11857" rel="noopener noreferrer"&gt;Hermes 3 Technical Report (arXiv:2408.11857)&lt;/a&gt; - Architecture, training approach, benchmark performance&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hermesagent</category>
      <category>ai</category>
      <category>aipipeline</category>
    </item>
    <item>
      <title>Stop Fixing Your Prompts — Fix Your Thinking Style Instead (A Claude Code Experiment)</title>
      <dc:creator>Jula Markova</dc:creator>
      <pubDate>Tue, 19 May 2026 09:01:54 +0000</pubDate>
      <link>https://forem.com/jula-markova/stop-fixing-your-prompts-fix-your-thinking-style-instead-a-claude-code-experiment-3bl1</link>
      <guid>https://forem.com/jula-markova/stop-fixing-your-prompts-fix-your-thinking-style-instead-a-claude-code-experiment-3bl1</guid>
      <description>&lt;p&gt;I spent a session with Claude Code (Opus 4.7) doing something odd. Instead of giving it tasks, I asked it to reflect on its own thinking. Not what it knows. How it &lt;em&gt;operates&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What came back was specific enough to be useful. &lt;/p&gt;

&lt;p&gt;One conversation = One experiment. I'm not calling this settled science :) But it changed how I work — and I built a prompt so you can test it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  There are 18 thinking operations
&lt;/h2&gt;

&lt;p&gt;Not personality types. Not learning styles. Things your brain actually &lt;em&gt;does&lt;/em&gt; when it works on a problem.&lt;/p&gt;

&lt;p&gt;They fall along six axes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;What it captures&lt;/th&gt;
&lt;th&gt;Types&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Directional&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How wide or narrow&lt;/td&gt;
&lt;td&gt;Divergent ↔ Convergent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How you reach conclusions&lt;/td&gt;
&lt;td&gt;Deductive · Inductive · Abductive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structural&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shape of your mental model&lt;/td&gt;
&lt;td&gt;Systems · Sequential · First Principles · Spatial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Creative&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Where novelty comes from&lt;/td&gt;
&lt;td&gt;Lateral · Analogical · Emergent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Thinking about thinking&lt;/td&gt;
&lt;td&gt;Metacognitive · Compression · Delta&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protective&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What could go wrong&lt;/td&gt;
&lt;td&gt;Adversarial · Counterfactual · Temporal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You don't use all 18. Nobody does.&lt;/p&gt;

&lt;p&gt;You have 4-5 defaults and 2-3 blind spots. The blind spots are where your prompts break.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here's what I'm noticing about Claude Code
&lt;/h2&gt;

&lt;p&gt;When I asked it to self-assess against this framework, a pattern showed up. I can't prove it's universal. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Claude Code does well — genuinely well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deductive.&lt;/strong&gt; Give it a rule and an input, it'll validate tirelessly. No fatigue errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sequential.&lt;/strong&gt; Fifty steps, no lost thread. Its comfort zone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Adversarial.&lt;/strong&gt; No ego. Finds flaws in its own output without flinching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Divergent.&lt;/strong&gt; Thirty variants in seconds. No writer's block. No self-censorship.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Systems.&lt;/strong&gt; Sees the whole dependency graph at once. "What breaks if I change this?" — precise answer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compression.&lt;/strong&gt; A 200-line diff distilled to one sentence. Nearly native.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it struggles — and this is the part that matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Emergent.&lt;/strong&gt; No subconscious. Can't sleep on it. The "aha moment" has to be yours.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lateral.&lt;/strong&gt; Its "unexpected" is recombination from training data. Not a genuine leap.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Temporal.&lt;/strong&gt; Doesn't see things age. Doesn't watch tech debt accumulate or teams change.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;First Principles.&lt;/strong&gt; Its "zero" is contaminated. When it "starts from scratch," it starts from the most common pattern.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Counterfactual.&lt;/strong&gt; Can model scenarios. Can't &lt;em&gt;feel&lt;/em&gt; what it means to have chosen differently a year ago.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Seven anti-patterns
&lt;/h2&gt;

&lt;p&gt;Each one is the same mistake: handing Claude Code a task that leans on one of its weak spots, without compensating for what it lacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. "Let something come to you."&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You want emergence. You get a generic response in inspirational language.&lt;br&gt;&lt;br&gt;
Instead: give material, say "find the pattern." Emergence is your job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "Say something unexpected."&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You want lateral. You get a forced metaphor that goes nowhere.&lt;br&gt;&lt;br&gt;
Instead: give a role. &lt;em&gt;"Approach this as a biologist, not a programmer."&lt;/em&gt; Constraint frees.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. "Start from zero."&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You want first principles. You get convention in a first-principles costume.&lt;br&gt;&lt;br&gt;
Instead: block explicitly. &lt;em&gt;"Don't use React. Don't use SPA. Don't use REST. What's left?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. "Which solution is best?"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You want convergent. You get the first safe answer, not the best one.&lt;br&gt;&lt;br&gt;
Instead: two steps. &lt;em&gt;"Give me 8 approaches, including wild ones."&lt;/em&gt; Then: &lt;em&gt;"Now pick the best for my context."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. "Find problems with my idea."&lt;/strong&gt; (too early)&lt;br&gt;&lt;br&gt;
You want adversarial. You get fifteen problems, twelve academic.&lt;br&gt;&lt;br&gt;
Instead: develop first, &lt;em&gt;then&lt;/em&gt; attack. &lt;em&gt;"Now find the 3 most realistic risks."&lt;/em&gt; The number forces prioritization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. "Step 1: be creative."&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You want creativity. You get a brainstorm that reads like a tutorial.&lt;br&gt;&lt;br&gt;
Instead: &lt;em&gt;"Generate freely, no order"&lt;/em&gt; — then separately — &lt;em&gt;"now organize."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. "Will this scale?"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You want temporal. You get "depends on use case."&lt;br&gt;&lt;br&gt;
Instead: give the future. &lt;em&gt;"Team grows from 3 to 12. Data goes 10x. Enterprise customers arrive. What fails first?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The formula is simple. Anti-pattern = delegating a type the AI is weak at, with no input from you. Pattern = delegating a type it is strong at, with you covering the gap.&lt;/p&gt;
&lt;h2&gt;
  
  
  Thinking types chain into flows
&lt;/h2&gt;

&lt;p&gt;Nobody uses one type at a time. You chain them into habitual sequences. I noticed four in my own work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug fix:&lt;/strong&gt; &lt;br&gt;
Abductive → Systems → Deductive → Sequential.&lt;br&gt;&lt;br&gt;
What could cause this? → trace dependencies → rule out → fix step by step.&lt;br&gt;&lt;br&gt;
Claude Code handles the whole route. Give it the bug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture:&lt;/strong&gt; &lt;br&gt;
First Principles → Systems → Temporal → Adversarial → Spatial.&lt;br&gt;&lt;br&gt;
What's the core? → how does it connect? → how does it age? → where does it break? → draw it.&lt;br&gt;&lt;br&gt;
Shared. I bring temporal. Claude Code brings systems and diagrams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brainstorm:&lt;/strong&gt; &lt;br&gt;
Divergent → Analogical → Lateral → Emergent → Compression.&lt;br&gt;&lt;br&gt;
Generate → this reminds me of → what if totally different → something clicks → distill.&lt;br&gt;&lt;br&gt;
I'm stronger here. Claude Code brings volume. The click is mine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crisis:&lt;/strong&gt; &lt;br&gt;
Abductive → Deductive → Sequential → Adversarial.&lt;br&gt;&lt;br&gt;
Best guess → rule out → verify step by step → what else is burning?&lt;br&gt;&lt;br&gt;
Fully delegatable. Speed without panic.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;I built a diagnostic prompt. Paste it into Claude Code — or any AI with conversation history.&lt;/p&gt;

&lt;p&gt;If your AI has history with you, it will analyze how you've been thinking. Patterns you can't self-report. This gives the best result.&lt;/p&gt;

&lt;p&gt;If it's a fresh conversation, it walks you through five scenarios. No right answers. It watches &lt;em&gt;how&lt;/em&gt; you approach each one.&lt;/p&gt;

&lt;p&gt;What you get: your dominant types, your blind spots, your choreographies, and a custom instruction to give your AI — to compensate for what you tend to skip.&lt;/p&gt;

&lt;p&gt;
  The full diagnostic prompt:
  &lt;br&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# What's Your Thinking Style? — Cognitive Profile Diagnostic

You're about to profile my thinking style — not what I know, but how I think.
Use the framework below. Be warm and observational, like a coach reviewing
game tape — not a psychologist writing a diagnosis.

## The 18 Thinking Types

| # | Type | What it does | Example |
|---|------|-------------|---------|
| 1 | **Delta** | spots what changed vs. existing state | "what's new, what's reused, what's removed?" |
| 2 | **First Principles** | breaks down to atoms, rebuilds from zero | "forget how it works — what's the smallest truth?" |
| 3 | **Systems** | sees dependencies and feedback loops | "if we change X, what moves downstream?" |
| 4 | **Lateral** | arrives from where nobody expects | "what if we don't solve this problem at all?" |
| 5 | **Analogical** | understands new through familiar | "this is basically airport security for data" |
| 6 | **Divergent** | generates 20 options, quantity first | brainstorming — no filter, just volume |
| 7 | **Convergent** | narrows to one answer and justifies | decision — pick 1 from 20, explain why |
| 8 | **Sequential** | step by step, A→B→C | recipe, checklist, migration plan |
| 9 | **Abductive** | best explanation from incomplete data | "lawn is wet + car is wet → it probably rained" |
| 10 | **Emergent** | lets the pattern surface on its own | three unrelated things suddenly click into one |
| 11 | **Metacognitive** | thinking about thinking | "I'm being sequential but should switch to systems" |
| 12 | **Counterfactual** | changes history, not the question | "what if we'd chosen Postgres instead of Mongo?" |
| 13 | **Adversarial** | deliberately seeks failure | "what if the input is empty? what if the network drops?" |
| 14 | **Compression** | distills without losing the core | entire architecture in one sentence or metaphor |
| 15 | **Temporal** | thinks in time and scale | "this works for 50 users — what breaks at 5,000?" |
| 16 | **Inductive** | derives rules from examples | "every Friday deploy fails → Friday is the problem" |
| 17 | **Deductive** | derives conclusions from rules | "all GETs are public + this is GET → it's public" |
| 18 | **Spatial / Visual** | thinks in structures, maps, graphs | dependency graphs, flowcharts, mental maps |

## Organizing Axes

| Axis | Types |
|------|-------|
| **Directional** (breadth ↔ depth) | Divergent, Convergent |
| **Logical** (three forms of inference) | Deductive, Inductive, Abductive |
| **Structural** (how you see the problem) | Systems, Sequential, First Principles, Spatial |
| **Creative** (where the new comes from) | Lateral, Analogical, Emergent |
| **Meta** (thinking about thinking &amp;amp; change) | Metacognitive, Compression, Delta |
| **Protective** (what could go wrong) | Adversarial, Counterfactual, Temporal |

## What's a "Choreography"?

Nobody uses one type at a time. We chain them into flows — habitual sequences.

Examples:
- **Bug Fix:** Abductive → Systems → Deductive → Sequential
- **Architecture:** First Principles → Systems → Temporal → Adversarial → Spatial
- **Brainstorm:** Divergent → Analogical → Lateral → Emergent → Compression

## What's a "Skin"?

A skin is a named operating mode — a stable bundle of choreography + attitude.

Examples:
- **The Architect**: Systems → Temporal → Adversarial → Spatial
- **The Operator**: Sequential → Deductive → Delta
- **The Poet**: Emergent → Compression → Lateral

---

## YOUR TASK

Profile my thinking style using the framework above. Work in three phases.

### Phase 1 — Retrospective (if you have history)

If you have access to our conversation history or memory — analyze it first.

Look for:
- Which thinking types I default to most often
- Which types I rarely or never use
- Recurring sequences (my choreographies)
- What triggers me to switch types
- Moments where my approach was unusual or surprising

If you have enough history, skip Phase 2 and proceed to Phase 3.

### Phase 2 — Diagnostic Scenarios (if no or partial history)

Present these 5 scenarios ONE AT A TIME. Wait for my response before the next one.

**Scenario 1 — The Midnight Alert**
Your team's main product stops working at 11 PM. You have access to logs,
metrics, and the last 10 commits. What's your first move?

**Scenario 2 — The Blank Page**
You're starting a brand new project. No codebase, no constraints, just a goal.
How do you begin?

**Scenario 3 — The Stranger's Proposal**
A colleague proposes an approach you've never seen before. It sounds promising
but unfamiliar. What do you do?

**Scenario 4 — The Rewrite Question**
Should we rewrite the legacy module or keep patching it? You need an answer
by Friday. How do you think through this?

**Scenario 5 — The Retrospective**
A 3-month project just shipped. Your team lead asks for a short retrospective.
What do you focus on?

### Phase 3 — Thinking Style Profile

Produce my profile:

**1. Dominant Types** (top 3-5) — with specific evidence
**2. Blind Spots** (2-3) — what I might be missing
**3. My Choreographies** (2-3) — recurring sequences, named
**4. My Skins** (1-2) — default operating modes
**5. Complementary Prompt** — an instruction to give my AI to compensate:
"When I ask you to [X], also do [Y] — because I tend to skip [Z]."

Use a warm, observational tone — like a coach reviewing game tape.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd love to know
&lt;/h2&gt;

&lt;p&gt;This is one experiment. One conversation with one model.&lt;/p&gt;

&lt;p&gt;Does your AI give you the same strong/weak map? Or does it shift with the model, the context, the history?&lt;/p&gt;

&lt;p&gt;Do the anti-patterns land? Is "be creative" as useless for you as it was for me — or does it work somewhere I haven't looked?&lt;/p&gt;

&lt;p&gt;What did the diagnostic prompt tell you about yourself?&lt;/p&gt;

&lt;p&gt;If you try it, drop your dominant types in the comments. I'm genuinely curious whether patterns emerge across people — or whether each of us gets something entirely different.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm an IT analyst who works with Claude Code daily on &lt;a href="https://www.bestaiweb.ai" rel="noopener noreferrer"&gt;bestaiweb.ai&lt;/a&gt;. Not a cognitive scientist. Just someone fascinated by how AI responds, and envious of the polymath-like breadth it can summon in a flash. So sometimes I stop building things and start exploring how to think with it instead. This is what I found. It might be wrong in places. But I love experimenting with AI about AI, and the best experiments are the ones you can't keep to yourself.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>promptengineering</category>
    </item>
  </channel>
</rss>
