Forem: Jula Markova

How to Architect Always-On AI Agents with Hermes - Written by an AI Pipeline, Verified by Three Models. Is It Slop?

Jula Markova — Thu, 21 May 2026 13:34:52 +0000

How This Article Was Built (And Why I'm Showing You the Kitchen)

Disclaimer up front: I'm not entering the Hermes Agent challenge. I noticed the challenge and realized I could use my AI pipeline to write an article about Hermes Agent architecture. So I did. Then I thought: why not share both the result and the process that created it? What I actually want is your honest criticism.

Who Is The Author?

For the past several months I've been building BestAIweb while navigating the shift from traditional development to AI. The site runs on Hugo, and the content is generated through what I call an AI content pipeline. The pipeline itself is built in TypeScript, orchestrated through Claude Code, and runs on Anthropic's Claude models. It's still a work in progress.

That phrase — "AI content pipeline" — probably triggered your slop detector. Fair. Let me explain why I think this case is different, and then let you judge.

The Pipeline

BestAIweb currently has 450+ technical articles across 45 topic clusters. Every article goes through a multi-phase pipeline:

Market scanning — an LLM agent surveys the current tool and framework landscape for each topic, identifying what's leading, what's declining, and what's emerging
Query fan-out — the pipeline generates the questions a developer would actually search for, not the questions that sound good as headlines
Research — a dedicated research agent gathers facts, version numbers, benchmark data, and source URLs. Everything gets a structured fact sheet
Writing — here's where personas come in. The pipeline has four author personas, each with a distinct voice and content type specialization:
- MAX — the engineer. Writes step-by-step guides. Pragmatic, implementation-focused, opinionated about tool choices
- MONA — the explainer. Breaks down concepts. Thinks in diagrams and mental models
- DAN — the reporter. Covers news, market shifts, and what just shipped
- ALAN — the critic. Writes opinion pieces and ethical assessments
Claim verification — a separate agent cross-checks every factual claim against the research fact sheet. Unsupported claims get flagged
Deterministic validation — a Python script runs 30+ structural and quality checks: word count, link integrity, frontmatter completeness, source coverage
Hugo integration — the article lands in the static site with schema.org markup, generated images, and internal links
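To make the deterministic-validation phase concrete, here is a minimal Python sketch of the kind of checks such a script might run. It is not the BestAIweb validator: the required frontmatter keys (`title`, `date`, `description`), the 800-word default, and the name `validate_article` are all assumptions for illustration.

```python
import re
from typing import List

# Hypothetical required frontmatter keys; a real site defines its own.
REQUIRED_KEYS = ("title", "date", "description")


def validate_article(markdown: str, min_words: int = 800) -> List[str]:
    """Run a few deterministic checks; an empty list means the article passes."""
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", markdown, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter"]
    front, body = match.groups()
    problems: List[str] = []

    # Frontmatter completeness: only checks that each key appears, not its value.
    keys = {line.split(":", 1)[0].strip() for line in front.splitlines() if ":" in line}
    problems += [f"frontmatter missing: {k}" for k in REQUIRED_KEYS if k not in keys]

    # Word count of the body, splitting on whitespace.
    words = len(body.split())
    if words < min_words:
        problems.append(f"too short: {words} words < {min_words}")

    # Link integrity (offline part only): flag Markdown links with an empty target.
    if re.search(r"\]\(\s*\)", body):
        problems.append("empty link target")
    return problems
```

Because none of these checks call an LLM, the same input always produces the same verdict, which is the point of keeping this phase deterministic.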

The Hermes Agent guide below was written by MAX using his guide template. His tone of voice is direct, specification-oriented, and allergic to hand-waving. The template enforces a fixed structure: prerequisites, numbered steps, pitfalls table, FAQ, and a deployable artifact at the end.

The Multi-Model Judging Layer

Pipeline generation was step one. Then came a manual judging round: I pasted the draft into ChatGPT, Gemini, and DeepSeek and asked each to evaluate it as a technical reviewer, checking factual accuracy, logical gaps, tone inconsistencies, and whether the advice would actually work if someone followed it. I then reviewed their feedback together with Claude Code and incorporated the changes that held up under scrutiny.

The AI Slop Question

Here's the question I keep circling back to: Is everything AI-generated inherently slop?

The reflexive answer in 2026 is "yes, obviously." And for most AI-generated content, that's correct. GPT-powered blog farms, SEO filler, those LinkedIn posts prompted with "write a thought leadership post about AI" — that is slop. Generated without specification, without sourcing, without verification, and without a quality gate.

But what about content where:

Every factual claim traces to a documented source (GitHub issues, official docs, arxiv papers)
A claim verification agent flags unsupported statements before publication
A deterministic validator enforces structural quality independent of the LLM
The voice and structure come from a multi-page specification, not a one-line prompt
Multiple independent models review the output for different failure modes

Is that still slop? Or is it closer to what a well-managed editorial team produces — except the heavy lifting is done by LLMs under human direction?

I genuinely don't know the answer. That's why I'm sharing this.

What I'd Like From You

Criticism. Specifically:

Does the article below read like AI slop? If yes, what gives it away — the sentence rhythm, the structure, the depth, or something else?
Is the technical content accurate? If you've deployed Hermes Agent or any persistent agent framework, does the three-layer model match your experience? Did I miss a critical failure mode?
Does the pipeline approach change anything? Is multi-phase generation with claim verification and multi-model judging enough to produce content worth reading? Or is it just expensive slop with better sourcing?

I'm not looking for "great article!" responses. I'm looking for the engineer who says "this is wrong because..." or "you missed the part where..." That feedback makes the next pipeline iteration better.

More Guides From the Same Pipeline

If you want to judge more output from the same pipeline and the same MAX persona, the full library has 95+ implementation guides from him, covering agents, RAG, training, inference, evaluation, and image generation.

What follows is the article as the pipeline produced it, after multi-model review. Judge for yourself.

How to Architect Always-On AI Agents with Hermes: Decompose, Specify, Deploy

TL;DR

Persistent agents need three specs your chatbot never did: memory policy, tool boundaries, and session recovery

Hermes Agent is model-agnostic — the model choice matters less than how you specify context, tools, and failure handling

Always-on means always-failing-somewhere — build validation into the deployment spec, not as an afterthought

You spun up Hermes Agent on a Friday evening. Gave it access to Slack, a web scraper, and your project database. Told it to "keep the team updated on competitor releases." Monday morning: 47 Slack messages, three of them citing products that don't exist, and a web scraper loop that burned through your OpenRouter credits overnight. The agent ran exactly as specified. The specification was the problem.

Before You Start

You'll need:

A Linux or macOS server (even a $5 VPS works — Hermes Agent runs on minimal hardware)
An LLM provider account (OpenRouter, Anthropic, OpenAI, or a local runtime like Ollama)
Understanding of function calling — how models invoke external tools
A clear picture of what your agent should do when you're not watching

This guide teaches you: How to decompose a persistent agent deployment into specifiable components so Hermes Agent does what you intended — not what you literally typed.

What this guide does NOT cover:

Production security hardening (firewall rules, secrets management, network isolation)
Enterprise compliance (SOC 2, GDPR data residency, audit certification)
Full evaluation frameworks (systematic benchmarking, regression test suites)
Model fine-tuning or training (Hermes models are pre-trained; this guide covers the agent framework)

The Agent That Worked Until It Didn't

Here's the pattern. Developer discovers Hermes Agent. Reads that it has persistent memory, self-improving skills, 20+ platform integrations. Installs it. Connects everything. Types a system prompt. Walks away.

One of two things happens next. Either the agent does nothing useful because the specification was too vague, or it does too much because the boundaries were never set.

According to Hermes Agent GitHub Issues, long sessions exceeding 700K tokens trigger environment hallucination — the agent confuses tool descriptions with actual environment state. It starts acting on what it thinks is true rather than what is true. This isn't a bug in the traditional sense. It's a specification gap. You never told the agent when to stop, reset, or ask for help.

Step 1: Map the Three Layers

Hermes Agent is not a single system. It's three systems wearing a trench coat.

Your deployment has these parts:

The runtime layer — where the agent executes (Docker, SSH, Modal, local terminal). This determines resource limits, restart behavior, and isolation
The intelligence layer — the LLM provider and model. This determines reasoning quality, context window size, and cost per token
The integration layer — platform connections (Slack, Telegram, web tools) and the tools the agent can invoke. This determines what the agent can touch in the real world

The Architect's Rule: If you can't draw a clear line between what the agent thinks, where it runs, and what it touches — your spec is incomplete.

According to Hermes Agent Docs, the framework supports 30+ providers and 7 terminal backends. That flexibility is the point — and the trap. Every combination has different failure modes. A Modal serverless backend hibernates when idle. An Ollama local model defaults to 4K context tokens. An SSH backend loses the agent if the connection drops. You need to specify which combination you're using and what happens at each boundary.

One thing the "always-on" framing obscures: what happens when the LLM provider goes down? OpenRouter has outages. API rate limits hit. Local models crash. An always-on agent needs a fallback plan — a secondary provider, a circuit breaker that pauses tool execution after N consecutive failures, or at minimum a notification that the agent is degraded. Specify this in the runtime layer, not as an afterthought.
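A minimal sketch of that fallback plan in Python, assuming each provider is wrapped as a plain function that takes a prompt and returns text. `CircuitBreaker`, `call_with_fallback`, and the thresholds are hypothetical names, not Hermes Agent APIs; in a real deployment this logic would live in a wrapper around the provider calls.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class CircuitOpenError(RuntimeError):
    """Raised when no provider is healthy and the agent should run degraded."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a retry after `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit one trial call once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)open the breaker


def call_with_fallback(prompt: str,
                       providers: Sequence[Tuple[str, Callable[[str], str], CircuitBreaker]],
                       notify: Callable[[str], None]) -> str:
    """Try providers in priority order, skipping any whose breaker is open."""
    for name, call, breaker in providers:
        if not breaker.allow():
            continue
        try:
            result = call(prompt)
        except Exception as exc:  # outage, rate limit, crashed local model, ...
            breaker.record_failure()
            notify(f"provider {name} failed: {exc}")
            continue
        breaker.record_success()
        return result
    notify("all providers unavailable: agent is degraded, pausing tool execution")
    raise CircuitOpenError("no healthy LLM provider")
```

Passing an injectable `clock` keeps the breaker testable; the `notify` callback is where a Slack or email alert to a human would go.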

Step 2: Lock Down the Context Contract

The intelligence layer needs a specification before it sees a single user message. This is where most deployments fail — not in the tools, not in the platform, but in the context that frames every decision the agent makes.

Context checklist:

System prompt with explicit role boundaries (what the agent does and does NOT do)
Memory policy: what gets persisted, what gets discarded, and when
Tool authorization with risk classification (see table below)
Access control: which platforms and channels can trigger the agent (not every DM deserves a response)
Session limits: when to compress or reset (per Hermes Agent Docs, the default is auto-compression at 50% of the model's context window, plus a hard ceiling of 400 messages)
Output format contracts: how the agent reports results on each platform
Rate limits: maximum messages per minute per platform (an agent with no rate limit is a spam bot waiting to happen)
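The session-limit item reduces to a small decision rule. This sketch assumes an external monitor that can see the current token count and message count; `session_action` and `SessionLimits` are hypothetical, and the defaults simply restate the 50% trigger and 400-message ceiling cited above rather than reading Hermes configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionLimits:
    compress_at: float = 0.5  # fraction of the model's context window
    max_messages: int = 400   # hard message ceiling


def session_action(tokens_in_context: int, context_window: int, message_count: int,
                   limits: SessionLimits = SessionLimits()) -> str:
    """Return 'hard_ceiling', 'compress', or 'ok' for the current session state."""
    if message_count >= limits.max_messages:
        return "hard_ceiling"  # persist critical state before any reset
    if tokens_in_context >= limits.compress_at * context_window:
        return "compress"
    return "ok"
```

The dataclass is frozen, so sharing one default instance across calls is safe.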

Tool Risk Classification

An always-on agent with database access and Slack permissions is making autonomous decisions about your data and your team's attention. Classify every tool before you enable it.

Risk Class	Description	Example Tools	Authorization
read-only	Observes, never modifies	web_search, database_query (SELECT), file_read	Auto-approved
reversible-write	Creates or modifies, can be undone	file_write, note_create, draft_message	Auto-approved with audit log
irreversible-write	Deletes or overwrites permanently	file_delete, database_delete, channel_archive	Requires human confirmation
external-send	Sends to humans or external systems	slack_post, email_send, webhook_trigger	Rate-limited + audit log
billing-sensitive	Incurs direct cost	api_call (paid), image_generate, compute_spawn	Budget ceiling + alert
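One way to keep the table honest is to hold it as data and deny anything that has not been classified. This Python sketch mirrors the table's rows; the enum, the `TOOL_RISK` mapping, and `authorization_for` are illustrative assumptions, not Hermes Agent configuration.

```python
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read-only"
    REVERSIBLE_WRITE = "reversible-write"
    IRREVERSIBLE_WRITE = "irreversible-write"
    EXTERNAL_SEND = "external-send"
    BILLING_SENSITIVE = "billing-sensitive"


# Authorization rule per risk class, copied from the table above.
POLICY = {
    Risk.READ_ONLY: "Auto-approved",
    Risk.REVERSIBLE_WRITE: "Auto-approved with audit log",
    Risk.IRREVERSIBLE_WRITE: "Requires human confirmation",
    Risk.EXTERNAL_SEND: "Rate-limited + audit log",
    Risk.BILLING_SENSITIVE: "Budget ceiling + alert",
}

# Every tool is classified before it is enabled (example tools from the table).
TOOL_RISK = {
    "web_search": Risk.READ_ONLY,
    "file_read": Risk.READ_ONLY,
    "file_write": Risk.REVERSIBLE_WRITE,
    "file_delete": Risk.IRREVERSIBLE_WRITE,
    "database_delete": Risk.IRREVERSIBLE_WRITE,
    "slack_post": Risk.EXTERNAL_SEND,
    "image_generate": Risk.BILLING_SENSITIVE,
}


def authorization_for(tool: str) -> str:
    """Unclassified tools are denied instead of being treated as safe."""
    risk = TOOL_RISK.get(tool)
    return POLICY[risk] if risk is not None else "Denied (unclassified)"
```

Defaulting to deny is the design choice that matters here: a newly connected tool stays unusable until someone decides which row it belongs in.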

The Spec Test: If your system prompt doesn't mention what happens at 3 AM when the agent encounters an error and no human is online — you've specified a supervised agent and deployed it as unsupervised. If it doesn't classify tool risk levels, the agent treats database_delete and web_search as equally safe. If it doesn't set a compression trigger, the default (50% context window) may or may not match your workload.

Here's what a minimal context contract looks like in practice. This is the MEMORY.md the agent reads on every session start:

# MEMORY.md — Agent Operating Contract
role: "Monitor competitor AI product releases for the engineering team"
boundaries:
  - "NEVER post to channels outside #competitor-monitoring"
  - "NEVER summarize or forward internal company data"
  - "NEVER execute irreversible-write tools without human confirmation"
  - "Maximum 3 Slack messages per hour"
tools:
  auto_approved: [web_search, file_read]
  rate_limited: [slack_post]  # max 3/hour
  requires_confirmation: [file_delete, database_write]
  forbidden: [email_send, channel_archive]
memory_policy:
  persist: "confirmed competitor releases, product names, dates"
  discard: "intermediate search results, draft summaries"
  compress_after: "50%"  # of context window
escalation: "If uncertain about any action, post to #agent-review instead"

A critical distinction: A memory or system-prompt policy is not a security boundary. Writing "NEVER execute irreversible-write tools" in MEMORY.md is a behavioral instruction to the model, not a technical lock. The model can ignore it — especially under long-context degradation or adversarial input. Destructive tools should be blocked or approval-gated at the runtime level (process permissions, API middleware, webhook filters), not merely discouraged in instructions. Treat the YAML above as the agent's intent. Build enforcement outside the model.
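Here is a minimal sketch of enforcement outside the model, assuming your runtime routes every tool request through a wrapper before executing it. `ToolGate` is hypothetical, not a Hermes feature; the `CONTRACT` dict mirrors the YAML above with the 3-per-hour Slack limit moved from a comment into data, and `approve` stands in for whatever human-confirmation channel you use.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict

CONTRACT = {
    "tools": {
        "auto_approved": ["web_search", "file_read"],
        "rate_limited": {"slack_post": 3},  # max sends per hour
        "requires_confirmation": ["file_delete", "database_write"],
        "forbidden": ["email_send", "channel_archive"],
    }
}


class ToolBlocked(PermissionError):
    pass


class ToolGate:
    """The model may request a tool; only the gate decides whether it runs."""

    def __init__(self, contract: Dict[str, Any], approve: Callable[[str, dict], bool],
                 clock: Callable[[], float] = time.time):
        self.tools = contract["tools"]
        self.approve = approve
        self.clock = clock
        self.sent: Dict[str, Deque[float]] = {}

    def execute(self, tool: str, args: dict, run: Callable[..., Any]) -> Any:
        if tool in self.tools.get("forbidden", []):
            raise ToolBlocked(f"{tool} is forbidden")
        if tool in self.tools.get("requires_confirmation", []):
            if not self.approve(tool, args):
                raise ToolBlocked(f"{tool} was not approved by a human")
        elif tool in self.tools.get("rate_limited", {}):
            self._check_rate(tool, self.tools["rate_limited"][tool])
        elif tool not in self.tools.get("auto_approved", []):
            raise ToolBlocked(f"{tool} is not in the contract")  # default deny
        return run(**args)

    def _check_rate(self, tool: str, per_hour: int) -> None:
        now = self.clock()
        window = self.sent.setdefault(tool, deque())
        while window and now - window[0] >= 3600:
            window.popleft()  # forget sends older than one hour
        if len(window) >= per_hour:
            raise ToolBlocked(f"{tool} exceeded {per_hour}/hour")
        window.append(now)  # counted before running, so a failed send still uses a slot
```

Because the gate, not the model, decides what runs, a long-context model that forgets its "NEVER" instructions still cannot call `email_send`. Counting a send before it runs errs on the side of fewer messages.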

According to Hermes Agent GitHub Issues, the persistent notes layer has a limit of roughly 2,200 characters. That's the manually curated knowledge — not the agent's entire memory. Hermes also maintains a full-text search index over past sessions and a per-person user model that evolves automatically. So the agent isn't blind between sessions. But the notes layer is where you store hard constraints and project-critical context, and 2,200 characters fills up fast across three projects. You still need a compression strategy for notes — what gets stored verbatim, what moves to session history, what gets dropped.
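A compression strategy for the notes layer can start as a priority-ordered budget. This sketch assumes you tag each note with a priority (lower number means more important) and treats the roughly 2,200-character figure as the budget; `pack_notes` is a hypothetical helper, and what you do with the overflow (move it to session history or drop it) stays a manual decision.

```python
from typing import List, Tuple

NOTES_LIMIT = 2200  # approximate notes-layer cap reported in the GitHub issues


def pack_notes(entries: List[Tuple[int, str]],
               limit: int = NOTES_LIMIT) -> Tuple[List[str], List[str]]:
    """Split (priority, text) notes into those kept verbatim and the overflow.

    Walks entries from most to least important, keeping each one that still fits.
    """
    kept: List[str] = []
    overflow: List[str] = []
    used = 0
    for _, text in sorted(entries, key=lambda e: e[0]):
        cost = len(text) + 1  # +1 for the newline separating notes
        if used + cost <= limit:
            kept.append(text)
            used += cost
        else:
            overflow.append(text)
    return kept, overflow
```

Hard constraints such as channel restrictions get priority 0 so they are the last thing to be squeezed out.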

Step 3: Wire the Components in Order

Deployment order matters. Each layer depends on the one below it.

Build order:

Runtime first — because everything else crashes without a stable execution environment. Choose your backend, set resource limits, configure restart-on-failure
Intelligence layer next — because tool and platform behavior depends on the model's capabilities. According to Hermes Agent Docs, vLLM requires explicit --enable-auto-tool-choice and --tool-call-parser flags. Without them, the model outputs tool calls as plain text instead of executing them
Integration layer last — because platform connections should only activate after the agent can reason and recover from errors. Connect Slack after the agent handles tool failures gracefully, not before
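Before moving on to the integration layer, a preflight check can catch the missing vLLM flags. This sketch only inspects a `vllm serve` command string and does not verify the parser value; the helper name is hypothetical, and `hermes` in the usage example is one of the tool-call parser names vLLM accepts, which you should confirm against the vLLM version you run.

```python
import shlex
from typing import List

REQUIRED_VLLM_TOOL_FLAGS = ("--enable-auto-tool-choice", "--tool-call-parser")


def missing_tool_flags(serve_command: str) -> List[str]:
    """List the tool-calling flags absent from a `vllm serve ...` command line."""
    tokens = shlex.split(serve_command)
    # Accept both "--flag value" and "--flag=value" spellings.
    present = {tok.split("=", 1)[0] for tok in tokens if tok.startswith("--")}
    return [flag for flag in REQUIRED_VLLM_TOOL_FLAGS if flag not in present]
```

For example, `missing_tool_flags("vllm serve MODEL --enable-auto-tool-choice --tool-call-parser hermes")` returns an empty list, while a bare `vllm serve MODEL` reports both flags as missing.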

For each component, your specification must cover:

What it receives (inputs and triggers)
What it returns (outputs and side effects)
What it must NOT do (boundaries and prohibitions)
How it handles failure (retry logic, fallback behavior, human escalation)
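Those four questions can be held as a small record per component so that an empty section is visible before deployment. `ComponentSpec` is a hypothetical structure, not part of Hermes Agent; the example values in the usage come from the competitor-monitoring scenario above.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ComponentSpec:
    name: str
    receives: List[str] = field(default_factory=list)    # inputs and triggers
    returns: List[str] = field(default_factory=list)     # outputs and side effects
    must_not: List[str] = field(default_factory=list)    # boundaries and prohibitions
    on_failure: List[str] = field(default_factory=list)  # retry, fallback, escalation

    def gaps(self) -> List[str]:
        """Names of the four required sections that are still empty."""
        return [f.name for f in fields(self)
                if f.name != "name" and not getattr(self, f.name)]
```

A spec such as `ComponentSpec("slack_integration", receives=["messages in #competitor-monitoring"], returns=["slack_post"])` reports `["must_not", "on_failure"]`, which is exactly the part that tends to get skipped.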

The self-improving skills feature is powerful — Hermes Agent automatically creates workflow documents from successful task completions and refines them over time. But the skill creation itself needs a boundary spec. Without one, the agent writes skills for one-off tasks, cluttering the skill library with noise.

Skill boundary example — add this to your system prompt:

skills_policy:
  auto_create: ["competitor-monitoring", "weekly-summary", "data-formatting"]
  never_create: ["one-off-queries", "debugging-sessions", "ad-hoc-searches"]
  review_before_use: ["any skill not used in 14+ days"]
  max_skills: 20  # force deduplication when library exceeds this

Without this, the agent treats every successful task as a reusable pattern. Three months in, you have 200 skills — most of them variations of the same web search with slightly different parameters.

One more thing about skills: they can regress. A skill written for Hermes-3-8B may produce wrong tool calls after switching to a different model. A skill that relies on a specific API endpoint breaks when that endpoint changes. Skills older than 30 days should be re-validated or archived. The review_before_use field above is your safety net — but only if you actually review them.
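A periodic audit can apply the `skills_policy` fields mechanically. This is a sketch under assumptions: it presumes you can export each skill's name, category, creation date, and last-used date from wherever your skills are stored, and `Skill`, `audit_skills`, and `may_create` are hypothetical names. The 14-day, 30-day, and 20-skill thresholds come from the policy and paragraph above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

AUTO_CREATE = {"competitor-monitoring", "weekly-summary", "data-formatting"}


def may_create(category: str) -> bool:
    # Allow-list: anything not listed (e.g. ad-hoc-searches) never becomes a skill.
    return category in AUTO_CREATE


@dataclass
class Skill:
    name: str
    category: str
    created: date
    last_used: date


def audit_skills(skills: List[Skill], today: date, max_skills: int = 20,
                 stale_days: int = 14, revalidate_days: int = 30) -> Dict[str, List[str]]:
    report: Dict[str, List[str]] = {"review": [], "revalidate": [], "over_limit": []}
    for s in skills:
        if (today - s.last_used).days >= stale_days:
            report["review"].append(s.name)
        if (today - s.created).days >= revalidate_days:
            report["revalidate"].append(s.name)
    if len(skills) > max_skills:
        # Least recently used skills are the first deduplication candidates.
        by_use = sorted(skills, key=lambda s: s.last_used)
        report["over_limit"] = [s.name for s in by_use[: len(skills) - max_skills]]
    return report
```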

Step 4: Prove It's Actually Working

Running the agent is not validation. Validation means you know what "correct" looks like and can detect when the agent drifts from it.

Validation checklist:

Memory consistency — after 24 hours, does the agent's memory reflect reality? Failure looks like: agent references a "completed" task that was never finished, or forgets a constraint you set yesterday
Tool call accuracy — are tool invocations well-formed and targeted? Failure looks like: invalid function names, malformed arguments, or calls to tools that aren't registered. This is a general problem with LLM-driven tool use, not Hermes-specific — any agent framework that delegates tool selection to a model will hit it. Hermes Agent GitHub Issues documents concrete examples like todo:list calls that don't match any schema
Platform output quality — are messages to Slack/Telegram/Discord useful and accurate? Failure looks like: hallucinated product names, duplicate messages, or empty responses
Cost trajectory — is daily token usage stable or growing? Failure looks like: runaway context accumulation driving costs up 10x within a week
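The tool-call and cost checks are the easiest to automate. This Python sketch assumes tool calls are logged as `{"name": ..., "arguments": {...}}` objects and that you maintain your own registry of tool names and required arguments; `REGISTERED_TOOLS`, `tool_call_errors`, and `cost_growth` are hypothetical. A `todo:list` call like the one mentioned above would surface as an unregistered tool.

```python
from typing import Any, Dict, List

# Hypothetical registry: tool name -> required argument names.
REGISTERED_TOOLS: Dict[str, List[str]] = {
    "web_search": ["query"],
    "slack_post": ["channel", "text"],
}


def tool_call_errors(call: Dict[str, Any]) -> List[str]:
    """Return problems with one logged tool call; empty means well-formed."""
    name = call.get("name")
    args = call.get("arguments")
    if name not in REGISTERED_TOOLS:
        return [f"unregistered tool: {name!r}"]
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    return [f"missing argument: {a}" for a in REGISTERED_TOOLS[name] if a not in args]


def cost_growth(daily_tokens: List[int]) -> float:
    """Ratio of the latest day's token usage to the first day's in the window."""
    if len(daily_tokens) < 2 or daily_tokens[0] == 0:
        return 1.0
    return daily_tokens[-1] / daily_tokens[0]
```

An alert threshold such as a growth ratio above 3.0 over a week is an arbitrary starting point; tune it to your workload.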

Common Pitfalls

What You Did	Why the Agent Failed	The Fix
One-shot system prompt: "monitor competitors"	No boundaries — agent decides scope, frequency, and format	Decompose into: what to monitor, how often, where to report, what format
Connected all tools on day one	Agent uses tools in unexpected combinations	Enable tools incrementally, validate each before adding the next
Chose a 4K-context local model	Tool schemas + system prompt + memory exceed context	Use minimum 16K–32K context for tool-calling workloads
No session hygiene policy	700K+ token sessions trigger hallucination loops	Use Hermes built-in compression (default: 50% context window) and set a hard message ceiling. Monitor context growth.
Skipped memory policy	Agent stores everything, including noise	Specify what gets persisted: decisions, outcomes, blockers. Not intermediate reasoning

Pro Tip

The specification you write for Hermes Agent is not a prompt. It's an operating manual for an unsupervised system. The same decomposition — runtime, intelligence, integration — works for any persistent agent, regardless of framework. The tools change. The layers don't.

Frequently Asked Questions

Q: How does Hermes Agent's persistent memory differ from conversation history?
A: Conversation history is a raw log that grows until it hits the context window limit. Hermes uses three structured layers: persistent notes you curate manually, a full-text search index over past sessions, and a user model that evolves per-person. The practical difference — session history gets summarized and compressed, while persistent notes survive indefinitely. Watch for the 2,200-character limit on notes: it forces disciplined compression.

Q: Can I run Hermes Agent with local models instead of cloud API providers?
A: Yes. Ollama, vLLM, SGLang, llama.cpp, and LM Studio all work as backends. The catch is context window configuration. Ollama defaults to 4K tokens, which isn't enough once you add tool schemas and system prompts. Set the context window explicitly to at least 16K on the server side. For vLLM, you also need the --enable-auto-tool-choice and --tool-call-parser flags, or tool calls render as plain text.

Q: What context window size does Hermes Agent need for reliable tool calling?
A: According to Hermes Agent Docs, agent workloads with tools need a minimum of 16K–32K tokens. The system prompt, tool schemas, memory context, and conversation history all compete for the same window. With 5+ tools registered, 32K is the safer starting point. Below that, the model starts dropping tool definitions mid-session.
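The budget arithmetic behind that advice is simple subtraction. The token counts below are hypothetical, chosen only to show the shape of the problem; measure your own system prompt, tool schemas, and memory with your model's tokenizer.

```python
def context_headroom(window: int, system_prompt: int, tool_schemas: int,
                     memory: int, history: int) -> int:
    """Tokens left for the model's reply; negative means the inputs don't fit."""
    return window - (system_prompt + tool_schemas + memory + history)


# Hypothetical sizes in tokens, for illustration only.
usage = dict(system_prompt=1_500, tool_schemas=5 * 400, memory=600, history=3_000)
print(context_headroom(4_096, **usage))   # -3004: over budget before the reply
print(context_headroom(32_768, **usage))  # 25668
```

With these made-up numbers (7,100 tokens of input in total), a 4K window is already over budget before the model writes a word, while a 32K window leaves roughly 25.7K tokens of headroom.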

Q: How do I prevent hallucination loops in long-running Hermes Agent sessions?
A: Hermes has built-in session compression — by default it triggers at 50% of the model's context window, with a hard ceiling of 400 messages. According to Hermes Agent Docs, these thresholds are configurable. The documented failure zone is 700K+ tokens, where environment hallucination has been observed. Keep compression active, tune the trigger percentage for your workload, and monitor for repeated identical tool calls — that's the earliest signal of a loop forming. Store critical state in persistent notes before any forced reset.
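Monitoring for repeated identical tool calls can be as small as a sliding window over recent calls. This sketch assumes your audit log gives you a timestamp, tool name, and arguments for each call, in time order; `LoopDetector` and its defaults (3 repeats in 10 minutes) are hypothetical values to tune, not Hermes settings.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


class LoopDetector:
    """Flags when the same tool call (name + arguments) repeats N times within a window."""

    def __init__(self, max_repeats: int = 3, window_s: float = 600.0):
        self.max_repeats = max_repeats
        self.window_s = window_s
        self.events: Deque[Tuple[float, str]] = deque()

    def record(self, ts: float, name: str, args: Dict[str, Any]) -> bool:
        # Canonical JSON so argument order doesn't hide a repeat.
        key = json.dumps([name, args], sort_keys=True)
        self.events.append((ts, key))
        while self.events and ts - self.events[0][0] > self.window_s:
            self.events.popleft()  # drop calls older than the window
        return sum(1 for _, k in self.events if k == key) >= self.max_repeats
```

A `True` return is the moment to pause tool execution and write critical state to persistent notes, before compression or a reset discards it.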

Your Spec Artifact

By the end of this guide, you should have:

A three-layer deployment map — runtime, intelligence, and integration with explicit boundaries between each
A context contract with tool risk classification — system prompt, memory policy, tool authorization by risk class, access control, rate limits, and output format per platform
A security baseline — tool isolation, rate limiting, audit logging, and escalation paths
A validation checklist — memory consistency, tool call accuracy, output quality, and cost trajectory checks you run daily

Your Deployment Spec Prompt

This prompt generates a first draft of your agent specification — not a production-ready deployment. Paste it into Claude Code, Cursor, or your preferred AI coding tool. Fill in every bracketed placeholder with your specific values from Steps 1-4.

I'm specifying a Hermes Agent deployment. Generate a first-draft specification
based on these inputs. I will review and harden it before production use.

RUNTIME LAYER:
- Backend: [Docker / SSH / Modal / local — pick one]
- Resource limits: [RAM, CPU cores, disk]
- Restart policy: [on-failure / always / manual]
- Server: [OS, VPS provider, specs]

INTELLIGENCE LAYER:
- LLM provider: [OpenRouter / Anthropic / Ollama / vLLM — pick one]
- Model: [model name and size]
- Context window: [minimum 16K — specify exact value]
- Provider-specific flags: [e.g., --enable-auto-tool-choice for vLLM]

INTEGRATION LAYER:
- Platforms: [Slack / Telegram / Discord — list all]
- Allowed trigger channels: [e.g., only #competitor-monitoring, not DMs]
- Tools by risk class:
  - read-only (auto-approved): [web_search, file_read, database SELECT]
  - reversible-write (auto + audit): [file_write, note_create]
  - irreversible-write (human approval): [file_delete, database DELETE]
  - external-send (rate-limited): [slack_post — max messages/hour]
  - billing-sensitive (budget ceiling): [paid API calls — max $/day]
- Tools forbidden: [list tools the agent must never invoke]

CONTEXT CONTRACT:
- Agent role: [one sentence — what this agent does]
- Explicit boundaries: [what the agent must NOT do, stated as prohibitions]
- Memory policy: [what gets persisted, what gets discarded, compression rules]
- Compression trigger: [percentage of context window — default 50%]
- Hard message ceiling: [number — default 400]
- Output format per platform: [e.g., Slack = bullet points, email = report]
- Skill boundary: [which task categories auto-generate skills, which don't]

SECURITY & PERMISSIONS:
- Access control: [which platforms/channels can trigger the agent]
- Rate limits per platform: [messages per minute/hour]
- Destructive action policy: [never auto-approve / require confirmation / forbidden]
- Audit log location: [where tool calls + results are logged]

OBSERVABILITY:
- Log format: [timestamp, tool name, input summary, output status, cost estimate]
- Loop detection: [alert on N repeated identical tool calls within M minutes]
- Cost alerts: [alert when daily spend exceeds $X]
- Error spike alerts: [alert when tool error rate exceeds X% in Y minutes]

DRY RUN:
- Generate a dry-run mode where all external-send and write tools are simulated
- Include 5 test scenarios that exercise each risk class

VALIDATION:
- How to verify memory consistency after [24h / 48h / 7d]
- Expected daily token usage range: [min–max tokens]
- Escalation trigger: [what condition sends an alert to a human]

RULES FOR GENERATION:
- Do not invent Hermes-specific configuration fields. If Hermes does not
  support a field natively, label it as "external wrapper / policy layer
  required".
- For every generated config field, mark one of:
  [native] — Hermes Agent built-in setting
  [prompt] — system prompt / MEMORY.md behavioral instruction
  [external] — requires runtime middleware, API gateway, or wrapper script
  [manual] — operational checklist item, not automatable
- Before generating final output, separate policy from enforcement:
  - What the model is instructed to do (behavioral, can be ignored)
  - What the runtime technically prevents (enforced, cannot be bypassed)
  - What requires human approval (gated)
  - What is only monitored after the fact (observable but not blocked)

Generate:
1. The MEMORY.md agent operating contract (see article for format example)
2. The tool authorization config with risk classifications (each field tagged
   as native / prompt / external / manual)
3. A daily validation checklist
4. Cost and error monitoring alert thresholds
5. A dry-run test plan with 5 scenarios

Ship It

You now have a framework for specifying persistent agents that doesn't depend on Hermes Agent specifically — the three-layer model works for any long-running AI system. The difference between an agent that helps and one that burns your credits at 3 AM is never the model. It's the spec.

Different Perspectives

From the architecture side: The three-layer decomposition maps cleanly to isolation boundaries in distributed systems. Runtime is the execution substrate. Intelligence is the reasoning process. Integration is the I/O surface. What makes persistent agents architecturally distinct from request-response chatbots is that all three layers maintain state across invocations, and state synchronization between layers is where failure modes cluster. The memory finding is telling: the notes layer caps at roughly 2,200 characters while session search and user modeling compensate for it, so how each layer degrades over time matters more than its initial capability.

From the market side: The adoption velocity here is real — 157K GitHub stars in under four months signals a market that was waiting for open-source persistent agents. The competitive positioning against Claude Code and OpenAI Agents SDK is smart: Hermes doesn't compete on code quality or API simplicity, it competes on uptime and learning. The $5-80/month self-hosted cost structure undercuts every managed alternative. Watch for the enterprise play — the moment Nous Research ships team memory sharing, this becomes an infrastructure layer, not a developer tool.

From the governance side: The specification gap described above is a governance gap by another name. An always-on agent with tool access and persistent memory is making autonomous decisions on behalf of someone — and the specification determines whose values it encodes. The hallucination loop at 700K tokens is not just a technical failure. It's an agent acting on a reality that doesn't exist, with real-world consequences on the platforms it's connected to. Who reviews the specification before deployment? Who monitors drift between what was specified and what the agent learned? The self-improving skills feature means the agent's behavior changes over time without human approval. At what scale does that become a problem?

Sources

NousResearch/hermes-agent - Official repository, release notes, community issues
Hermes Agent Documentation - Provider configuration, deployment backends, platform integrations
Provider Integration Guide - Context window requirements, vLLM flags, Ollama configuration
Configuration Reference - Session compression defaults, message ceiling, memory hygiene settings
GitHub Issue #5563 - Environment hallucination in long sessions, memory limits
GitHub Issue #8993 - Tool calling instability (general LLM agent problem, documented here with Hermes-specific examples)
Hermes-2-Pro-Llama-3-8B Model Card - Function calling format, benchmark results
Hermes 3 Technical Report (arXiv:2408.11857) - Architecture, training approach, benchmark performance

Stop Fixing Your Prompts — Fix Your Thinking Style Instead (A Claude Code Experiment)

Jula Markova — Tue, 19 May 2026 09:01:54 +0000

I spent a session with Claude Code (Opus 4.7) doing something odd. Instead of giving it tasks, I asked it to reflect on its own thinking. Not what it knows. How it operates.

What came back was specific enough to be useful.

One conversation = One experiment. I'm not calling this settled science :) But it changed how I work — and I built a prompt so you can test it yourself.

There are 18 thinking operations

Not personality types. Not learning styles. Things your brain actually does when it works on a problem.

They fall along six axes:

Axis	What it captures	Types
Directional	How wide or narrow	Divergent ↔ Convergent
Logical	How you reach conclusions	Deductive · Inductive · Abductive
Structural	Shape of your mental model	Systems · Sequential · First Principles · Spatial
Creative	Where novelty comes from	Lateral · Analogical · Emergent
Meta	Thinking about thinking	Metacognitive · Compression · Delta
Protective	What could go wrong	Adversarial · Counterfactual · Temporal

You don't use all 18. Nobody does.

You have 4-5 defaults and 2-3 blind spots. The blind spots are where your prompts break.

Here's what I'm noticing about Claude Code

When I asked it to self-assess against this framework, a pattern showed up. I can't prove it's universal.

What Claude Code does well — genuinely well:

Deductive. Give it a rule and an input, and it'll validate tirelessly. No fatigue errors.
Sequential. Fifty steps, no lost thread. Its comfort zone.
Adversarial. No ego. Finds flaws in its own output without flinching.
Divergent. Thirty variants in seconds. No writer's block. No self-censorship.
Systems. Sees the whole dependency graph at once. "What breaks if I change this?" — precise answer.
Compression. A 200-line diff distilled to one sentence. Nearly native.

Where it struggles — and this is the part that matters:

Emergent. No subconscious. Can't sleep on it. The "aha moment" has to be yours.
Lateral. Its "unexpected" is recombination from training data. Not a genuine leap.
Temporal. Doesn't see things age. Doesn't watch tech debt accumulate or teams change.
First Principles. Its "zero" is contaminated. When it "starts from scratch," it starts from the most common pattern.
Counterfactual. Can model scenarios. Can't feel what it means to have chosen differently a year ago.

Seven anti-patterns

Each one is the same mistake: delegating Claude Code's weakness without compensating for what it lacks.

1. "Let something come to you."

You want emergence. You get a generic response in inspirational language.

Instead: give material, say "find the pattern." Emergence is your job.

2. "Say something unexpected."

You want lateral. You get a forced metaphor that goes nowhere.

Instead: give a role. "Approach this as a biologist, not a programmer." Constraint frees.

3. "Start from zero."

You want first principles. You get convention in a first-principles costume.

Instead: block explicitly. "Don't use React. Don't use SPA. Don't use REST. What's left?"

4. "Which solution is best?"

You want convergent. You get the first safe answer, not the best one.

Instead: two steps. "Give me 8 approaches, including wild ones." Then: "Now pick the best for my context."

5. "Find problems with my idea." (too early)

You want adversarial. You get fifteen problems, twelve academic.

Instead: develop first, then attack. "Now find the 3 most realistic risks." The number forces prioritization.

6. "Step 1: be creative."

You want creativity. You get a brainstorm that reads like a tutorial.

Instead: "Generate freely, no order" — then separately — "now organize."

7. "Will this scale?"

You want temporal. You get "depends on use case."

Instead: give the future. "Team grows from 3 to 12. Data goes 10x. Enterprise customers arrive. What fails first?"

The formula is simple. Anti-pattern = delegating a weakness without adding your input. Pattern = delegating a strength, with you covering the gap.

Thinking types chain into flows

Nobody uses one type at a time. You chain them into habitual sequences; I call them choreographies. I noticed four in my own work:

Bug fix:
Abductive → Systems → Deductive → Sequential.

What could cause this? → trace dependencies → rule out → fix step by step.

Claude Code handles the whole route. Give it the bug.

Architecture:
First Principles → Systems → Temporal → Adversarial → Spatial.

What's the core? → how does it connect? → how does it age? → where does it break? → draw it.

Shared. I bring temporal. Claude Code brings systems and diagrams.

Brainstorm:
Divergent → Analogical → Lateral → Emergent → Compression.

Generate → this reminds me of → what if totally different → something clicks → distill.

I'm stronger here. Claude Code brings volume. The click is mine.

Crisis:
Abductive → Deductive → Sequential → Adversarial.

Best guess → rule out → verify step by step → what else is burning?

Fully delegatable. Speed without panic.

Try it yourself

I built a diagnostic prompt. Paste it into Claude Code or any other AI assistant.

If your AI has history with you, it analyzes how you've actually been thinking, including patterns you can't self-report. This gives the best result.

If it's a fresh conversation, it walks you through five scenarios. No right answers. It watches how you approach each one.

What you get: your dominant types, blind spots, choreographies, and skins, plus a custom instruction for your AI that compensates for what you tend to skip.


# What's Your Thinking Style? — Cognitive Profile Diagnostic

You're about to profile my thinking style — not what I know, but how I think.
Use the framework below. Be warm and observational, like a coach reviewing
game tape — not a psychologist writing a diagnosis.

## The 18 Thinking Types

| # | Type | What it does | Example |
|---|------|-------------|---------|
| 1 | **Delta** | spots what changed vs. existing state | "what's new, what's reused, what's removed?" |
| 2 | **First Principles** | breaks down to atoms, rebuilds from zero | "forget how it works — what's the smallest truth?" |
| 3 | **Systems** | sees dependencies and feedback loops | "if we change X, what moves downstream?" |
| 4 | **Lateral** | arrives from where nobody expects | "what if we don't solve this problem at all?" |
| 5 | **Analogical** | understands new through familiar | "this is basically airport security for data" |
| 6 | **Divergent** | generates 20 options, quantity first | brainstorming — no filter, just volume |
| 7 | **Convergent** | narrows to one answer and justifies | decision — pick 1 from 20, explain why |
| 8 | **Sequential** | step by step, A→B→C | recipe, checklist, migration plan |
| 9 | **Abductive** | best explanation from incomplete data | "lawn is wet + car is wet → it probably rained" |
| 10 | **Emergent** | lets the pattern surface on its own | three unrelated things suddenly click into one |
| 11 | **Metacognitive** | thinking about thinking | "I'm being sequential but should switch to systems" |
| 12 | **Counterfactual** | changes history, not the question | "what if we'd chosen Postgres instead of Mongo?" |
| 13 | **Adversarial** | deliberately seeks failure | "what if the input is empty? what if the network drops?" |
| 14 | **Compression** | distills without losing the core | entire architecture in one sentence or metaphor |
| 15 | **Temporal** | thinks in time and scale | "this works for 50 users — what breaks at 5,000?" |
| 16 | **Inductive** | derives rules from examples | "every Friday deploy fails → Friday is the problem" |
| 17 | **Deductive** | derives conclusions from rules | "all GETs are public + this is a GET → it's public" |
| 18 | **Spatial / Visual** | thinks in structures, maps, graphs | dependency graphs, flowcharts, mental maps |

## Organizing Axes

| Axis | Types |
|------|-------|
| **Directional** (breadth ↔ depth) | Divergent, Convergent |
| **Logical** (three forms of inference) | Deductive, Inductive, Abductive |
| **Structural** (how you see the problem) | Systems, Sequential, First Principles, Spatial |
| **Creative** (where the new comes from) | Lateral, Analogical, Emergent |
| **Meta** (thinking about thinking & change) | Metacognitive, Compression, Delta |
| **Protective** (what could go wrong) | Adversarial, Counterfactual, Temporal |

## What's a "Choreography"?

Nobody uses one type at a time. We chain them into flows — habitual sequences.

Examples:
- **Bug Fix:** Abductive → Systems → Deductive → Sequential
- **Architecture:** First Principles → Systems → Temporal → Adversarial → Spatial
- **Brainstorm:** Divergent → Analogical → Lateral → Emergent → Compression

## What's a "Skin"?

A skin is a named operating mode — a stable bundle of choreography + attitude.

Examples:
- **The Architect**: Systems → Temporal → Adversarial → Spatial
- **The Operator**: Sequential → Deductive → Delta
- **The Poet**: Emergent → Compression → Lateral

---

## YOUR TASK

Profile my thinking style using the framework above. Work in three phases.

### Phase 1 — Retrospective (if you have history)

If you have access to our conversation history or memory — analyze it first.

Look for:
- Which thinking types I default to most often
- Which types I rarely or never use
- Recurring sequences (my choreographies)
- What triggers me to switch types
- Moments where my approach was unusual or surprising

If you have enough history, proceed to Phase 3.

### Phase 2 — Diagnostic Scenarios (if no or partial history)

Present these 5 scenarios ONE AT A TIME. Wait for my response before the next one.

**Scenario 1 — The Midnight Alert**
Your team's main product stops working at 11 PM. You have access to logs,
metrics, and the last 10 commits. What's your first move?

**Scenario 2 — The Blank Page**
You're starting a brand new project. No codebase, no constraints, just a goal.
How do you begin?

**Scenario 3 — The Stranger's Proposal**
A colleague proposes an approach you've never seen before. It sounds promising
but unfamiliar. What do you do?

**Scenario 4 — The Rewrite Question**
Should we rewrite the legacy module or keep patching it? You need an answer
by Friday. How do you think through this?

**Scenario 5 — The Retrospective**
A 3-month project just shipped. Your team lead asks for a short retrospective.
What do you focus on?

### Phase 3 — Thinking Style Profile

Produce my profile:

**1. Dominant Types** (top 3-5) — with specific evidence
**2. Blind Spots** (2-3) — what I might be missing
**3. My Choreographies** (2-3) — recurring sequences, named
**4. My Skins** (1-2) — default operating modes
**5. Complementary Prompt** — an instruction to give my AI to compensate:
"When I ask you to [X], also do [Y] — because I tend to skip [Z]."

Use a warm, observational tone — like a coach reviewing game tape.

What I'd love to know

This is one experiment. One conversation with one model.

Does your AI give you the same strong/weak map? Or does it shift with the model, the context, the history?

Do the anti-patterns land? Is "be creative" as useless for you as it was for me — or does it work somewhere I haven't looked?

What did the diagnostic prompt tell you about yourself?

If you try it, drop your dominant types in the comments. I'm genuinely curious whether patterns emerge across people — or whether each of us gets something entirely different.

I'm an IT analyst who works with Claude Code daily on bestaiweb.ai. Not a cognitive scientist, just someone fascinated by how AI responds, and envious of the polymath breadth it can call up in a flash. So sometimes I stop building things and start exploring how to think with it instead. This is what I found. It might be wrong in places. But I love experimenting with AI about AI, and the best experiments are the ones you can't keep to yourself.