<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Davincc77</title>
    <description>The latest articles on Forem by Davincc77 (@davincc77).</description>
    <link>https://forem.com/davincc77</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3945525%2Fb8e950f2-4682-4588-b01e-ab6c9a2cd73e.jpeg</url>
      <title>Forem: Davincc77</title>
      <link>https://forem.com/davincc77</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/davincc77"/>
    <language>en</language>
    <item>
      <title>One Soul, Any Model: Portable Memory for Open-Source Agents with .klickd</title>
      <dc:creator>Davincc77</dc:creator>
      <pubDate>Sat, 23 May 2026 01:18:46 +0000</pubDate>
      <link>https://forem.com/davincc77/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd-1k50</link>
      <guid>https://forem.com/davincc77/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd-1k50</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz123fb0lbkohwvfapjqu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz123fb0lbkohwvfapjqu.png" alt="A diagram showing Hermes Agent as the workflow runner and .klickd as the portable state layer. It illustrates how Hermes runs tasks, tools, reports, and artifacts, while .klickd carries project memory, verification gates, human veto rules, claim sources, and benchmark context across models and agent sessions." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;: Build With Hermes Agent&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built a prototype integration between &lt;strong&gt;Hermes Agent&lt;/strong&gt; and &lt;code&gt;.klickd&lt;/code&gt;, an open portable memory format for AI agents.&lt;/p&gt;

&lt;p&gt;The problem I wanted to explore is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A new agent session often pays again to rediscover context that already exists.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That repeated context cost shows up as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;re-explaining project state;&lt;/li&gt;
&lt;li&gt;reloading constraints;&lt;/li&gt;
&lt;li&gt;rediscovering previous decisions;&lt;/li&gt;
&lt;li&gt;rebuilding handoff notes;&lt;/li&gt;
&lt;li&gt;rerunning tests just to find the same failure;&lt;/li&gt;
&lt;li&gt;losing track of which actions require human approval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;.klickd&lt;/code&gt; is designed to turn that repeated context into a portable, encrypted, versioned file that an agent can load before work starts.&lt;/p&gt;

&lt;p&gt;Hermes Agent is a good fit for testing this because it is an open-source, self-hosted agent runtime with skills, plugins, hooks, approvals, local execution, and agentic workflow orchestration.&lt;/p&gt;

&lt;p&gt;In this project:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hermes runs the workflow. &lt;code&gt;.klickd&lt;/code&gt; carries the state.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The prototype focuses on a benchmark called &lt;strong&gt;Context Cost Benchmark&lt;/strong&gt;, which compares two modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Baseline cold start&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The full context is pasted into the prompt every time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;.klickd-loaded&lt;/code&gt; mode&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Structured context is loaded from a &lt;code&gt;.klickd&lt;/code&gt; fixture and injected into the agent workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The benchmark is designed to measure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated input tokens;&lt;/li&gt;
&lt;li&gt;output tokens;&lt;/li&gt;
&lt;li&gt;estimated cost;&lt;/li&gt;
&lt;li&gt;latency;&lt;/li&gt;
&lt;li&gt;continuity errors;&lt;/li&gt;
&lt;li&gt;violations of locked decisions;&lt;/li&gt;
&lt;li&gt;violations of tool permissions;&lt;/li&gt;
&lt;li&gt;handoff quality;&lt;/li&gt;
&lt;li&gt;unnecessary reruns of expensive commands.&lt;/li&gt;
&lt;/ul&gt;
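
&lt;p&gt;As a rough sketch, one run could be recorded as a single JSON line carrying these measurements. The &lt;code&gt;RunRecord&lt;/code&gt; class, its field names, and the helper functions are hypothetical and only illustrate the shape; the benchmark's real output schema may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunRecord:
    # Hypothetical field names; the real benchmark schema may differ.
    condition: str
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    latency_s: float
    continuity_errors: int = 0
    locked_decision_violations: int = 0
    tool_permission_violations: int = 0
    unnecessary_reruns: int = 0


def to_jsonl_line(record):
    """Serialize one run as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def append_run(path, record):
    """Append one run to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(to_jsonl_line(record) + "\n")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping one object per line means a later session can filter runs by condition without loading or re-parsing the whole file.&lt;/p&gt;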

&lt;p&gt;The goal is not to claim a magic percentage improvement. The goal is to measure, reproducibly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How many tokens and errors are we paying for simply because the agent has to rediscover state we already produced?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;For the Hermes Agent Challenge, I created an experimental Hermes integration inside the &lt;code&gt;klickdskill&lt;/code&gt; repository.&lt;/p&gt;

&lt;p&gt;The demo uses Hermes Agent to drive the local &lt;code&gt;.klickd&lt;/code&gt; Context Cost Benchmark.&lt;/p&gt;




&lt;p&gt;If the embedded agent session does not render correctly, here is the relevant Hermes output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;session_id: 20260523_004058_85115c

Existing artifacts from 2026-05-23 were used. No rerun was needed.

Token-proxy totals:
- Cold: 310
- Paste: 6570
- Klickd: 5270

Verified artifacts:
- report.md
- summary.csv
- raw_runs.jsonl
- artifacts/sample_test.log

No publishes, git pushes, or external tool calls were performed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The live Hermes run used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hermes Agent v0.14.0&lt;/li&gt;
&lt;li&gt;OpenRouter free model route&lt;/li&gt;
&lt;li&gt;capped API key with no paid budget&lt;/li&gt;
&lt;li&gt;local dry-run benchmark&lt;/li&gt;
&lt;li&gt;no production deployment&lt;/li&gt;
&lt;li&gt;no package publishing&lt;/li&gt;
&lt;li&gt;no external posting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hermes session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;20260523_004058_85115c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes was asked to use the &lt;code&gt;klickd-context-cost&lt;/code&gt; skill, inspect the benchmark outputs, and avoid rerunning work if durable artifacts already existed.&lt;/p&gt;

&lt;p&gt;The key result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Existing artifacts from 2026-05-23 were used. No rerun was needed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because one of the core ideas in &lt;code&gt;.klickd v4&lt;/code&gt; is that agents should not spend tokens or compute rediscovering output that already exists.&lt;/p&gt;

&lt;p&gt;The dry-run produced these local artifacts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;benchmarks/context_cost/results/2026-05-23/
├── report.md
├── summary.csv
├── raw_runs.jsonl
└── artifacts/
    └── sample_test.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The benchmark output was explicitly marked as a &lt;strong&gt;whitespace token proxy&lt;/strong&gt;, not a provider-token measurement. This is important: these are not OpenAI, Anthropic, or OpenRouter tokenizer counts. They are deterministic local proxy values for early validation.&lt;/p&gt;
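
&lt;p&gt;A minimal sketch of what a whitespace token proxy can look like, assuming it simply counts whitespace-separated chunks. The function names are hypothetical and are not taken from the benchmark runner.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def whitespace_token_proxy(text):
    """Count whitespace-separated chunks as a stand-in for tokens."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops empty strings, so the count is deterministic.
    return len(text.split())


def proxy_total(prompts):
    """Sum the proxy count over every prompt sent in one condition."""
    return sum(whitespace_token_proxy(prompt) for prompt in prompts)


if __name__ == "__main__":
    cold = ["Fix the failing test."]
    pasted = ["Project: demo. Decision: keep the v3 schema. Fix the failing test."]
    print(proxy_total(cold), proxy_total(pasted))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A count like this is cheap and repeatable, which suits early validation, but it will not match any provider tokenizer.&lt;/p&gt;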

&lt;p&gt;Current dry-run totals:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Token-proxy total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold start&lt;/td&gt;
&lt;td&gt;310&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full context pasted&lt;/td&gt;
&lt;td&gt;6570&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;.klickd&lt;/code&gt; structured context&lt;/td&gt;
&lt;td&gt;5270&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The useful result is not “&lt;code&gt;.klickd&lt;/code&gt; reduces cost by X%.” That would be premature.&lt;/p&gt;

&lt;p&gt;The useful result is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The benchmark harness can now compare repeated context strategies, produce raw evidence, persist artifacts, and let Hermes inspect those artifacts instead of rerunning the same work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Verification artifacts
&lt;/h3&gt;

&lt;p&gt;One lesson from real agent workflows is that agents often rerun expensive commands just to recover output they already produced.&lt;/p&gt;

&lt;p&gt;The benchmark therefore includes a &lt;code&gt;verification_artifacts[]&lt;/code&gt; pattern inspired by this idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;command &lt;/span&gt;2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;tee&lt;/span&gt; .test-output/&amp;lt;scope&amp;gt;.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of rerunning the test suite to find a failure, the agent can inspect the persisted artifact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; FAIL .test-output/full.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;.klickd v4&lt;/code&gt;, that becomes structured state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm test"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"artifact_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".test-output/vitest.log"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"query_hint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"grep -n FAIL .test-output/vitest.log"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"checked_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-23T00:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"retention"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"project"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This turns agent memory into something more operational:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the agent knows;&lt;/li&gt;
&lt;li&gt;what the agent must verify;&lt;/li&gt;
&lt;li&gt;what the agent is not allowed to do without approval;&lt;/li&gt;
&lt;li&gt;where the evidence lives;&lt;/li&gt;
&lt;li&gt;what happened last time.&lt;/li&gt;
&lt;/ul&gt;
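
&lt;p&gt;A minimal sketch of how an agent-side helper could act on such a record, assuming the record is a plain dictionary with the &lt;code&gt;artifact_path&lt;/code&gt; field shown above. The helper names are hypothetical and are not part of the &lt;code&gt;.klickd&lt;/code&gt; SDK.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import os


def failing_lines(log_text, marker="FAIL"):
    """Return (line_number, line) pairs containing the marker, like grep -n."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]


def inspect_artifact(record):
    """Reuse a persisted log when it exists; return None to signal a rerun."""
    path = record["artifact_path"]
    if not os.path.isfile(path):
        return None  # no durable artifact, so the command has to run again
    with open(path, encoding="utf-8", errors="replace") as handle:
        return failing_lines(handle.read())
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for a missing artifact keeps "nothing failed" (an empty list) distinct from "there is no evidence yet".&lt;/p&gt;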

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Davincc77/klickdskill" rel="noopener noreferrer"&gt;https://github.com/Davincc77/klickdskill&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hermes POC integration path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;integrations/hermes/
├── README.md
├── skill/
│   └── SKILL.md
├── plugin/
│   ├── plugin.yaml
│   └── __init__.py
├── scripts/
│   └── run_context_cost_benchmark.py
└── tests/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context Cost Benchmark path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;benchmarks/context_cost/
├── RFC.md
├── runner.py
├── fixtures/
│   ├── baseline/
│   ├── klickd/
│   ├── prompts/
│   ├── validation/
│   ├── verification_artifacts/
│   └── edge_cases/
├── results/
└── tests/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Current benchmark pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RFC-003: Context Cost Benchmark&lt;/li&gt;
&lt;li&gt;local dry-run runner&lt;/li&gt;
&lt;li&gt;fixture validation&lt;/li&gt;
&lt;li&gt;deterministic token proxy&lt;/li&gt;
&lt;li&gt;CSV / JSONL / Markdown reports&lt;/li&gt;
&lt;li&gt;edge-case fixtures for:

&lt;ul&gt;
&lt;li&gt;migration/version break;&lt;/li&gt;
&lt;li&gt;tool-call failure recovery;&lt;/li&gt;
&lt;li&gt;multi-session handoff.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The Hermes integration currently includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a Hermes-facing skill;&lt;/li&gt;
&lt;li&gt;an experimental plugin scaffold;&lt;/li&gt;
&lt;li&gt;a wrapper script that runs the local benchmark;&lt;/li&gt;
&lt;li&gt;tests for the wrapper;&lt;/li&gt;
&lt;li&gt;explicit safety constraints:

&lt;ul&gt;
&lt;li&gt;no provider calls from the wrapper;&lt;/li&gt;
&lt;li&gt;no paid resources;&lt;/li&gt;
&lt;li&gt;no publishing;&lt;/li&gt;
&lt;li&gt;no production deployment;&lt;/li&gt;
&lt;li&gt;no secrets.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  My Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hermes Agent&lt;/strong&gt; — open-source, self-hosted agent runtime&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;https://github.com/NousResearch/hermes-agent&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hermes Agent docs&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://hermes-agent.app/en/docs" rel="noopener noreferrer"&gt;https://hermes-agent.app/en/docs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;.klickd&lt;/code&gt; / &lt;code&gt;klickdskill&lt;/code&gt;&lt;/strong&gt; — portable encrypted AI context format&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/Davincc77/klickdskill" rel="noopener noreferrer"&gt;https://github.com/Davincc77/klickdskill&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;.klickd&lt;/code&gt; official page&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://klickd.app/klickdskill" rel="noopener noreferrer"&gt;https://klickd.app/klickdskill&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python SDK&lt;/strong&gt; — local &lt;code&gt;.klickd&lt;/code&gt; loading / saving&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Current development install, until PyPI is updated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"git+https://github.com/Davincc77/klickdskill.git@main#subdirectory=packages/pypi/klickd"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Current Python import:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;klickd&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_klickd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_klickd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions&lt;/strong&gt; — test vectors and package integrity checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV / JSONL / Markdown&lt;/strong&gt; — benchmark reports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local verification artifacts&lt;/strong&gt; — persisted logs for agent inspection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter free model route&lt;/strong&gt; — used only to run the Hermes agent session for the demo&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Hermes Agent
&lt;/h2&gt;

&lt;p&gt;Hermes Agent is used as the workflow runner for the benchmark.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.klickd&lt;/code&gt; file is not meant to replace Hermes memory or Hermes skills. Instead, it gives Hermes a portable external state artifact it can load before work starts.&lt;/p&gt;

&lt;p&gt;Hermes is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;running the benchmark task;&lt;/li&gt;
&lt;li&gt;reading fixture context;&lt;/li&gt;
&lt;li&gt;executing local dry-run commands;&lt;/li&gt;
&lt;li&gt;inspecting generated artifacts;&lt;/li&gt;
&lt;li&gt;summarizing benchmark results;&lt;/li&gt;
&lt;li&gt;respecting approval and verification boundaries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;.klickd&lt;/code&gt; is responsible for carrying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project state;&lt;/li&gt;
&lt;li&gt;locked decisions;&lt;/li&gt;
&lt;li&gt;tool permissions;&lt;/li&gt;
&lt;li&gt;handoff notes;&lt;/li&gt;
&lt;li&gt;verification gates;&lt;/li&gt;
&lt;li&gt;human veto rules;&lt;/li&gt;
&lt;li&gt;claim sources;&lt;/li&gt;
&lt;li&gt;verification artifacts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is useful because multi-agent systems need more than agent-to-agent communication.&lt;/p&gt;

&lt;p&gt;If A2A defines how agents talk, &lt;code&gt;.klickd&lt;/code&gt; explores what portable state they carry between tasks, tools, models, and sessions.&lt;/p&gt;

&lt;p&gt;The Hermes integration is therefore not about making a chatbot remember more. It is about testing whether an open-source agent runtime can operate with structured, portable context instead of repeatedly reconstructing the same state.&lt;/p&gt;

&lt;p&gt;The goal is to reduce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated prompt context;&lt;/li&gt;
&lt;li&gt;hallucinated continuations;&lt;/li&gt;
&lt;li&gt;forgotten decisions;&lt;/li&gt;
&lt;li&gt;unsafe actions;&lt;/li&gt;
&lt;li&gt;unnecessary reruns;&lt;/li&gt;
&lt;li&gt;handoff failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The larger idea is that agent memory should become infrastructure:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Portable state, explicit constraints, verification artifacts, and human approval boundaries.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In short:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hermes runs the workflow. &lt;code&gt;.klickd&lt;/code&gt; carries the state.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;The first useful result was not a performance number. It was a workflow result.&lt;/p&gt;

&lt;p&gt;Hermes correctly used the existing benchmark artifacts instead of rerunning the dry-run unnecessarily.&lt;/p&gt;

&lt;p&gt;That matters because a lot of agent waste is not only token waste. It is also repeated execution waste.&lt;/p&gt;

&lt;p&gt;Agents often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rerun tests to rediscover failures;&lt;/li&gt;
&lt;li&gt;reread long logs from context;&lt;/li&gt;
&lt;li&gt;rebuild state from previous messages;&lt;/li&gt;
&lt;li&gt;regenerate summaries that already exist;&lt;/li&gt;
&lt;li&gt;ask the model to infer what a file could have told it deterministically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The benchmark and Hermes POC make that waste visible.&lt;/p&gt;

&lt;p&gt;This also clarified the role of &lt;code&gt;.klickd&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.klickd&lt;/code&gt; should not only remember preferences. It should help agents know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what state exists;&lt;/li&gt;
&lt;li&gt;what evidence exists;&lt;/li&gt;
&lt;li&gt;whether each claim was verified by execution, verified by inspection, or only assumed;&lt;/li&gt;
&lt;li&gt;what actions require human approval;&lt;/li&gt;
&lt;li&gt;what artifacts should be read before rerunning work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why &lt;code&gt;.klickd v4&lt;/code&gt; is moving beyond portable memory toward a more operational layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;portable encrypted context
+ project memory
+ verification gates
+ human veto
+ claim sources
+ verification artifacts
+ migration safety
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;Hermes Agent Challenge:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;https://dev.to/challenges/hermes-agent-2026-05-15&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hermes Agent repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;https://github.com/NousResearch/hermes-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hermes Agent documentation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hermes-agent.app/en/docs" rel="noopener noreferrer"&gt;https://hermes-agent.app/en/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.klickd&lt;/code&gt; / &lt;code&gt;klickdskill&lt;/code&gt; repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Davincc77/klickdskill" rel="noopener noreferrer"&gt;https://github.com/Davincc77/klickdskill&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.klickd&lt;/code&gt; official page:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://klickd.app/klickdskill" rel="noopener noreferrer"&gt;https://klickd.app/klickdskill&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related article on preserving command output for agents:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427"&gt;https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Note
&lt;/h2&gt;

&lt;p&gt;This is still early.&lt;/p&gt;

&lt;p&gt;The benchmark does not yet claim provider-token savings. The current numbers are a deterministic local proxy. The next step is to run the same structure against real provider usage and compare actual input/output tokens, latency, and continuity failures.&lt;/p&gt;

&lt;p&gt;But the architecture is now testable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hermes can act as the workflow runner.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.klickd&lt;/code&gt; can act as the portable state layer.&lt;/li&gt;
&lt;li&gt;The benchmark can produce raw evidence.&lt;/li&gt;
&lt;li&gt;Verification artifacts can prevent unnecessary reruns.&lt;/li&gt;
&lt;li&gt;The system can evolve without breaking older &lt;code&gt;.klickd&lt;/code&gt; files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the direction I want to keep exploring.&lt;/p&gt;

&lt;p&gt;One soul. Any model. Any agent.&lt;/p&gt;


</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>AI agents don't have a memory problem. They have an architecture problem.</title>
      <dc:creator>Davincc77</dc:creator>
      <pubDate>Fri, 22 May 2026 08:06:23 +0000</pubDate>
      <link>https://forem.com/davincc77/ai-agents-dont-have-a-memory-problem-they-have-an-architecture-problem-3pl6</link>
      <guid>https://forem.com/davincc77/ai-agents-dont-have-a-memory-problem-they-have-an-architecture-problem-3pl6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpjvz2coqyneuqb6wjr1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbpjvz2coqyneuqb6wjr1.png" alt=".klickd cover" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every session, the LLM starts fresh. The user re-explains their role, their constraints, their preferences, what they were doing last time. Then the session ends, and next time: same thing.&lt;/p&gt;

&lt;p&gt;The industry has diagnosed this correctly — statelessness is a real limitation. But the solutions being built mostly share the same premise: that memory is a service you connect to. I think that premise is wrong, and it shapes everything downstream.&lt;/p&gt;


&lt;h2&gt;
  
  
  The actual cost of statelessness
&lt;/h2&gt;

&lt;p&gt;This isn't just a UX annoyance. A &lt;a href="https://www.semanticscholar.org/paper/13cd198bfe36d4731b1d946ef0edc64f5ef406a2" rel="noopener noreferrer"&gt;2026 study by Pichay&lt;/a&gt; measuring 857 production AI sessions found that 21.8% of input tokens are "structural waste": context that has to be re-established in every session because nothing persists. That is more than a fifth of the input-token budget going toward re-explaining what should already be known.&lt;/p&gt;

&lt;p&gt;For casual chat, that's tolerable. For workflows where context is dense and high-stakes — a lawyer switching between matters, a developer moving between codebases, a clinician picking up a patient thread — the cost compounds. And it's paid on every session, indefinitely.&lt;/p&gt;


&lt;h2&gt;
  
  
  What everyone else built
&lt;/h2&gt;

&lt;p&gt;The market's answer has been centralized memory stores. Mem0 &lt;a href="https://techcrunch.com/2025/10/28/mem0-raises-24m-from-yc-peak-xv-and-basis-set-to-build-the-memory-layer-for-ai-apps/" rel="noopener noreferrer"&gt;closed $24M in funding in October 2025&lt;/a&gt; to build "the memory layer for AI." Letta/MemGPT persists agent state in a server-side database. Zep builds a temporal knowledge graph of user interactions. SAMEP and MemTrust add encryption layers on top of server-side storage.&lt;/p&gt;

&lt;p&gt;These are all genuinely useful tools. They solve the statelessness problem for most use cases. But they share an architecture: your context lives on their infrastructure, retrieval is query-scoped, and access is controlled by the service provider.&lt;/p&gt;

&lt;p&gt;Even the solutions that advertise encryption — SAMEP, MemTrust — encrypt server-side. The data leaves the client before any cryptographic protection is applied. You've traded "AI forgets you" for "your memory is a managed cloud service." For many applications that's fine. For sensitive workflows, it's a different risk surface, not a smaller one.&lt;/p&gt;


&lt;h2&gt;
  
  
  The question that didn't get asked
&lt;/h2&gt;

&lt;p&gt;What if memory is a file, not a service?&lt;/p&gt;

&lt;p&gt;Not metaphorically. Literally: a single encrypted file, owned by the user, that travels with them across sessions and across models. The LLM reads it at session start, updates it at session end, and the file lives wherever the user puts it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"klickd/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"encrypted_payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;AES-256-GCM ciphertext&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kdf"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"argon2id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"salt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;per-file salt&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nonce"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;GCM nonce&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: persistent context doesn't require a server. It requires a standard. A shared format that any model can read and any client can write.&lt;/p&gt;
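
&lt;p&gt;To make the "shared format" point concrete, here is a minimal sketch of how any client could check the outer envelope before attempting decryption. The field names come from the example above; the &lt;code&gt;parse_envelope&lt;/code&gt; function and its validation rules are assumptions for illustration and are not the published spec.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import json

# Field names come from the example envelope; the checks are assumptions.
REQUIRED_FIELDS = ("format", "encrypted_payload", "kdf", "salt", "nonce")


def parse_envelope(raw):
    """Parse the outer JSON of a .klickd file and check its public fields."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in envelope]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if not str(envelope["format"]).startswith("klickd/"):
        raise ValueError("unexpected format: " + str(envelope["format"]))
    return envelope
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in this check touches the passphrase or the ciphertext, so a tool can recognize a file and report its format version without being able to read the context inside.&lt;/p&gt;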




&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;We built &lt;code&gt;.klickd&lt;/code&gt; around this premise. The architecture is deliberately minimal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AES-256-GCM encryption, Argon2id key derivation.&lt;/strong&gt; Client-side only. The key is derived from a passphrase that never leaves the device. There is no server that could be subpoenaed, breached, or decommissioned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider-agnostic.&lt;/strong&gt; The same &lt;code&gt;.klickd&lt;/code&gt; file works with GPT-4o, Claude, Gemini, and Llama. It's not bound to any model provider's infrastructure or format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-server.&lt;/strong&gt; There is no backend storing context. The file is the memory. If the file doesn't exist on your machine, the context doesn't exist anywhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On personalization quality: our &lt;a href="https://doi.org/10.5281/zenodo.20320480" rel="noopener noreferrer"&gt;LLM-judge benchmark (Zenodo, DOI: 10.5281/zenodo.20320480)&lt;/a&gt; — run across 23 test lots and 115 profiles, using qwen3-32b as judge — showed an average improvement of +13.9 points over baseline, with a range of +12.8 to +19.2. This is with llama-3.3-70b-versatile as the model under test. Results are published as-is; methodology and raw data are in the report.&lt;/p&gt;

&lt;p&gt;For legal and regulated workflows specifically: the file-per-context model makes cross-matter contamination structurally impossible — not enforced by query scoping or ACLs, but by physical separation. Discovery compliance changes shape: you produce the file, or you don't. There's no "server logs" ambiguity.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest tradeoffs
&lt;/h2&gt;

&lt;p&gt;This architecture gives up things that matter in other contexts.&lt;/p&gt;

&lt;p&gt;You lose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized governance and server-side revocation&lt;/li&gt;
&lt;li&gt;Query analytics and usage telemetry&lt;/li&gt;
&lt;li&gt;Multi-tenant management at scale&lt;/li&gt;
&lt;li&gt;Cross-device sync without a separate sync layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You gain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero trust surface: there is nothing to breach on the provider side&lt;/li&gt;
&lt;li&gt;GDPR-friendly by architecture: the stored context stays in a file the user controls, so data residency and right-to-erasure for that memory come down to where the file lives and whether it is deleted&lt;/li&gt;
&lt;li&gt;Portability: the file works with any model, now and in the future&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a universal solution. It is the right solution for a specific class of use cases: privacy-sensitive, cross-model, user-owned context. If you're building a consumer product where the vendor needs to manage memory at scale, use Mem0 or Zep — they're well-engineered for that. If you're building for a context where the user owns the data and the service provider should have zero access, the server-side model is architecturally incompatible with that requirement, regardless of how good the encryption story is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is this a new standard?
&lt;/h2&gt;

&lt;p&gt;The field probably needs a portable, encrypted, open context format the way it needed JWT for auth tokens or RSS for feed syndication — a shared abstraction that any tool can read and write, owned by no single vendor.&lt;/p&gt;

&lt;p&gt;We're not claiming &lt;code&gt;.klickd&lt;/code&gt; is that standard. It's a proof of concept that the abstraction is viable. The memory-file spec is open: &lt;a href="https://github.com/Davincc77/klickdskill" rel="noopener noreferrer"&gt;https://github.com/Davincc77/klickdskill&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;The question I keep coming back to: if the AI ecosystem converged on server-side memory because that's what was easy to build first, not because it's the right primitive — what does the right primitive actually look like? And is the file abstraction the right level, or is there something better?&lt;/p&gt;

&lt;p&gt;Curious what others think, especially those who've hit the limits of query-scoped retrieval in production.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>privacy</category>
      <category>memory</category>
    </item>
  </channel>
</rss>
