<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: John lingi</title>
    <description>The latest articles on Forem by John lingi (@john_lingi_f754bc63dd9ff1).</description>
    <link>https://forem.com/john_lingi_f754bc63dd9ff1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3844882%2F46094356-5668-40f6-bad6-5715d1cdabb4.jpg</url>
      <title>Forem: John lingi</title>
      <link>https://forem.com/john_lingi_f754bc63dd9ff1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/john_lingi_f754bc63dd9ff1"/>
    <language>en</language>
    <item>
      <title>How Compaction Works in Hermes Agent</title>
      <dc:creator>John lingi</dc:creator>
      <pubDate>Thu, 26 Mar 2026 14:44:50 +0000</pubDate>
      <link>https://forem.com/john_lingi_f754bc63dd9ff1/how-compaction-works-in-hermes-agent-2m0m</link>
      <guid>https://forem.com/john_lingi_f754bc63dd9ff1/how-compaction-works-in-hermes-agent-2m0m</guid>
      <description>&lt;p&gt;Hermes Agent Commit: &lt;strong&gt;f83c27e&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nousresearch.com/" rel="noopener noreferrer"&gt;Nous Research&lt;/a&gt; recently released &lt;a href="https://github.com/NousResearch/hermes-agent/tree/main" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; — an open-source personal agent similar to OpenClaw. The aspect I was most curious about was context management, and compaction in particular, given that effective context management is arguably the most critical requirement for maximising agent performance over long-running sessions. In this post, I document Hermes’ approach to compaction: the how, the where and the when.&lt;/p&gt;

&lt;h2&gt;The How&lt;/h2&gt;

&lt;p&gt;Compaction compresses the agent’s current context into a smaller number of tokens. This is usually done out of necessity, to ensure the input fits into the LLM’s context window, or for quality, as &lt;a href="https://www.notion.so/Hermes-Agent-32dd52d026fc804d97cddedcb51f0407?pvs=21" rel="noopener noreferrer"&gt;performance has been shown to decrease&lt;/a&gt; with longer contexts. In theory, there are many ways to shrink the context window, from naive strategies like deleting everything or retaining only the last few messages to more sophisticated pruning. Getting this right is important for ensuring the agent can continue the task effectively without a performance drop or needing to be reminded of the entire context. This is precisely why I was curious about how Hermes implements compaction. Thankfully, Hermes is neatly documented and &lt;a href="https://github.com/NousResearch/hermes-agent/blob/e4033b2baf681946bc36b3c02546866a28c7aae9/agent/context_compressor.py#L546" rel="noopener noreferrer"&gt;Nous tells us&lt;/a&gt; exactly how they do it in plain English:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Compress&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;summarizing&lt;/span&gt; &lt;span class="n"&gt;middle&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Algorithm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;Prune&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="nf"&gt;results &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cheap&lt;/span&gt; &lt;span class="n"&gt;pre&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;pass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;Protect&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="nf"&gt;messages &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="n"&gt;exchange&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="mf"&gt;3.&lt;/span&gt; &lt;span class="n"&gt;Find&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt; &lt;span class="n"&gt;boundary&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="nf"&gt;budget &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="mf"&gt;4.&lt;/span&gt; &lt;span class="n"&gt;Summarize&lt;/span&gt; &lt;span class="n"&gt;middle&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;structured&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
  &lt;span class="mf"&gt;5.&lt;/span&gt; &lt;span class="n"&gt;On&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;compression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iteratively&lt;/span&gt; &lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;

&lt;span class="n"&gt;After&lt;/span&gt; &lt;span class="n"&gt;compression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orphaned&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="n"&gt;are&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;
&lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="n"&gt;so&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt; &lt;span class="n"&gt;never&lt;/span&gt; &lt;span class="n"&gt;receives&lt;/span&gt; &lt;span class="n"&gt;mismatched&lt;/span&gt; &lt;span class="n"&gt;IDs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s worth describing the overall approach before diving in. Essentially, Hermes Agent chunks up the conversation history into a head, torso and tail. The head and tail are left untouched and the middle portion is summarised. This is actually the same approach OpenClaw takes. Now, how does each part work?&lt;/p&gt;
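&lt;p&gt;To make the split concrete, here’s a minimal sketch of the idea in plain Python. This is my own illustration, not Hermes’ actual code; note that Hermes sizes the tail by a token budget rather than a fixed message count:&lt;/p&gt;

```python
# Illustrative sketch only: split the history into head, middle (torso)
# and tail. Hermes' real logic lives in agent/context_compressor.py and
# sizes the tail by tokens; fixed counts are used here for simplicity.

def split_messages(messages, head_count=3, tail_count=6):
    """Return (head, middle, tail); only the middle gets summarised."""
    head = messages[:head_count]
    tail = messages[len(messages) - tail_count:]
    middle = messages[head_count:len(messages) - tail_count]
    return head, middle, tail
```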

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Prune old tool results&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This step is pretty ordinary — go through each old tool call and replace the result with placeholder text, where ‘old’ is defined as anything in the middle portion of the context window. More precisely, only long tool results are replaced with the placeholder string &lt;code&gt;[Old tool output cleared to save context space]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At first glance, it wasn’t obvious why pruning tool results was necessary; the placeholder string above made it click. Reducing the size of the context to be compacted can improve compression performance. You could argue that tool results are valuable to keep for summarisation, but I guess the assumption is that results are already sufficiently described in the agent’s conversational messages, so the raw results themselves add little. &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;Anthropic seem to take the same view, arguing that old tool results aren't valuable.&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;prune_boundary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;protect_tail_count&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prune_boundary&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;_PRUNED_TOOL_PLACEHOLDER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="c1"&gt;# Only prune if the content is substantial (&amp;gt;200 chars)
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_PRUNED_TOOL_PLACEHOLDER&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;2. Protect head messages&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/blob/f83c27e26f22e34b4b6337bb45608caf5a02e9c6/agent/context_compressor.py#L579" rel="noopener noreferrer"&gt;This one&lt;/a&gt; is pretty simple. There isn’t much point in summarising the system prompt, as it’s independent of the conversation, and the first few user messages shape the entire task. The default head size is 3 messages, but the precise number can vary depending on tool call behaviour: the algorithm ensures the last head message is not a tool result, growing the head as needed so the middle region doesn’t start with an orphaned tool call or result. Not much more to say here.&lt;/p&gt;
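&lt;p&gt;A rough sketch of that boundary rule, assuming a simple message format (a hypothetical helper of mine, not the actual implementation):&lt;/p&gt;

```python
# Illustrative sketch of the head-boundary adjustment: grow the head
# until it no longer ends just before an orphaned tool result. Not
# Hermes' actual code; the message shape is assumed for illustration.

def adjust_head_boundary(messages, head_count=3):
    """Return a head size such that the middle region does not start
    with a 'tool' message whose matching tool_call sits in the head."""
    end = min(head_count, len(messages))
    # Pull leading tool results into the head so call/result pairs
    # are never split across the head/middle boundary.
    while len(messages) > end and messages[end].get("role") == "tool":
        end += 1
    return end
```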

&lt;p&gt;&lt;em&gt;3. Protect the tail messages&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The tail is also preserved because it carries the strongest signal of what the agent was most recently doing; poking holes in it, or compressing it lossily, would be hazardous. One interesting design choice is that the tail size is defined in tokens rather than messages. This lets the tail scale with the context and ensures meaningful summarisation can still happen regardless of the model’s context size. Imagine a model with a small context window where a fixed number of tail messages would consume most of it. As with the head, the boundary is slightly shifted to keep tool call blocks grouped together.&lt;/p&gt;
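&lt;p&gt;The token-budget walk might look something like this. The function name and the crude estimator are my own stand-ins; Hermes has its own rough token estimator:&lt;/p&gt;

```python
# Sketch of sizing the tail by token budget: walk backwards from the
# newest message, accumulating estimated tokens, and stop once the
# budget (~20K tokens in Hermes) is exhausted. Illustrative only.

def find_tail_start(messages, head_count, tail_token_budget=20_000):
    """Return the index where the protected tail begins."""
    def estimate_tokens(msg):
        # Crude ~4 characters per token estimate.
        return len(str(msg.get("content", ""))) // 4

    budget_used = 0
    tail_start = len(messages)
    for i in range(len(messages) - 1, head_count - 1, -1):
        budget_used += estimate_tokens(messages[i])
        if budget_used > tail_token_budget:
            break
        tail_start = i
    return tail_start
```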

&lt;p&gt;&lt;em&gt;4. Summarise the middle&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;Now we get to the heart of the algorithm. The middle portion of the message history is passed to an LLM which creates a summary based on a structured template. The template asks the LLM to preserve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The current goal&lt;/li&gt;
&lt;li&gt;Constraints and user preferences&lt;/li&gt;
&lt;li&gt;Progress towards the goal, including tasks completed, in progress and blocked&lt;/li&gt;
&lt;li&gt;Key decisions that have been made&lt;/li&gt;
&lt;li&gt;Relevant references e.g. files&lt;/li&gt;
&lt;li&gt;Next steps&lt;/li&gt;
&lt;li&gt;Other critical context e.g. config details&lt;/li&gt;
&lt;/ul&gt;
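&lt;p&gt;In spirit, the template is a prompt built around those fields. The wording below is my paraphrase, not Hermes’ verbatim template:&lt;/p&gt;

```python
# Paraphrased sketch of a structured summarisation prompt built from the
# fields listed above. Hermes' exact wording lives in
# agent/context_compressor.py; this is an approximation.

SUMMARY_TEMPLATE = """Summarise the conversation below, preserving:
1. Current goal
2. Constraints and user preferences
3. Progress: tasks completed / in progress / blocked
4. Key decisions made
5. Relevant references (e.g. file paths)
6. Next steps
7. Other critical context (e.g. config details)

Conversation:
{conversation}
"""

def build_summary_prompt(middle_messages):
    conversation = "\n".join(
        f"{m.get('role', '?')}: {m.get('content', '')}" for m in middle_messages
    )
    return SUMMARY_TEMPLATE.format(conversation=conversation)
```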

&lt;p&gt;The summary prompt also changes depending on whether a previous compaction event already created a summary. By default, summarisation uses the same model as the agent itself, though another model can be configured. Using the current model avoids problems like mismatched context window sizes or out-of-distribution errors, e.g. a summarisation model that doesn’t understand code very well. If a user selects a model that can’t handle the context, the middle portion is simply dropped. The out-of-distribution risk seems higher in a system like Hermes because it is a general-purpose agent operating across a wide variety of tasks. In practice, I’m not sure how worrying ‘out-of-distribution’ really is, but I’m certain summarisation quality is affected by model selection, so, like everything in AI, it’s best to rely on empirical evidence and evaluate.&lt;/p&gt;
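&lt;p&gt;That fallback behaviour can be sketched as follows (all names here are hypothetical; the real logic also handles iterative re-compression):&lt;/p&gt;

```python
# Illustrative sketch of the summariser fallback described above: if the
# middle region doesn't fit the selected model's context window, drop it
# rather than summarise it. Hypothetical names, not Hermes' actual API.

def compact_middle(middle, llm_summarise, middle_tokens, context_window):
    """Return a summary of the middle turns, or None to drop them."""
    if middle_tokens > context_window:
        # The selected model can't fit the content to be summarised.
        return None
    return llm_summarise(middle)
```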

&lt;p&gt;Finally, the output summary is returned, prepended with a prefix that signals to the model that a compaction event happened.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SUMMARY_PREFIX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[CONTEXT COMPACTION] Earlier turns in this conversation were compacted &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to save context space. The summary below describes work that was &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;already completed, and the current session state may still reflect &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;that work (for example, files may already be changed). Use the summary &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and the current state to continue from where things left off, and &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avoid repeating work:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;5. Assemble the compressed message&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Stitching the pieces together requires a small amount of work. One job at this stage is ensuring the messages alternate between ‘user’ and ‘assistant’, since this is what LLMs have been trained to expect and what provider APIs typically require. The role of the summary message is therefore chosen to preserve this alternation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;last_head_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;compress_start&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;compress_start&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;first_tail_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;compress_end&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;compress_end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n_messages&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Pick a role that avoids consecutive same-role with both neighbors.
# Priority: avoid colliding with head (already committed), then tail.
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;last_head_role&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;summary_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;summary_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# If the chosen role collides with the tail AND flipping wouldn't
# collide with the head, flip it.
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;summary_role&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;first_tail_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;flipped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;summary_role&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;flipped&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;last_head_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;summary_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flipped&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Both roles would create consecutive same-role messages
&lt;/span&gt;        &lt;span class="c1"&gt;# (e.g. head=assistant, tail=user — neither role works).
&lt;/span&gt;        &lt;span class="c1"&gt;# Merge the summary into the first tail message instead
&lt;/span&gt;        &lt;span class="c1"&gt;# of inserting a standalone message that breaks alternation.
&lt;/span&gt;        &lt;span class="n"&gt;_merge_summary_into_tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;_merge_summary_into_tail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There’s also a final check to ensure there are no orphaned tool calls or results, as provider APIs typically reject these.&lt;/p&gt;
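&lt;p&gt;The idea behind that cleanup can be sketched like this. This is my own illustration of one direction (results whose calls were removed); the reverse direction, calls left without results, would be handled similarly:&lt;/p&gt;

```python
# Sketch of orphan cleanup: drop tool results whose originating
# tool_call no longer exists in the history, so the provider API never
# sees mismatched IDs. Illustrative, not Hermes' actual code.

def clean_orphans(messages):
    # Collect IDs of tool calls that survived compaction.
    call_ids = {
        tc["id"]
        for m in messages
        if m.get("role") == "assistant"
        for tc in m.get("tool_calls", [])
    }
    # Keep a tool result only if its matching call is still present.
    return [
        m
        for m in messages
        if m.get("role") != "tool" or m.get("tool_call_id") in call_ids
    ]
```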

&lt;p&gt;That covers the core compression algorithm. But compaction doesn't happen in isolation — there's meaningful work before and after it runs.&lt;/p&gt;

&lt;h3&gt;Pre- and Post-Compaction Processing&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pre-processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The above describes compaction in isolation, but depending on where the algorithm is invoked there is some pre- and post-processing. Compaction can be triggered &lt;a href="https://github.com/NousResearch/hermes-agent/blob/36af1f3baf3f2b089ca3bd5c3b9405bdaf9689d6/cli.py#L4334" rel="noopener noreferrer"&gt;manually&lt;/a&gt; with the slash command &lt;code&gt;/compress&lt;/code&gt; or automatically by the system inside the agent loop. The manual trigger just calls the agent loop method &lt;code&gt;_compress_context&lt;/code&gt;, so we’ll look at that. The method is &lt;a href="https://github.com/NousResearch/hermes-agent/blob/36af1f3baf3f2b089ca3bd5c3b9405bdaf9689d6/run_agent.py#L4819" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before compaction, the system is prompted to &lt;a href="https://github.com/NousResearch/hermes-agent/blob/36af1f3baf3f2b089ca3bd5c3b9405bdaf9689d6/run_agent.py#L4656" rel="noopener noreferrer"&gt;extract any relevant memories&lt;/a&gt; before they are possibly lost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt; &lt;span class="c1"&gt;# Pre-compression memory flush: let the model save memories before they're lost
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method actually sends a background user message to nudge the LLM into saving any memories worth remembering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;flush_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[System: The session is being compressed. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Save anything worth remembering — prioritize user preferences, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;corrections, and recurring patterns over task-specific details.]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There’s some work to prepare the message for different API providers, but essentially an LLM call is made with a single tool, &lt;code&gt;memory_tool_def&lt;/code&gt;, asking the model to review the entire conversation history and save any valuable memories. The tool definition is &lt;a href="https://github.com/NousResearch/hermes-agent/blob/f83c27e26f22e34b4b6337bb45608caf5a02e9c6/tools/memory_tool.py#L476" rel="noopener noreferrer"&gt;here&lt;/a&gt;: an all-purpose memory tool that gives the LLM access to a memory store and the ability to add, replace or remove items. Any tool calls are executed, and the conversation history is then cleaned up to remove the extra elements injected during the flush.&lt;/p&gt;
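&lt;p&gt;The overall flush flow is roughly the following. Helper names are hypothetical, and the real code also adapts the injected message to each provider’s format:&lt;/p&gt;

```python
# Rough sketch of the pre-compaction memory flush flow. Hypothetical
# helper names; Hermes' real implementation handles provider-specific
# message formats and more.

FLUSH_NUDGE = {
    "role": "user",
    "content": "[System: The session is being compressed. "
               "Save anything worth remembering.]",
}

def flush_memories(messages, call_llm, execute_tool_call):
    """Ask the model to save memories, then leave the history untouched."""
    # 1. Nudge the model with a background user message.
    flush_input = messages + [FLUSH_NUDGE]
    # 2. One LLM call with only the memory tool available.
    response = call_llm(flush_input, tools=["memory_tool_def"])
    # 3. Execute any add/replace/remove memory operations it requested.
    for tool_call in response.get("tool_calls", []):
        execute_tool_call(tool_call)
    # 4. The injected nudge and the response are discarded: the original
    #    conversation history is returned unchanged.
    return messages
```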

&lt;p&gt;&lt;strong&gt;Post-processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once compaction has run, a few steps re-establish the conversation for continuation.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent’s pending and in-progress task list is appended to the conversation. It’s interesting that this is included here because the agent has a separate TODO store that it can access and tasks are also written into the compaction summary. I can only imagine this is done to lower the odds the agent goes off-track.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;todo_snapshot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_todo_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_for_injection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;todo_snapshot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;todo_snapshot&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;The &lt;a href="https://github.com/NousResearch/hermes-agent/blob/f83c27e26f22e34b4b6337bb45608caf5a02e9c6/run_agent.py#L2255" rel="noopener noreferrer"&gt;system prompt is rebuilt&lt;/a&gt; and added to the top of the conversation history. The system prompt incorporates the user’s memories, which may have changed during the pre-compaction flush. The cached prompt is also invalidated so that the rebuilt one is used.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_invalidate_system_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;new_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_cached_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_system_prompt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Session records are updated to reflect a compaction event has occurred and counters are reset.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;The Where and When&lt;/h3&gt;

&lt;p&gt;Compaction runs either inside the agent loop or when &lt;a href="https://github.com/NousResearch/hermes-agent/blob/36af1f3baf3f2b089ca3bd5c3b9405bdaf9689d6/cli.py#L4334" rel="noopener noreferrer"&gt;manually triggered&lt;/a&gt; via the &lt;code&gt;/compress&lt;/code&gt; slash command. Inside the agent loop, compression can occur in two places (in &lt;a href="https://github.com/NousResearch/hermes-agent/blob/main/run_agent.py" rel="noopener noreferrer"&gt;run_agent.py&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;[Proactive] &lt;a href="https://github.com/NousResearch/hermes-agent/blob/87e2626cf6d490f03f48bf44d6d8c324bed56153/run_agent.py#L5555" rel="noopener noreferrer"&gt;Before a new user request is handled (pre-flight)&lt;/a&gt;. &lt;/li&gt;
&lt;li&gt;[Reactive] &lt;a href="https://github.com/NousResearch/hermes-agent/blob/87e2626cf6d490f03f48bf44d6d8c324bed56153/run_agent.py#L6257" rel="noopener noreferrer"&gt;During agent execution&lt;/a&gt; when the context grows too large. This is triggered after receiving an API error, e.g. a 413 status code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The pre-flight compaction is an interesting edge case. It handles the situation where the conversation history already exceeds the token threshold that should trigger compaction, checked when a new user message arrives. I wasn’t sure why this was needed until I realised a user can manually switch models partway through the conversation, possibly to one with a smaller context window that the existing history no longer fits. The threshold defaults to 50% of the current model’s context window size. They also handle the case where multiple compaction passes are needed to shrink the history enough to fit a very small model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compression_enabled&lt;/span&gt;
    &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context_compressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;protect_first_n&lt;/span&gt;
                        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context_compressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;protect_last_n&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;_sys_tok_est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens_rough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_system_prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_msg_tok_est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_messages_tokens_rough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_preflight_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_sys_tok_est&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;_msg_tok_est&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_preflight_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context_compressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# May need multiple passes for very large sessions with small
&lt;/span&gt;        &lt;span class="c1"&gt;# context windows (each pass summarises the middle N turns).
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_pass&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;_orig_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;active_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_compress_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;approx_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_preflight_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;effective_task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;_orig_len&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Cannot compress further
&lt;/span&gt;            &lt;span class="c1"&gt;# Re-estimate after compression
&lt;/span&gt;            &lt;span class="n"&gt;_sys_tok_est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens_rough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_system_prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;_msg_tok_est&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_messages_tokens_rough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;_preflight_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_sys_tok_est&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;_msg_tok_est&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_preflight_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context_compressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threshold_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Under threshold
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, there’s an upper limit (3 by default) on how many times compaction can run in a single turn; once that cap is hit, the turn ends with an incomplete result.&lt;/p&gt;
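&lt;p&gt;That cap amounts to a bounded retry loop, sketched here with made-up helper names:&lt;/p&gt;

```python
MAX_COMPACTIONS_PER_TURN = 3  # Hermes' default cap

def run_turn(step, compact, max_compactions=MAX_COMPACTIONS_PER_TURN):
    # step() returns (done, needs_compaction); compact() shrinks the context.
    compactions = 0
    while True:
        done, needs_compaction = step()
        if done:
            return "complete"
        if needs_compaction:
            if compactions >= max_compactions:
                return "incomplete"  # cap hit: the turn ends unfinished
            compact()
            compactions += 1

calls = {"compact": 0}

def never_fits():
    # A step that always reports context overflow.
    return (False, True)

def fake_compact():
    calls["compact"] += 1

result = run_turn(never_fits, fake_compact)
```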

&lt;p&gt;And there you have it. There are many possible alternatives to compaction, from rolling-window strategies to more selective eviction methods. Hermes’ approach favours simplicity and flexibility, which seems reasonable when building such a general-purpose agent. I would love to know what other methods were tested and how they performed relative to this approach. If you have any thoughts on compaction or context engineering more generally, please share them with me on X at &lt;a href="https://x.com/johnlingi" rel="noopener noreferrer"&gt;https://x.com/johnlingi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;(Note: some of the code has been lightly edited to remove things like log statements that aren’t useful for the explanation.)&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>contextengineering</category>
      <category>hermes</category>
    </item>
  </channel>
</rss>
