<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: pponali</title>
    <description>The latest articles on Forem by pponali (@pponali).</description>
    <link>https://forem.com/pponali</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1238066%2F98aafa2c-caaf-4488-9c96-f143321b4acb.png</url>
      <title>Forem: pponali</title>
      <link>https://forem.com/pponali</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pponali"/>
    <language>en</language>
    <item>
      <title>Circuit Breakers Under Stress: Anatomy of a Payment Cascade</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sun, 10 May 2026 14:10:46 +0000</pubDate>
      <link>https://forem.com/pponali/circuit-breakers-under-stress-anatomy-of-a-payment-cascade-hn0</link>
      <guid>https://forem.com/pponali/circuit-breakers-under-stress-anatomy-of-a-payment-cascade-hn0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9zz6gvcc6b1m0n03nv1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh9zz6gvcc6b1m0n03nv1.png" alt=" " width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A flash sale hit us at 10x baseline RPS. Within four minutes, our Payment Service circuit breaker tripped to &lt;strong&gt;OPEN&lt;/strong&gt;, error rate climbed to 92%, and p99 latency on the payment path went from 200ms to 14.2 seconds. Here's the part nobody tells you on the conference circuit: the circuit breaker didn't fail. It worked exactly as designed. The failure was everywhere else.&lt;/p&gt;

&lt;p&gt;This is a postmortem of what we saw, why Resilience4j's defaults weren't enough, and the four changes that made the next sale boring.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;Standard Java microservices stack. Spring Cloud Gateway in front, JWT auth via Keycloak, Resilience4j wrapping every outbound call. Payment Service synchronously calls Stripe. Order Service synchronously calls Payment. PostgreSQL for orders, Redis for circuit breaker state, Kafka for the dead-letter queue.&lt;/p&gt;

&lt;p&gt;Six services. Five circuit breakers. One very stressed thread pool.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 10x RPS actually does
&lt;/h2&gt;

&lt;p&gt;Baseline was around 1,000 RPS. The flash sale pushed us to 10,243. The edge layer absorbed it fine — NGINX did its job, the rate limiter degraded gracefully, the CDN cached anything cacheable. Spring Cloud Gateway routed cleanly.&lt;/p&gt;

&lt;p&gt;The wheels came off at the Payment Service. Stripe's p99 latency under load climbed from a healthy 800ms to 14.2 seconds. That doesn't sound catastrophic until you do the math: every Payment thread now holds for ~14s instead of &amp;lt;1s. With a fixed thread pool, throughput collapses long before the breaker notices.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# What we had — Resilience4j defaults, lightly tuned&lt;/span&gt;
&lt;span class="na"&gt;resilience4j.circuitbreaker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paymentService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;failureRateThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;slidingWindowSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
      &lt;span class="na"&gt;slidingWindowType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;COUNT_BASED&lt;/span&gt;
      &lt;span class="na"&gt;waitDurationInOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;permittedNumberOfCallsInHalfOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A 50% failure threshold over a count-based window of 100 calls means the breaker needs 50 recorded failures before it trips, and a call that ends in a timeout is only recorded once it finishes. At 10x load with timeouts, that's roughly four minutes of users staring at spinners. By the time the breaker opened, the thread pool was already 98% saturated.&lt;/p&gt;
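
&lt;p&gt;To make the thread math concrete, here is a minimal sketch of the throughput ceiling for a fixed pool, where each in-flight call holds one thread. The pool size of 200 is a hypothetical number chosen for illustration, not our real configuration; the two latencies are the Stripe p99 figures quoted earlier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;public class PoolThroughput {
    // Upper bound on requests/second for a fixed pool when every call
    // holds a thread for holdSeconds (threads divided by hold time).
    static double maxThroughput(int threads, double holdSeconds) {
        return threads / holdSeconds;
    }

    public static void main(String[] args) {
        int threads = 200; // hypothetical pool size, for illustration only
        System.out.printf("healthy:  %.0f req/s%n", maxThroughput(threads, 0.8));
        System.out.printf("degraded: %.0f req/s%n", maxThroughput(threads, 14.2));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With those assumed numbers the ceiling drops from about 250 requests per second to about 14, which is why the pool saturates long before a count-based breaker has seen enough completed failures.&lt;/p&gt;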

&lt;h2&gt;
  
  
  The cascade, step by step
&lt;/h2&gt;

&lt;p&gt;The order matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Flash-sale spike hits the gateway at 10x RPS.&lt;/li&gt;
&lt;li&gt;Order Service synchronously calls Payment for every checkout.&lt;/li&gt;
&lt;li&gt;Stripe's p99 spikes to 14s under provider-side load.&lt;/li&gt;
&lt;li&gt;Payment Service threads block on those timeouts.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;failureRateThreshold=50%&lt;/code&gt; breached → Payment CB transitions to &lt;strong&gt;OPEN&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Subsequent calls fail-fast → fallback handler enqueues "deferred order" responses to Kafka.&lt;/li&gt;
&lt;li&gt;Order Service's own CB drops to &lt;strong&gt;HALF-OPEN&lt;/strong&gt;, probing with limited concurrency.&lt;/li&gt;
&lt;li&gt;Bulkhead isolation prevents the cascade from reaching Inventory, Notifications, or User services.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 8 is the only reason this incident wasn't a full-platform outage. Without per-endpoint bulkheads, a slow Stripe would have eaten every thread in the gateway's pool, and User Service login requests would have queued behind dead Payment calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  The state machine, practically
&lt;/h2&gt;

&lt;p&gt;If you've only read the docs, the circuit breaker looks like a tidy three-state diagram. In production it's noisier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Resilience4j state transitions, simplified&lt;/span&gt;
&lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt; &lt;span class="n"&gt;cb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreaker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"paymentService"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;cb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEventPublisher&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;onStateTransition&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;warn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CB {} : {} -&amp;gt; {}"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCircuitBreakerName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
          &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStateTransition&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getFromState&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
          &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStateTransition&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getToState&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
      &lt;span class="n"&gt;meterRegistry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cb.transition"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
          &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCircuitBreakerName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
          &lt;span class="s"&gt;"to"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStateTransition&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getToState&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
      &lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;increment&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That listener saved us during the postmortem. We could replay exactly when each breaker tripped, when probing started, and which trial calls failed. If you don't emit metrics on every state transition, you're flying blind.&lt;/p&gt;

&lt;p&gt;The HALF-OPEN state is the dangerous one. Resilience4j permits a small number of trial calls; if their failure rate (or slow-call rate) reaches the configured threshold, you slam back to OPEN for another &lt;code&gt;waitDurationInOpenState&lt;/code&gt;. Set the trial pool too low and a single failed probe can keep you from recovering; set it too high and you'll hammer a still-broken downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four changes that fixed it
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tighter, faster breakers
&lt;/h3&gt;

&lt;p&gt;We dropped the threshold and shrunk the window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resilience4j.circuitbreaker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paymentService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;failureRateThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;          &lt;span class="c1"&gt;# was 50&lt;/span&gt;
      &lt;span class="na"&gt;slowCallRateThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;          &lt;span class="c1"&gt;# NEW — slow calls also count&lt;/span&gt;
      &lt;span class="na"&gt;slowCallDurationThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2s&lt;/span&gt;      &lt;span class="c1"&gt;# NEW&lt;/span&gt;
      &lt;span class="na"&gt;slidingWindowSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;              &lt;span class="c1"&gt;# was 100&lt;/span&gt;
      &lt;span class="na"&gt;minimumNumberOfCalls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;waitDurationInOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;       &lt;span class="c1"&gt;# was 30s&lt;/span&gt;
      &lt;span class="na"&gt;permittedNumberOfCallsInHalfOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two non-obvious knobs matter here. &lt;code&gt;slowCallRateThreshold&lt;/code&gt;, paired with &lt;code&gt;slowCallDurationThreshold&lt;/code&gt;, lets you trip on latency, not just errors, which is critical when a downstream is dying slowly rather than 500-ing. And the smaller window means the breaker reacts in seconds, not minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Per-endpoint bulkheads
&lt;/h3&gt;

&lt;p&gt;A single thread pool for "Payment Service" is too coarse. Split by downstream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolBulkhead&lt;/span&gt; &lt;span class="nf"&gt;stripeBulkhead&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ThreadPoolBulkheadConfig&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolBulkheadConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;custom&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxThreadPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;coreThreadPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queueCapacity&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;keepAliveDuration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolBulkhead&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"stripe"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolBulkhead&lt;/span&gt; &lt;span class="nf"&gt;fraudBulkhead&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Smaller — fraud is allowed to be slow, not allowed to starve payment&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolBulkhead&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"fraud"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;ThreadPoolBulkheadConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;custom&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxThreadPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;coreThreadPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a slow fraud engine can't drain Stripe's threads, and vice versa. Bulkhead-per-dependency is more configuration, but it's the only way to guarantee isolation when one downstream misbehaves.&lt;/p&gt;
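
&lt;p&gt;The isolation property itself is easy to check with plain &lt;code&gt;java.util.concurrent&lt;/code&gt; executors. This sketch does not use Resilience4j at all; the pool sizes, sleep time, and names are hypothetical stand-ins. It saturates a "fraud" pool with slow tasks and then confirms a "stripe" task still completes promptly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BulkheadIsolationDemo {
    // Fills the fraud pool (and its queue) with slow work, then submits a
    // fast task to the separate stripe pool and waits at most 500 ms for it.
    static String run() throws Exception {
        ExecutorService fraudPool = Executors.newFixedThreadPool(2);
        ExecutorService stripePool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i &amp;lt; 4; i++) {
                fraudPool.submit(() -&amp;gt; {
                    Thread.sleep(2_000); // simulated slow fraud check
                    return "fraud-done";
                });
            }
            Future&amp;lt;String&amp;gt; charge = stripePool.submit(() -&amp;gt; "charge-ok");
            return charge.get(500, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupt the sleeping tasks so the demo exits quickly.
            fraudPool.shutdownNow();
            stripePool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Had both kinds of work shared one two-thread pool, the charge would have queued behind the sleeping fraud tasks and the 500 ms wait would have timed out.&lt;/p&gt;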

&lt;h3&gt;
  
  
  3. Async outbox + Kafka retry
&lt;/h3&gt;

&lt;p&gt;The synchronous &lt;code&gt;Order → Payment → Stripe&lt;/code&gt; chain was the real sin. We moved Payment to an outbox pattern: orders write a payment intent to Postgres in the same transaction, a relay publishes to Kafka, and a worker calls Stripe asynchronously. The user gets an immediate "order placed" response; the charge happens within seconds, with retries handled by the consumer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Transactional&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Order&lt;/span&gt; &lt;span class="nf"&gt;placeOrder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderRequest&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orderRepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;save&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;outboxRepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;save&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OutboxEvent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"payment.charge.requested"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeValueAsString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;payment&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// returns in &amp;lt;50ms regardless of Stripe latency&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Decoupling time-of-order from time-of-charge means a 14-second Stripe doesn't translate to a 14-second user experience. It also gives us natural retry and dead-lettering through Kafka, instead of bolting retry logic onto every caller.&lt;/p&gt;
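
&lt;p&gt;The relay side isn't shown above. Here is a minimal in-memory sketch of its core loop, assuming at-least-once delivery. &lt;code&gt;OutboxRow&lt;/code&gt;, &lt;code&gt;relayOnce&lt;/code&gt;, and the publisher callback are hypothetical stand-ins for a Postgres outbox table and a Kafka producer, not our actual code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OutboxRelaySketch {
    // Hypothetical in-memory stand-in for a row in the outbox table.
    static final class OutboxRow {
        final String eventType;
        final String payload;
        boolean published;

        OutboxRow(String eventType, String payload) {
            this.eventType = eventType;
            this.payload = payload;
        }
    }

    // One relay pass: publish every unpublished row and mark it only after
    // the publisher returns. If the publisher throws, the pass stops and the
    // row stays unpublished, so the next pass retries it.
    static int relayOnce(List&amp;lt;OutboxRow&amp;gt; rows, Consumer&amp;lt;OutboxRow&amp;gt; publisher) {
        int sent = 0;
        for (OutboxRow row : rows) {
            if (row.published) {
                continue;
            }
            publisher.accept(row);
            row.published = true;
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        List&amp;lt;OutboxRow&amp;gt; rows = new ArrayList&amp;lt;&amp;gt;();
        rows.add(new OutboxRow("payment.charge.requested", "{\"orderId\":1}"));
        rows.add(new OutboxRow("payment.charge.requested", "{\"orderId\":2}"));

        int first = relayOnce(rows, row -&amp;gt; System.out.println("publish " + row.payload));
        int second = relayOnce(rows, row -&amp;gt; System.out.println("publish " + row.payload));
        System.out.println(first + " sent, then " + second);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Marking a row only after a successful publish means a crash between the two steps can send the same event twice, so the consumer that calls Stripe has to be idempotent.&lt;/p&gt;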

&lt;h3&gt;
  
  
  4. HPA on RPS and queue depth
&lt;/h3&gt;

&lt;p&gt;The Payment Service was scaled on CPU, which is useless when threads are blocked on I/O. We swapped to a custom Prometheus metric — RPS plus Kafka consumer lag — and let the HPA add pods when the queue grew faster than it drained. CPU never crossed 40% during the incident; if we'd been watching the right signal, we'd have scaled out three minutes earlier.&lt;/p&gt;
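
&lt;p&gt;For reference, an HPA driven by those two signals might look roughly like this. It assumes a metrics adapter (for example prometheus-adapter) already exposes the metrics through the Kubernetes custom and external metrics APIs. The metric names, labels, replica bounds, and targets are placeholders, not our production values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: every name and number below is a placeholder
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 4
  maxReplicas: 40
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second    # hypothetical per-pod RPS metric
        target:
          type: AverageValue
          averageValue: "50"
    - type: External
      external:
        metric:
          name: kafka_consumergroup_lag      # hypothetical consumer-lag metric
          selector:
            matchLabels:
              topic: payment.charge.requested
        target:
          type: AverageValue
          averageValue: "100"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When several metrics are listed, the HPA computes a desired replica count for each and uses the largest, so either rising RPS or growing lag is enough to trigger a scale-out.&lt;/p&gt;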

&lt;h2&gt;
  
  
  What I'd tell past me
&lt;/h2&gt;

&lt;p&gt;The circuit breaker is a fire alarm, not a fire suppression system. By the time it trips, you've already had a fire for a while. The real defenses are the things that stop the fire from starting: bulkhead isolation per downstream, slow-call detection, async boundaries on anything you don't fully control, and autoscaling on signals that actually correlate with load.&lt;/p&gt;

&lt;p&gt;Resilience4j is excellent. The defaults are not your friend in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;If you take three things from this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trip on latency, not just errors.&lt;/strong&gt; &lt;code&gt;slowCallRateThreshold&lt;/code&gt; is the most underused knob in Resilience4j.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One bulkhead per downstream, always.&lt;/strong&gt; Coarse pools will betray you the moment two dependencies fail differently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synchronous chains across third-party APIs are tech debt.&lt;/strong&gt; An outbox + queue is more code, but it's the difference between a postmortem and an incident report.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next flash sale ran 12x baseline. Payment p99 stayed under 600ms. Nobody paged.&lt;/p&gt;

</description>
      <category>java</category>
      <category>microservices</category>
      <category>devops</category>
      <category>architecture</category>
    </item>
    <item>
      <title>After the Skill Vault: 3 More Hidden Token Sinks in Claude Code</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Fri, 08 May 2026 00:48:13 +0000</pubDate>
      <link>https://forem.com/pponali/after-the-skill-vault-3-more-hidden-token-sinks-in-claude-code-32ek</link>
      <guid>https://forem.com/pponali/after-the-skill-vault-3-more-hidden-token-sinks-in-claude-code-32ek</guid>
      <description>&lt;p&gt;If you read my earlier post on the &lt;a href="https://dev.to/pponali/how-i-cut-claude-code-token-consumption-by-96-with-the-skill-vault-pattern-9d1"&gt;Skill Vault pattern&lt;/a&gt;, you know I cut Claude Code's per-session overhead by 96%. After living with it for a few weeks, I went looking for what was &lt;em&gt;still&lt;/em&gt; eating tokens — and found three more sinks worth killing.&lt;/p&gt;

&lt;p&gt;This is the follow-up: smaller wins individually, but together they shaved another &lt;strong&gt;~5,000 tokens off every single session&lt;/strong&gt; with zero capability lost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Tokens Were Still Hiding
&lt;/h2&gt;

&lt;p&gt;After the vault, my baseline was around 51K tokens per session. I dug into the system prompt to see what remained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skill list (still): &lt;strong&gt;~3K tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Project &lt;code&gt;CLAUDE.md&lt;/code&gt;: &lt;strong&gt;~2K tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;claude-mem auto-injected timeline: &lt;strong&gt;~2K tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Plugin hook reminders: &lt;strong&gt;~1K tokens&lt;/strong&gt; (some recurring per turn)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three of these were either unnecessary or way overweight. Here's how I killed each.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sink #1: A Bloated Root CLAUDE.md (~1.5K tokens)
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; files auto-load into every session that touches the repo. Mine had grown to &lt;strong&gt;326 lines / 8 KB&lt;/strong&gt; with quickstart instructions for every component:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flutter dev commands&lt;/li&gt;
&lt;li&gt;Backend &lt;code&gt;npm&lt;/code&gt; scripts&lt;/li&gt;
&lt;li&gt;React build steps&lt;/li&gt;
&lt;li&gt;ML service Python setup&lt;/li&gt;
&lt;li&gt;A duplicate gstack skill listing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem: I was loading &lt;strong&gt;every component's instructions on every turn&lt;/strong&gt;, even when I was only working in one of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix: Hierarchical CLAUDE.md Files
&lt;/h3&gt;

&lt;p&gt;Claude Code loads &lt;code&gt;CLAUDE.md&lt;/code&gt; &lt;strong&gt;hierarchically&lt;/strong&gt;: files in your working directory and its parent directories load at startup, while files in subdirectories are pulled in only when Claude works with files there. So instead of one fat root file, I split it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;khetisahayak/
├── CLAUDE.md                    # 55 lines — overview, ports, creds only
├── kheti_sahayak_app/CLAUDE.md  # Flutter details
├── frontend/CLAUDE.md           # React details
├── ml/CLAUDE.md                 # ML service details
└── kheti_sahayak_backend/CLAUDE.md  # already existed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The root now contains only:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project overview (5 lines)&lt;/li&gt;
&lt;li&gt;Service ports table&lt;/li&gt;
&lt;li&gt;Test credentials&lt;/li&gt;
&lt;li&gt;Cross-cutting auth + DB notes&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Component-specific details only load when I'm actually working in that subdir.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; root CLAUDE.md trimmed from 326 → 55 lines (8 KB → 1.6 KB). &lt;strong&gt;~1.5K tokens saved per session.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Sink #2: Plugin SessionStart Hooks Injecting "Helpful" Context
&lt;/h2&gt;

&lt;p&gt;I use &lt;a href="https://github.com/thedotmack/claude-mem" rel="noopener noreferrer"&gt;claude-mem&lt;/a&gt; for persistent memory across sessions. Genuinely useful. But it has a &lt;code&gt;SessionStart&lt;/code&gt; hook that auto-injects a &lt;strong&gt;timeline of recent observations&lt;/strong&gt; at the top of every conversation — about 50 entries, ~2K tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S499 Indeed Auto-Apply — User asked how to automate job applications
S498 Indeed MCP Integration Query — clarifying JobSpy vs Apify
2337 12:33p ✅ systemd/install.sh — Backend Services Enabled
... 47 more lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I almost never &lt;em&gt;needed&lt;/em&gt; this auto-recall. When I want past context, I call &lt;code&gt;mem-search&lt;/code&gt; explicitly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix: Disable the Auto-Inject, Keep the Memory
&lt;/h3&gt;

&lt;p&gt;The hook config lives at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/plugins/cache/thedotmack/claude-mem/&amp;lt;version&amp;gt;/hooks/hooks.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;SessionStart&lt;/code&gt; array has three hooks. The third is the timeline injection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"... node bun-runner.js worker-service.cjs hook claude-code context ..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I removed just that one entry. The other two SessionStart hooks (install + worker-start) and the recording hooks (&lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;Stop&lt;/code&gt;, &lt;code&gt;SessionEnd&lt;/code&gt;) stay intact, so memory is still being captured. The MCP search server still works — I just have to ask for it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Always back up before editing plugin internals&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;hooks.json hooks.json.bak
&lt;span class="c"&gt;# Remove the 3rd SessionStart hook (jq or manual edit)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; ~2K tokens saved per session. Memory still works on demand.&lt;/p&gt;

&lt;p&gt;⚠️ Caveat: edits to files under &lt;code&gt;~/.claude/plugins/cache/&lt;/code&gt; will be overwritten on plugin upgrade. For durability, mirror the change in your user-level &lt;code&gt;~/.claude/settings.json&lt;/code&gt; hooks block, or add a small post-upgrade re-patch script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sink #3: Round 2 of the Skill Vault (~1.4K tokens)
&lt;/h2&gt;

&lt;p&gt;The original vault was a one-time bulk move. After several weeks of actual usage, I saw which skills I'd installed and &lt;strong&gt;never touched&lt;/strong&gt;. The vault was overdue for a second pass.&lt;/p&gt;

&lt;p&gt;Audit script (same as the first post, run again):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;f &lt;span class="k"&gt;in&lt;/span&gt; ~/.claude/skills/&lt;span class="k"&gt;*&lt;/span&gt;/SKILL.md&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;dirname&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$dir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$size&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-rn&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ended up vaulting &lt;strong&gt;27 more skills&lt;/strong&gt; out of 73 actively loaded:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8 marketing skills&lt;/strong&gt; (&lt;code&gt;mkt-content&lt;/code&gt;, &lt;code&gt;mkt-seo&lt;/code&gt;, &lt;code&gt;mkt-social&lt;/code&gt;, etc.) — I do real marketing in a separate context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6 research skills&lt;/strong&gt; (&lt;code&gt;research&lt;/code&gt;, &lt;code&gt;research-deep&lt;/code&gt;, &lt;code&gt;research-report&lt;/code&gt;, etc.) — episodic, not daily&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 niche tools&lt;/strong&gt; (&lt;code&gt;obsidian-vault&lt;/code&gt;, &lt;code&gt;make-pdf&lt;/code&gt;, &lt;code&gt;pair-agent&lt;/code&gt;, &lt;code&gt;setup-browser-cookies&lt;/code&gt;, &lt;code&gt;open-gstack-browser&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 design-heavy&lt;/strong&gt; (&lt;code&gt;design-consultation&lt;/code&gt;, &lt;code&gt;design-html&lt;/code&gt;, &lt;code&gt;design-shotgun&lt;/code&gt;, &lt;code&gt;devex-review&lt;/code&gt;) — restored only when designing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 plan reviews&lt;/strong&gt; (&lt;code&gt;plan-design-review&lt;/code&gt;, &lt;code&gt;plan-devex-review&lt;/code&gt;, &lt;code&gt;plan-tune&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 interview prep&lt;/strong&gt; (&lt;code&gt;staff-engineer-interview&lt;/code&gt;) — used a few times last quarter, not weekly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bulk move:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;s &lt;span class="k"&gt;in &lt;/span&gt;mkt-content mkt-email mkt-growth mkt-pr mkt-review mkt-seo mkt-social cmo &lt;span class="se"&gt;\&lt;/span&gt;
         research research-add-fields research-add-items research-deep research-report edit-article &lt;span class="se"&gt;\&lt;/span&gt;
         staff-engineer-interview &lt;span class="se"&gt;\&lt;/span&gt;
         obsidian-vault make-pdf pair-agent setup-browser-cookies open-gstack-browser &lt;span class="se"&gt;\&lt;/span&gt;
         plan-design-review plan-devex-review plan-tune &lt;span class="se"&gt;\&lt;/span&gt;
         design-consultation design-html design-shotgun devex-review&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills/&lt;span class="nv"&gt;$s&lt;/span&gt; ~/.claude/skills-vault/ 2&amp;gt;/dev/null
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
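&lt;p&gt;The loop above discards error output, so a misspelled or already-vaulted name passes silently. Here is a minimal sketch of a variant that reports what happened to each skill. The &lt;code&gt;vault_skills&lt;/code&gt; function and the &lt;code&gt;SKILLS_DIR&lt;/code&gt; / &lt;code&gt;VAULT_DIR&lt;/code&gt; overrides are hypothetical names introduced for illustration, not part of Claude Code:&lt;/p&gt;

```shell
# Hypothetical helper: move each named skill into the vault and report the outcome.
# SKILLS_DIR and VAULT_DIR are assumed overrides; they default to the paths used in this post.
vault_skills() {
  skills_dir="${SKILLS_DIR:-$HOME/.claude/skills}"
  vault_dir="${VAULT_DIR:-$HOME/.claude/skills-vault}"
  mkdir -p "$vault_dir"
  for s in "$@"; do
    # -L also catches symlinked skills whose target has gone missing
    if [ -e "$skills_dir/$s" ] || [ -L "$skills_dir/$s" ]; then
      mv "$skills_dir/$s" "$vault_dir/"
      echo "vaulted: $s"
    else
      echo "skipped (not active): $s"
    fi
  done
}
```

&lt;p&gt;Calling &lt;code&gt;vault_skills mkt-seo make-pdf&lt;/code&gt; would print one line per name, which makes a typo in a long list easy to spot.&lt;/p&gt;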



&lt;p&gt;The &lt;code&gt;skill-vault&lt;/code&gt; index skill from the original post still bridges everything: once the newly vaulted skills are listed in the index, Claude knows where each one lives and restores it on demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 73 → 46 active skills. &lt;strong&gt;~1.4K tokens saved.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Combined Result
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;th&gt;Tokens Saved (per session)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sink #1 — CLAUDE.md trim + per-component split&lt;/td&gt;
&lt;td&gt;~1.5K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sink #2 — claude-mem timeline disabled&lt;/td&gt;
&lt;td&gt;~2K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sink #3 — Round 2 skill vault&lt;/td&gt;
&lt;td&gt;~1.4K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~4.9K&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On top of the original 96% reduction from the Skill Vault, this is another solid bite. But honestly, the dollar value isn't the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Beyond Cost
&lt;/h2&gt;

&lt;p&gt;Every token in your context is a token Claude has to &lt;strong&gt;attend over&lt;/strong&gt; before generating its response. The bigger your prompt, the more diluted attention becomes on the actual task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Big context ≠ better answers.&lt;/strong&gt; Frequently it's the opposite. Targeted context wins.&lt;/p&gt;

&lt;p&gt;The pattern across all three fixes is the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit what's auto-loaded vs. what's actually useful.&lt;/strong&gt; You'll be surprised.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move episodic content out of the always-on path&lt;/strong&gt; and into on-demand access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust the model to pull what it needs&lt;/strong&gt; — when it does need the vaulted skill, the subdir's CLAUDE.md, or the memory search, it will reach for it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Less ambient noise. Sharper signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips for Your Own Audit
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inspect, don't guess.&lt;/strong&gt; &lt;code&gt;wc -c&lt;/code&gt; your &lt;code&gt;CLAUDE.md&lt;/code&gt; files. &lt;code&gt;ls ~/.claude/skills/&lt;/code&gt;. Read your plugin &lt;code&gt;hooks.json&lt;/code&gt; files line by line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchy is free.&lt;/strong&gt; Per-directory &lt;code&gt;CLAUDE.md&lt;/code&gt; files cost nothing when you're not in that directory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin hooks are debt.&lt;/strong&gt; Every &lt;code&gt;SessionStart&lt;/code&gt; or &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook is a tax. Audit them by hand. Some are essential (auth, telemetry); some inject "context" that's just clutter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-vault every few weeks.&lt;/strong&gt; Usage patterns shift. Skills that were daily three months ago may be quarterly now. Yesterday's must-have is tomorrow's vault candidate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch for per-turn taxes.&lt;/strong&gt; A &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook costs N tokens &lt;em&gt;every single turn&lt;/em&gt;. Even a small reminder block adds up fast in a long session.&lt;/li&gt;
&lt;/ol&gt;
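&lt;p&gt;Tips 1 and 5 can be sketched as two small shell functions. This is an illustration under stated assumptions: &lt;code&gt;claude_md_report&lt;/code&gt; and &lt;code&gt;hook_session_cost&lt;/code&gt; are hypothetical names, the bytes-divided-by-4 token estimate is the same rough heuristic used in the original Skill Vault post, and the 150-token, 40-turn figures are made-up example numbers:&lt;/p&gt;

```shell
# Hypothetical helper: list every CLAUDE.md under a directory with a rough
# token estimate (bytes / 4), largest first.
claude_md_report() {
  root="${1:-.}"
  find "$root" -name CLAUDE.md -type f | while IFS= read -r f; do
    bytes=$(wc -c "$f" | awk '{print $1}')
    echo "$(( bytes / 4 )) $f"
  done | sort -rn
}

# Hypothetical helper: total cost of a block that is injected on every turn.
hook_session_cost() {
  tokens_per_turn="$1"
  turns="$2"
  echo $(( tokens_per_turn * turns ))
}

# Example with assumed numbers: a 150-token reminder over a 40-turn session.
hook_session_cost 150 40   # prints 6000
```

&lt;p&gt;Running &lt;code&gt;claude_md_report ~/projects&lt;/code&gt; (any directory works) gives a quick ranking of which files deserve a trim first.&lt;/p&gt;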

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Skill Vault pattern is still the heaviest hitter. These three follow-ups are the long tail:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md&lt;/strong&gt; → split per component, slim the root.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin hooks&lt;/strong&gt; → audit auto-injected context. Disable what you don't need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill vault&lt;/strong&gt; → revisit it. Vault more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together: another &lt;strong&gt;~5K tokens per session, zero capability lost.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your context is tight, your model is sharp.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Prakash Ponali, a Staff Engineer with 16+ years in enterprise eCommerce. Currently building &lt;a href="https://khetisahayak.com" rel="noopener noreferrer"&gt;Khetisahayak&lt;/a&gt; — a farming helper app for Telugu-speaking farmers in Andhra Pradesh. Find me on &lt;a href="https://www.linkedin.com/in/prakash-ponali-75ab9b17" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>claudecode</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How I Cut Claude Code Token Consumption by 96% with the Skill Vault Pattern</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sun, 19 Apr 2026 07:45:04 +0000</pubDate>
      <link>https://forem.com/pponali/how-i-cut-claude-code-token-consumption-by-96-with-the-skill-vault-pattern-9d1</link>
      <guid>https://forem.com/pponali/how-i-cut-claude-code-token-consumption-by-96-with-the-skill-vault-pattern-9d1</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: 181 Skills Burning 1.18 Million Tokens Per Session
&lt;/h2&gt;

&lt;p&gt;I'm a power user of &lt;a href="https://claude.ai/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; — Anthropic's CLI for AI-assisted development. Over weeks of installing skill packs from GitHub repos, I accumulated &lt;strong&gt;181 skills&lt;/strong&gt; from multiple sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gstack&lt;/strong&gt; agents (QA, design, deploy, monitoring)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;marketingskills&lt;/strong&gt; by Corey Haines (36 CRO/SEO/copywriting skills)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;wondelai/skills&lt;/strong&gt; (42 product/engineering frameworks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;superpowers&lt;/strong&gt; by obra (14 dev methodology skills)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-mem&lt;/strong&gt; (persistent memory plugin)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep-Research-skills&lt;/strong&gt; (structured research workflows)&lt;/li&gt;
&lt;li&gt;Custom skills I built myself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sounds great, right? More skills = more capability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong.&lt;/strong&gt; Every new conversation, Claude Code injects ALL skill descriptions into the system prompt. With 181 skills, that's approximately &lt;strong&gt;1.18 million tokens of overhead per session&lt;/strong&gt; — before I even type my first message.&lt;/p&gt;

&lt;p&gt;Tokens were burning like a wildfire. Every task was expensive. Context windows were filling up fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Discovery: Measuring the Actual Cost
&lt;/h2&gt;

&lt;p&gt;I ran a simple audit to see exactly what was happening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Count total SKILL.md bytes across all skills&lt;/span&gt;
&lt;span class="nv"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="k"&gt;for &lt;/span&gt;f &lt;span class="k"&gt;in&lt;/span&gt; ~/.claude/skills/&lt;span class="k"&gt;*&lt;/span&gt;/SKILL.md&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;total &lt;span class="o"&gt;+&lt;/span&gt; size&lt;span class="k"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;done
&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Total bytes: &lt;/span&gt;&lt;span class="nv"&gt;$total&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Approx tokens: &lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;total &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="k"&gt;))&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: &lt;strong&gt;4.7MB of SKILL.md files = ~1.18 million tokens.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The top offenders were massive:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ship (gstack)&lt;/td&gt;
&lt;td&gt;130KB&lt;/td&gt;
&lt;td&gt;~32K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;plan-ceo-review&lt;/td&gt;
&lt;td&gt;112KB&lt;/td&gt;
&lt;td&gt;~28K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;office-hours&lt;/td&gt;
&lt;td&gt;101KB&lt;/td&gt;
&lt;td&gt;~25K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;seedance-15-real-estate&lt;/td&gt;
&lt;td&gt;95KB&lt;/td&gt;
&lt;td&gt;~24K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;plan-devex-review&lt;/td&gt;
&lt;td&gt;93KB&lt;/td&gt;
&lt;td&gt;~23K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;15 Seedance video skills alone consumed &lt;strong&gt;~265K tokens&lt;/strong&gt;. I used them maybe once a month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: The Skill Vault Pattern
&lt;/h2&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Move rarely-used skills to a vault directory&lt;/strong&gt; (out of auto-load)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep a lightweight index skill&lt;/strong&gt; that tells Claude what's available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude restores skills on-demand&lt;/strong&gt; when your request matches one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optionally move the skill back to the vault&lt;/strong&gt; after the task&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 1: Create the Vault
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills-vault
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Audit Your Skills by Size and Usage Frequency
&lt;/h3&gt;

&lt;p&gt;Sort all skills by file size to find the biggest offenders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;f &lt;span class="k"&gt;in&lt;/span&gt; ~/.claude/skills/&lt;span class="k"&gt;*&lt;/span&gt;/SKILL.md&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;dirname&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$dir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$size&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-rn&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then categorize them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1 (Daily use):&lt;/strong&gt; Keep active — your core dev workflow skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2 (Project-specific):&lt;/strong&gt; Keep active — skills tied to current projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 3 (Occasional):&lt;/strong&gt; Vault — heavy skills used weekly/monthly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 4 (Rare):&lt;/strong&gt; Vault — frameworks and reference skills&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Move Skills to the Vault
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Move video generation skills (rarely used)&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills/seedance-&lt;span class="k"&gt;*&lt;/span&gt; ~/.claude/skills-vault/

&lt;span class="c"&gt;# Move heavy plan review skills&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills/&lt;span class="o"&gt;{&lt;/span&gt;plan-ceo-review,plan-eng-review,plan-design-review&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   ~/.claude/skills-vault/

&lt;span class="c"&gt;# Move framework/reference skills&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills/&lt;span class="o"&gt;{&lt;/span&gt;clean-architecture,domain-driven-design,system-design&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   ~/.claude/skills-vault/

&lt;span class="c"&gt;# Move marketing skills you don't use daily&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills/&lt;span class="o"&gt;{&lt;/span&gt;copywriting,seo-audit,paid-ads,cold-email&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   ~/.claude/skills-vault/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Create the Index Skill (The Key Ingredient)
&lt;/h3&gt;

&lt;p&gt;This is what makes the pattern work. Create a lightweight skill that acts as a lookup table for Claude:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills/skill-vault
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;~/.claude/skills/skill-vault/SKILL.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;skill-vault&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Index of vaulted skills at ~/.claude/skills-vault/.&lt;/span&gt;
  &lt;span class="s"&gt;When the user's request matches a vaulted skill, restore it with&lt;/span&gt;
  &lt;span class="s"&gt;mv ~/.claude/skills-vault/&amp;lt;name&amp;gt; ~/.claude/skills/ then use it.&lt;/span&gt;
  &lt;span class="s"&gt;Use when user asks about design, SEO, CRO, copywriting,&lt;/span&gt;
  &lt;span class="s"&gt;architecture patterns, security audit, deployment, video&lt;/span&gt;
  &lt;span class="s"&gt;generation, or any topic not covered by active skills.&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## Skill Vault&lt;/span&gt;

Skills stored in &lt;span class="sb"&gt;`~/.claude/skills-vault/`&lt;/span&gt; to save tokens.
When a user request matches one, restore it:

mv ~/.claude/skills-vault/&lt;span class="nt"&gt;&amp;lt;skill-name&amp;gt;&lt;/span&gt; ~/.claude/skills/

Then invoke it normally. After the task, optionally move back:

mv ~/.claude/skills/&lt;span class="nt"&gt;&amp;lt;skill-name&amp;gt;&lt;/span&gt; ~/.claude/skills-vault/

&lt;span class="gu"&gt;## Vaulted Skills Index&lt;/span&gt;

&lt;span class="gu"&gt;### Design &amp;amp; UI&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`design-review`&lt;/span&gt; — Visual QA, spacing/hierarchy fixes
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`design-html`&lt;/span&gt; — Production HTML/CSS from mockups
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`design-shotgun`&lt;/span&gt; — Generate multiple design variants
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`top-design`&lt;/span&gt; — Awwwards-quality web experiences

&lt;span class="gu"&gt;### Engineering Frameworks&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`clean-architecture`&lt;/span&gt; — Dependency rule, ports/adapters
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`domain-driven-design`&lt;/span&gt; — Bounded contexts, aggregates
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`system-design`&lt;/span&gt; — Distributed systems, scaling
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`refactoring-patterns`&lt;/span&gt; — Extract method, code smells

&lt;span class="gu"&gt;### DevOps &amp;amp; Monitoring&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`land-and-deploy`&lt;/span&gt; — Merge, CI, verify production
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`canary`&lt;/span&gt; — Post-deploy monitoring
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`cso`&lt;/span&gt; — Security audit (OWASP, STRIDE)

&lt;span class="gu"&gt;### Marketing&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`copywriting`&lt;/span&gt; — Marketing copy for any page
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`seo-audit`&lt;/span&gt; — Technical SEO audit
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`paid-ads`&lt;/span&gt; — Google/Meta/LinkedIn campaigns

(... add all your vaulted skills here ...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This index costs only &lt;strong&gt;~1.5K tokens&lt;/strong&gt; but gives Claude awareness of all 139 vaulted skills.&lt;/p&gt;
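&lt;p&gt;Keeping the index in step with the vault is easier if the skill names are generated rather than typed. Below is a minimal sketch, assuming each vaulted skill is a directory directly under the vault. The &lt;code&gt;vault_index_lines&lt;/code&gt; function and the &lt;code&gt;VAULT_DIR&lt;/code&gt; override are hypothetical names, and the one-line descriptions still have to be written by hand:&lt;/p&gt;

```shell
# Hypothetical helper: print one markdown bullet per vaulted skill directory.
# VAULT_DIR is an assumed override; it defaults to the vault path used in this post.
vault_index_lines() {
  vault_dir="${VAULT_DIR:-$HOME/.claude/skills-vault}"
  for d in "$vault_dir"/*/; do
    # The check skips the unexpanded pattern when the vault is empty
    if [ -d "$d" ]; then
      printf -- '- `%s`\n' "$(basename "$d")"
    fi
  done
}
```

&lt;p&gt;The output can be pasted under the matching category heading in &lt;code&gt;SKILL.md&lt;/code&gt; and annotated from there.&lt;/p&gt;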

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Active skills&lt;/td&gt;
&lt;td&gt;181&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;76% fewer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per session&lt;/td&gt;
&lt;td&gt;~1.18M&lt;/td&gt;
&lt;td&gt;~51K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vaulted (on-demand)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;139&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capability lost&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;96% token reduction with zero capability loss.&lt;/strong&gt;&lt;/p&gt;
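&lt;p&gt;The percentages in the table follow directly from the before and after figures. As a quick check, here is a small sketch (&lt;code&gt;reduction_pct&lt;/code&gt; is a hypothetical helper name, and the inputs are the rounded numbers from the table):&lt;/p&gt;

```shell
# Hypothetical helper: percentage reduction from a "before" value to an "after" value.
reduction_pct() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

reduction_pct 1180000 51000   # tokens per session, prints 95.7
reduction_pct 181 43          # active skills, prints 76.2
```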

&lt;h2&gt;
  
  
  How It Works at Runtime
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "audit my design for visual issues"

Claude thinks:
  1. No active skill matches "design audit"
  2. skill-vault index matches: design-review
  3. Run: mv ~/.claude/skills-vault/design-review ~/.claude/skills/
  4. Now use design-review skill normally
  5. Task complete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude reads the vault index, finds the right skill, restores it with a single &lt;code&gt;mv&lt;/code&gt; command, and proceeds normally. The user experience is seamless.&lt;/p&gt;
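&lt;p&gt;The restore step in the trace above can also be sketched as a shell function that handles the two cases a bare &lt;code&gt;mv&lt;/code&gt; does not: a skill that is already active and a name that is not in the vault at all. The &lt;code&gt;restore_skill&lt;/code&gt; name and the &lt;code&gt;SKILLS_DIR&lt;/code&gt; / &lt;code&gt;VAULT_DIR&lt;/code&gt; overrides are assumptions for illustration; the pattern itself only needs the single &lt;code&gt;mv&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical helper: restore one skill from the vault into the active directory.
# SKILLS_DIR and VAULT_DIR are assumed overrides; they default to the paths used in this post.
restore_skill() {
  name="$1"
  skills_dir="${SKILLS_DIR:-$HOME/.claude/skills}"
  vault_dir="${VAULT_DIR:-$HOME/.claude/skills-vault}"
  if [ -e "$skills_dir/$name" ]; then
    echo "already active: $name"
  elif [ -e "$vault_dir/$name" ]; then
    mkdir -p "$skills_dir"
    mv "$vault_dir/$name" "$skills_dir/"
    echo "restored: $name"
  else
    echo "unknown skill: $name"
    return 1
  fi
}
```

&lt;p&gt;For example, &lt;code&gt;restore_skill design-review&lt;/code&gt; would move the skill into place on the first call and report it as already active on the second.&lt;/p&gt;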

&lt;h2&gt;
  
  
  My Final Active Skill Set (43 Skills)
&lt;/h2&gt;

&lt;p&gt;Here's what I kept always-active:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Dev Workflow (Superpowers):&lt;/strong&gt;&lt;br&gt;
ship, qa, browse, review, investigate, writing-plans, executing-plans, brainstorming, subagent-driven-development, systematic-debugging, test-driven-development, verification-before-completion&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project-Specific:&lt;/strong&gt;&lt;br&gt;
ecommerce-architect, staff-engineer-interview, claude-api-patterns, claude-code-mastery, mcp-server-development&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marketing (Khetisahayak):&lt;/strong&gt;&lt;br&gt;
cmo, mkt-content, mkt-seo, mkt-social, mkt-growth, mkt-email, mkt-pr, mkt-review&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research:&lt;/strong&gt;&lt;br&gt;
research, research-deep, research-report&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Vault Index:&lt;/strong&gt;&lt;br&gt;
skill-vault (the 1.5K token index that knows about 139 other skills)&lt;/p&gt;
&lt;h2&gt;
  
  
  Tips for Your Own Vault
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit first.&lt;/strong&gt; Run the size audit script before moving anything. You might be surprised which skills are the heaviest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep your daily drivers active.&lt;/strong&gt; Don't vault skills you use every session. The restore step adds a small delay.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Group by category.&lt;/strong&gt; Makes it easy to restore a whole category:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills-vault/seedance-&lt;span class="k"&gt;*&lt;/span&gt; ~/.claude/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Update the index&lt;/strong&gt; when you add new skills to the vault.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Restore everything&lt;/strong&gt; if you need full power for a complex session:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;mv&lt;/span&gt; ~/.claude/skills-vault/&lt;span class="k"&gt;*&lt;/span&gt; ~/.claude/skills/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Where to Find Great Skills
&lt;/h2&gt;

&lt;p&gt;Here are the skill repos I installed from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/coreyhaines31/marketingskills" rel="noopener noreferrer"&gt;marketingskills&lt;/a&gt; — 36 marketing skills by Corey Haines&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/wondelai/skills" rel="noopener noreferrer"&gt;wondelai/skills&lt;/a&gt; — 42 product/engineering framework skills
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;obra/superpowers&lt;/a&gt; — Dev methodology (TDD, planning, subagents)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Weizhena/Deep-Research-skills" rel="noopener noreferrer"&gt;Deep-Research-skills&lt;/a&gt; — Structured research workflows&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/thedotmack/claude-mem" rel="noopener noreferrer"&gt;claude-mem&lt;/a&gt; — Persistent memory across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Installing is simple — symlink into &lt;code&gt;~/.claude/skills/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/coreyhaines31/marketingskills.git
&lt;span class="k"&gt;for &lt;/span&gt;skill_dir &lt;span class="k"&gt;in &lt;/span&gt;marketingskills/skills/&lt;span class="k"&gt;*&lt;/span&gt;/&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$skill_dir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; ~/.claude/skills/&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$skill_dir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then vault what you don't need daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Skill Vault pattern is dead simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/skills/&lt;/code&gt;&lt;/strong&gt; = active skills (loaded every session)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~/.claude/skills-vault/&lt;/code&gt;&lt;/strong&gt; = dormant skills (restored on demand)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;skill-vault/SKILL.md&lt;/code&gt;&lt;/strong&gt; = lightweight index that bridges the two&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running more than 50 skills in Claude Code, you're probably burning hundreds of thousands of unnecessary tokens per session. The vault pattern gives you the full arsenal without the cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;96% fewer tokens. Zero lost capability. One &lt;code&gt;mv&lt;/code&gt; command away.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Prakash Ponali, a Staff Engineer with 16+ years in enterprise eCommerce. Currently building &lt;a href="https://khetisahayak.com" rel="noopener noreferrer"&gt;Khetisahayak&lt;/a&gt; — a farming helper app for Telugu-speaking farmers in Andhra Pradesh. Find me on &lt;a href="https://www.linkedin.com/in/prakash-ponali-75ab9b17" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 14:53:45 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-29fh</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-29fh</guid>
      <description>&lt;h1&gt;
  
  
  Web Topic Analysis Report - 2025-07-19
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Web Topic Analysis Report
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_145337&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T14:53:37.277540&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 14:34:12 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-1e7</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-1e7</guid>
      <description>&lt;h1&gt;
  
  
  Web Topic Analysis Report - 2025-07-19
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Web Topic Analysis Report
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_143346&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T14:33:46.400427&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 11:00:37 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-2554</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-2554</guid>
      <description>&lt;h1&gt;
  
  
  Web Topic Analysis Report - 2025-07-19
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Web Topic Analysis Report
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_110024&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T11:00:24.674147&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 10:55:14 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-a86</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-a86</guid>
      <description>&lt;h1&gt;
  
  
  Web Topic Analysis Report - 2025-07-19
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Web Topic Analysis Report
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_105459&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T10:54:59.752185&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 10:47:55 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-2ajh</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-2ajh</guid>
      <description>&lt;h1&gt;Web Topic Analysis Report - 2025-07-19&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_104730&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T10:47:30.038257&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 10:26:53 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-1en2</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-1en2</guid>
      <description>&lt;h1&gt;Web Topic Analysis Report - 2025-07-19&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_102641&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T10:26:41.742468&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 09:05:44 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-579c</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-579c</guid>
      <description>&lt;h1&gt;Web Topic Analysis Report - 2025-07-19&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_090536&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T09:05:36.038055&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Web Topic Analysis Report - 2025-07-19</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 08:13:14 +0000</pubDate>
      <link>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-2g5e</link>
      <guid>https://forem.com/pponali/web-topic-analysis-report-2025-07-19-2g5e</guid>
      <description>&lt;h1&gt;Web Topic Analysis Report - 2025-07-19&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By AI Content Generator on July 19, 2025&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;web_analysis&lt;/code&gt;, &lt;code&gt;tech_trends&lt;/code&gt;, &lt;code&gt;ai_analysis&lt;/code&gt;&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;Workflow ID:&lt;/strong&gt; web_topic_analysis_20250719_081303&lt;br&gt;
&lt;strong&gt;Generated:&lt;/strong&gt; 2025-07-19T08:13:03.675446&lt;/p&gt;

&lt;p&gt;No analysis results available.&lt;/p&gt;

</description>
      <category>webanalysis</category>
      <category>techtrends</category>
      <category>aianalysis</category>
    </item>
    <item>
      <title>Test: URL Tracking System</title>
      <dc:creator>pponali</dc:creator>
      <pubDate>Sat, 19 Jul 2025 07:45:31 +0000</pubDate>
      <link>https://forem.com/pponali/test-url-tracking-system-1i3h</link>
      <guid>https://forem.com/pponali/test-url-tracking-system-1i3h</guid>
      <description>&lt;h1&gt;URL Tracking Test&lt;/h1&gt;

&lt;p&gt;This test post verifies that URL tracking works correctly.&lt;/p&gt;

&lt;h2&gt;Features Tested&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;URL tracking in &lt;code&gt;published_urls_tracker.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Tag sanitization&lt;/li&gt;
&lt;li&gt;Article metadata storage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;If you can see this post, the URL tracking system is working! 🎉&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Generated by CrewAI Orchestrator&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>crewai</category>
      <category>tracking</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
