<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Onur Cinar</title>
    <description>The latest articles on Forem by Onur Cinar (@onurcinar).</description>
    <link>https://forem.com/onurcinar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2455507%2F3aa78de4-9412-4988-b03a-d64d419c7f0a.jpeg</url>
      <title>Forem: Onur Cinar</title>
      <link>https://forem.com/onurcinar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/onurcinar"/>
    <language>en</language>
    <item>
      <title>Native Chaos Engineering: Testing Resilience with Fault &amp; Latency Injection</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:51:48 +0000</pubDate>
      <link>https://forem.com/onurcinar/native-chaos-engineering-testing-resilience-with-fault-latency-injection-83</link>
      <guid>https://forem.com/onurcinar/native-chaos-engineering-testing-resilience-with-fault-latency-injection-83</guid>
      <description>&lt;p&gt;You’ve implemented retries, circuit breakers, and timeouts. Your application is now "resilient." But how do you know these policies actually work? Waiting for a production meltdown to verify your configuration is a high-stakes gamble. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native Chaos Engineering&lt;/strong&gt; in Resile allows you to synthetically induce failure and latency directly into your application's execution path, ensuring your resilience policies are battle-tested before they're ever needed in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: "Dark Code" in Resilience Policies
&lt;/h2&gt;

&lt;p&gt;Resilience policies—like retries and circuit breakers—are often "dark code." These are execution paths that are rarely traversed under normal operating conditions. Because they only trigger during failure, they are notoriously difficult to test and prone to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Buggy Configurations&lt;/strong&gt;: A retry limit that is too high, or a circuit breaker threshold that never trips.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Unintended Side Effects&lt;/strong&gt;: A retry loop that accidentally consumes all available database connections.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Silent Failures&lt;/strong&gt;: A fallback strategy that actually panics because it hasn't been executed in months.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Traditional chaos engineering tools often operate at the infrastructure layer (e.g., killing pods or dropping network packets). While powerful, these tools can be difficult to set up in local development or staging environments and often lack the granularity to test specific application-level logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Fault &amp;amp; Latency Injection
&lt;/h2&gt;

&lt;p&gt;Resile provides a &lt;strong&gt;Chaos Injector&lt;/strong&gt; middleware that can be integrated directly into any execution policy. By injecting synthetic faults (errors) and latency (delays) with configurable probabilities, you can simulate various failure scenarios without touching your infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Deterministic Randomness&lt;/strong&gt;: Uses Go 1.22's &lt;code&gt;math/rand/v2&lt;/code&gt; for efficient and predictable random number generation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context-Aware&lt;/strong&gt;: Latency injection strictly respects &lt;code&gt;context.Context&lt;/code&gt; cancellation. If your request times out while Resile is injecting chaos latency, it exits immediately.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Zero Dependencies&lt;/strong&gt;: Just like the rest of the Resile core, the chaos package depends only on the Go standard library.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Granular Control&lt;/strong&gt;: Configure error and latency probabilities independently for fine-tuned simulation.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Practical Usage
&lt;/h2&gt;

&lt;p&gt;Integrating chaos into your existing Resile policies is as simple as adding the &lt;code&gt;WithChaos&lt;/code&gt; option.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Basic Chaos Configuration
&lt;/h3&gt;

&lt;p&gt;You can define a chaos configuration that injects a 10% error rate and adds 100ms of latency to 20% of requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/cinar/resile"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/cinar/resile/chaos"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Configure chaos injection&lt;/span&gt;
&lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chaos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ErrorProbability&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="m"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c"&gt;// 10% chance of failure&lt;/span&gt;
    &lt;span class="n"&gt;InjectedError&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"chaos!"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c"&gt;// The error to return&lt;/span&gt;
    &lt;span class="n"&gt;LatencyProbability&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c"&gt;// 20% chance of latency&lt;/span&gt;
    &lt;span class="n"&gt;LatencyDuration&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c"&gt;// Delay to inject&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Apply it to an execution&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithChaos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Testing Your Circuit Breaker
&lt;/h3&gt;

&lt;p&gt;Chaos injection is exceptionally useful for verifying that your circuit breaker trips under pressure. By setting a high &lt;code&gt;ErrorProbability&lt;/code&gt;, you can force the breaker to transition from &lt;code&gt;Closed&lt;/code&gt; to &lt;code&gt;Open&lt;/code&gt; in a controlled environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;cb&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;circuit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;circuit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;WindowSize&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;           &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;FailureRateThreshold&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c"&gt;// Force 80% error rate to trip the breaker quickly&lt;/span&gt;
&lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chaos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ErrorProbability&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;InjectedError&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"synthetic failure"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithCircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithChaos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Circuit Breaker State: %v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="c"&gt;// Should be Open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Configuration Reference
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;chaos.Config&lt;/code&gt; struct provides the following options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ErrorProbability&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;float64&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The probability of injecting an error (0.0 to 1.0).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;InjectedError&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The error to be returned when an error is injected.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LatencyProbability&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;float64&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The probability of injecting latency (0.0 to 1.0).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LatencyDuration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;time.Duration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The duration of the latency to be injected.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Environment Gating&lt;/strong&gt;: Never enable chaos injection in production unless you are performing a planned game day. Use environment variables to gate the configuration:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ENABLE_CHAOS"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithChaos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loadChaosCfg&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt;: Ensure your &lt;code&gt;Instrumenter&lt;/code&gt; (like &lt;code&gt;slog&lt;/code&gt; or &lt;code&gt;OTel&lt;/code&gt;) is active. This allows you to see the injected errors and latencies in your logs and traces, making it easier to verify how your application responds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start Small&lt;/strong&gt;: Begin with low probabilities (e.g., 1-2%) to identify subtle race conditions or timeout issues before increasing the "blast radius."&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Resilience is not a "set it and forget it" feature. It requires continuous verification. By bringing chaos engineering directly into your application's execution policies, Resile empowers you to build systems that aren't just theoretically resilient, but practically battle-hardened.&lt;/p&gt;

&lt;p&gt;For more information and advanced usage, visit the &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt; project.&lt;/p&gt;

</description>
      <category>go</category>
      <category>testing</category>
      <category>sre</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Beyond Static Limits: Adaptive Concurrency with TCP-Vegas in Go</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Thu, 02 Apr 2026 19:11:51 +0000</pubDate>
      <link>https://forem.com/onurcinar/beyond-static-limits-adaptive-concurrency-with-tcp-vegas-in-go-3gne</link>
      <guid>https://forem.com/onurcinar/beyond-static-limits-adaptive-concurrency-with-tcp-vegas-in-go-3gne</guid>
      <description>&lt;p&gt;Traditional concurrency limits (like bulkheads) are static. You pick a number—say, 10 concurrent requests—and hope for the best. But in the dynamic world of cloud infrastructure, "10" might be too conservative when the network is fast, or dangerously high when a downstream service starts to queue.&lt;/p&gt;

&lt;p&gt;Static limits require manual tuning, which is often done &lt;em&gt;after&lt;/em&gt; an outage has already happened. To build truly resilient systems, we need &lt;strong&gt;Adaptive Concurrency Control&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is how to implement dynamic concurrency limits in Go using &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt;, inspired by the TCP-Vegas congestion control algorithm.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: The "Fixed-Limit" Trap
&lt;/h2&gt;

&lt;p&gt;Imagine your service talks to a database. You've set a bulkhead limit of 50 concurrent connections. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scenario A (Normal):&lt;/strong&gt; Database latency is 10ms. 50 concurrent requests mean you're handling 5,000 RPS. Everything is fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scenario B (Degraded):&lt;/strong&gt; Database latency spikes to 500ms due to a background maintenance task. Your 50 "slots" are now filled with slow requests. Your throughput drops to 100 RPS, and new incoming requests start to pile up in your own service's memory, eventually leading to a cascade of failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Scenario B, 50 is &lt;strong&gt;too many&lt;/strong&gt;. You're holding onto resources that are essentially waiting on a bottleneck. You should have reduced your concurrency limit to prevent your own service from becoming part of the problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Little's Law &amp;amp; TCP-Vegas
&lt;/h2&gt;

&lt;p&gt;Adaptive Concurrency uses two core principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Little's Law (&lt;em&gt;L = λW&lt;/em&gt;):&lt;/strong&gt; The number of items in a system (&lt;em&gt;L&lt;/em&gt;) is equal to the arrival rate (&lt;em&gt;λ&lt;/em&gt;) multiplied by the average time an item spends in the system (&lt;em&gt;W&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TCP-Vegas AIMD:&lt;/strong&gt; An Additive Increase, Multiplicative Decrease (AIMD) logic based on Round-Trip Time (RTT).&lt;/li&gt;
&lt;/ol&gt;
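&lt;p&gt;A quick sanity check of Little's Law against the numbers from Scenario A and B:&lt;/p&gt;

```go
package main

import "fmt"

// throughput applies Little's Law (L = λW, so λ = L / W) using integer
// math: a fixed pool of slots and a per-request latency in milliseconds.
func throughput(slots, latencyMs int) int {
	return slots * 1000 / latencyMs
}

func main() {
	fmt.Println(throughput(50, 10))  // Scenario A: 10ms latency → 5000 RPS
	fmt.Println(throughput(50, 500)) // Scenario B: 500ms latency → 100 RPS
}
```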

&lt;h3&gt;
  
  
  How it works:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline:&lt;/strong&gt; The algorithm tracks the minimum RTT (the fastest the system can possibly go).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Additive Increase:&lt;/strong&gt; If current latency is close to the baseline (no queuing detected), it cautiously increases the concurrency limit by 1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiplicative Decrease:&lt;/strong&gt; If latency spikes above a threshold (e.g., 1.5× baseline), it assumes queuing is happening downstream and immediately slashes the concurrency limit by 20%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows your service to automatically "breathe" with the network. It expands to use available capacity when things are fast and contracts instantly to protect itself when things slow down.&lt;/p&gt;
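&lt;p&gt;The increase/decrease rule above can be sketched in a few lines. The names and constants here are assumptions for illustration, not Resile's actual implementation:&lt;/p&gt;

```go
package main

import "fmt"

// vegasUpdate applies the AIMD rule: additive increase when latency sits
// near the baseline, multiplicative decrease when it spikes.
// Hypothetical sketch, not Resile's actual implementation.
func vegasUpdate(limit int, minRTT, sampleRTT float64) int {
	const queueThreshold = 1.5 // sampleRTT above 1.5× baseline signals queuing
	if sampleRTT > queueThreshold*minRTT {
		// Multiplicative decrease: slash the limit by 20%, floor at 1.
		if limit = limit * 8 / 10; limit < 1 {
			limit = 1
		}
		return limit
	}
	// Additive increase: cautiously probe for more capacity.
	return limit + 1
}

func main() {
	limit := 50
	limit = vegasUpdate(limit, 10, 11) // near baseline → 51
	fmt.Println(limit)
	limit = vegasUpdate(limit, 10, 30) // 3× baseline → slashed to 40
	fmt.Println(limit)
}
```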




&lt;h2&gt;
  
  
  Implementing with Resile
&lt;/h2&gt;

&lt;p&gt;Resile makes it trivial to add adaptive concurrency to your Go services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// 1. Create a shared AdaptiveLimiter.&lt;/span&gt;
&lt;span class="c"&gt;// This should be shared across multiple calls to the same resource.&lt;/span&gt;
&lt;span class="n"&gt;al&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewAdaptiveLimiter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;// 2. Use it in your policy.&lt;/span&gt;
&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithAdaptiveLimiterInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;al&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// 3. Execute your action.&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;callDownstreamService&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Is&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ErrShedLoad&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// The limiter has dynamically reduced the limit and shed this request&lt;/span&gt;
    &lt;span class="c"&gt;// to protect the system.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why "TCP-Vegas"?
&lt;/h3&gt;

&lt;p&gt;Unlike other congestion control algorithms (like TCP-Reno) that wait for packet loss to react, TCP-Vegas reacts to &lt;strong&gt;latency changes&lt;/strong&gt;. This is perfect for microservices where "packet loss" usually means a timed-out request or a 503 error—both of which we want to avoid &lt;em&gt;before&lt;/em&gt; they happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  Zero-Configuration Resilience
&lt;/h2&gt;

&lt;p&gt;One of the biggest benefits of Adaptive Concurrency is that it requires &lt;strong&gt;zero manual configuration&lt;/strong&gt;. You don't need to know if your database can handle 50 or 500 connections. The &lt;code&gt;AdaptiveLimiter&lt;/code&gt; will discover the optimal limit in real-time.&lt;/p&gt;

&lt;p&gt;It even handles "Network Drift." Over time, the minimum baseline RTT is gradually decayed, allowing the system to recalibrate if you migrate your database to a faster region or if the network topology changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Resilience isn't just about surviving failures; it's about &lt;strong&gt;adapting&lt;/strong&gt; to them. By moving from static bulkheads to adaptive concurrency, you're building a system that can intelligently protect itself from cascading failures while maximizing throughput during "peace time."&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/cinar/resile/tree/main/examples/adaptiveconcurrency" rel="noopener noreferrer"&gt;Adaptive Concurrency Example&lt;/a&gt; in the Resile repository to see it in action.&lt;/p&gt;

</description>
      <category>go</category>
      <category>distributedsystems</category>
      <category>sre</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Respecting Boundaries: Precise Rate Limiting in Go</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://forem.com/onurcinar/respecting-boundaries-precise-rate-limiting-in-go-lca</link>
      <guid>https://forem.com/onurcinar/respecting-boundaries-precise-rate-limiting-in-go-lca</guid>
      <description>&lt;p&gt;Traffic spikes are a double-edged sword. On one hand, you’re busy! On the other, those spikes can overwhelm your services or exceed your downstream quotas. &lt;/p&gt;

&lt;p&gt;Whether you're protecting your own database from an unexpected burst or respecting a third-party API’s strict limit of 100 requests per second (RPS), you need a precise way to shape your traffic.&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;Token Bucket Rate Limiter&lt;/strong&gt; in &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Unbounded Traffic
&lt;/h2&gt;

&lt;p&gt;In a distributed environment, your clients don't know about each other. If 50 different microservice instances all decide to call a downstream API at the same time, the aggregate traffic can easily exceed the capacity of the target system. &lt;/p&gt;

&lt;p&gt;When you exceed these limits, you'll often see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP 429 (Too Many Requests)&lt;/strong&gt;: Downstream services start rejecting you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cascading Latency&lt;/strong&gt;: The target system slows down for &lt;em&gt;everyone&lt;/em&gt; because it's processing too many requests at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Overruns&lt;/strong&gt;: Many cloud providers and SaaS APIs charge significant premiums for exceeding agreed-upon quotas.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Solution: The Token Bucket Algorithm
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Token Bucket&lt;/strong&gt; is a classic algorithm used for traffic shaping. &lt;/p&gt;

&lt;p&gt;Imagine a bucket that refills with "tokens" at a constant rate (e.g., 100 tokens per second). Every request must consume a token from the bucket. If the bucket is empty, the request is rejected immediately. This allows for small "bursts" (filling the bucket) while maintaining a precise long-term average rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing with Resile:
&lt;/h3&gt;

&lt;p&gt;Resile makes adding rate limiting to your executions simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Allow 100 requests per second.&lt;/span&gt;
&lt;span class="c"&gt;// If the limit is exceeded, it fails fast with resile.ErrRateLimitExceeded.&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rate Limiting vs. Adaptive Retries
&lt;/h3&gt;

&lt;p&gt;Wait, doesn't Resile already have &lt;code&gt;AdaptiveBucket&lt;/code&gt;? What's the difference?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AdaptiveBucket&lt;/strong&gt; is &lt;em&gt;success-based&lt;/em&gt;. It tracks how many requests are succeeding vs. failing and throttles &lt;em&gt;retries&lt;/em&gt; accordingly. It's designed specifically to prevent "retry storms" when a service is failing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RateLimiter&lt;/strong&gt; is &lt;em&gt;time-based&lt;/em&gt;. It enforces a strict, constant quota of requests over a time interval. It’s designed for general traffic shaping and quota management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For maximum protection, you can even use them together!&lt;/p&gt;




&lt;h2&gt;
  
  
  Shared Rate Limiters
&lt;/h2&gt;

&lt;p&gt;Often, you want to enforce a global rate limit across your entire service instance. You can create a shared &lt;code&gt;RateLimiter&lt;/code&gt; and pass it to multiple executions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Shared rate limiter for a specific API key or downstream service&lt;/span&gt;
&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRateLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Each call will consume tokens from the same shared bucket.&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myAction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithRateLimiterInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Observability: Seeing the Shaping
&lt;/h2&gt;

&lt;p&gt;Knowing &lt;em&gt;when&lt;/em&gt; and &lt;em&gt;why&lt;/em&gt; your traffic is being throttled is essential for operational visibility. &lt;/p&gt;

&lt;p&gt;If you use Resile's telemetry integrations (like &lt;code&gt;slog&lt;/code&gt; or &lt;code&gt;OpenTelemetry&lt;/code&gt;), you'll get automatic visibility into these events. The &lt;code&gt;OnRateLimitExceeded&lt;/code&gt; event is triggered whenever a request is rejected by the rate limiter, allowing you to monitor your quota utilization in real-time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Rate limiting is not just about saying "no"; it's about being a good citizen in a distributed ecosystem. By respecting boundaries and shaping your traffic at the source, you protect both your own service and the systems you depend on.&lt;/p&gt;

&lt;p&gt;Resile provides a production-grade rate limiter that integrates seamlessly into your resilience policies, giving you fine-grained control over your traffic flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn more about Resile:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>microservices</category>
      <category>sre</category>
      <category>devops</category>
    </item>
    <item>
      <title>Stop the Domino Effect: Bulkhead Isolation in Go</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Sun, 22 Mar 2026 17:42:19 +0000</pubDate>
      <link>https://forem.com/onurcinar/stop-the-domino-effect-bulkhead-isolation-in-go-5cgl</link>
      <guid>https://forem.com/onurcinar/stop-the-domino-effect-bulkhead-isolation-in-go-5cgl</guid>
      <description>&lt;p&gt;In a distributed system, failure is inevitable. But a failure in one part of your system shouldn't bring down everything else. &lt;/p&gt;

&lt;p&gt;Imagine your Go service depends on three different downstream APIs: Payments, Inventory, and Recommendations. Suddenly, the Recommendations API starts taking 30 seconds to respond. If your service doesn't have isolation, your goroutines will start piling up waiting for Recommendations. Eventually, you'll hit your process limit, and even the critical Payments API calls will start failing because there are no resources left to handle them.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;Domino Effect&lt;/strong&gt;, and the &lt;strong&gt;Bulkhead Pattern&lt;/strong&gt; is how you stop it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Resource Exhaustion
&lt;/h2&gt;

&lt;p&gt;When one dependency slows down, it consumes resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Goroutines&lt;/strong&gt;: Blocked waiting for a response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: Each blocked goroutine carries a stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Descriptors/Sockets&lt;/strong&gt;: Open connections to the slow service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a bulkhead, a single slow dependency can "starve" the rest of your application, leading to a total system collapse.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: The Bulkhead Pattern
&lt;/h2&gt;

&lt;p&gt;Named after the partitioned sections of a ship's hull, a &lt;strong&gt;Bulkhead&lt;/strong&gt; isolates failures. If one section of the ship is flooded, the others remain buoyant. In software, we achieve this by limiting the number of concurrent executions allowed for a specific resource or dependency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing with Resile
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt; makes it trivial to add bulkhead isolation to any operation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Allow only 10 concurrent calls to this specific operation.&lt;/span&gt;
&lt;span class="c"&gt;// If an 11th call comes in, it fails fast with resile.ErrBulkheadFull.&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithBulkhead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using a Shared Bulkhead
&lt;/h3&gt;

&lt;p&gt;Often, you want to limit concurrency across multiple different call sites that hit the same downstream service. You can create a shared &lt;code&gt;Bulkhead&lt;/code&gt; instance for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Create a shared bulkhead for the "Inventory Service"&lt;/span&gt;
&lt;span class="n"&gt;inventoryBulkhead&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewBulkhead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Call Site A&lt;/span&gt;
&lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetchItem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithBulkheadInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inventoryBulkhead&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c"&gt;// Call Site B&lt;/span&gt;
&lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updateStock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithBulkheadInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inventoryBulkhead&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By sharing the instance, you ensure that the &lt;em&gt;total&lt;/em&gt; concurrency hitting the Inventory Service never exceeds 20, regardless of which part of your code is making the call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "Fail-Fast" Matters
&lt;/h2&gt;

&lt;p&gt;When a bulkhead is full, Resile immediately returns &lt;code&gt;resile.ErrBulkheadFull&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;This is much better than waiting for a timeout. By failing fast, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Preserve Resources&lt;/strong&gt;: You don't spawn another goroutine or open another connection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide Immediate Feedback&lt;/strong&gt;: Your upstream callers get an error instantly and can decide how to handle it (e.g., show a cached result or a "service busy" message).&lt;/li&gt;
&lt;/ol&gt;
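&lt;p&gt;Caller-side handling of that error might look like the following sketch; &lt;code&gt;errBulkheadFull&lt;/code&gt; is a local stand-in for &lt;code&gt;resile.ErrBulkheadFull&lt;/code&gt; so the example stays self-contained:&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-in for resile.ErrBulkheadFull, so this sketch is runnable
// on its own.
var errBulkheadFull = errors.New("bulkhead full")

// fetchRecommendations simulates a call guarded by a bulkhead that is
// currently saturated.
func fetchRecommendations() ([]string, error) {
	return nil, errBulkheadFull
}

// recommendationsWithFallback degrades gracefully: when the bulkhead
// rejects the call, it serves a cached result instead of an error page.
func recommendationsWithFallback(cached []string) []string {
	items, err := fetchRecommendations()
	if errors.Is(err, errBulkheadFull) {
		return cached // immediate fallback; no goroutine was spawned
	}
	return items
}

func main() {
	cached := []string{"bestsellers"}
	fmt.Println(recommendationsWithFallback(cached)) // [bestsellers]
}
```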




&lt;h2&gt;
  
  
  Observability: Monitoring the Walls
&lt;/h2&gt;

&lt;p&gt;You need to know when your bulkheads are working. If a bulkhead is frequently full, it might mean your downstream service is struggling, or you need to re-evaluate your capacity limits.&lt;/p&gt;

&lt;p&gt;If you use Resile's telemetry integrations (like &lt;code&gt;slog&lt;/code&gt; or &lt;code&gt;OpenTelemetry&lt;/code&gt;), you'll get automatic visibility when a bulkhead saturates. The &lt;code&gt;OnBulkheadFull&lt;/code&gt; event is triggered every time a request is rejected due to capacity limits.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Bulkheads are a fundamental building block of resilient systems. By isolating your dependencies, you ensure that a local fire doesn't become a global conflagration.&lt;/p&gt;

&lt;p&gt;Resile provides a clean, "Go-native" way to implement bulkheads without complex boilerplate, allowing you to focus on your business logic while keeping your system stable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore Resile on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>microservices</category>
      <category>backend</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Infinite Data Processing in Go: Building Resilient Data Pipes with Channels</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Wed, 18 Mar 2026 16:00:00 +0000</pubDate>
      <link>https://forem.com/onurcinar/infinite-data-processing-in-go-building-resilient-data-pipes-with-channels-46d5</link>
      <guid>https://forem.com/onurcinar/infinite-data-processing-in-go-building-resilient-data-pipes-with-channels-46d5</guid>
      <description>&lt;p&gt;When building data-intensive applications, we usually start with the most obvious approach: loading data into a slice or array, iterating over it to process the data, and returning the result. This batch-processing mindset works great—until the data never stops coming.&lt;/p&gt;

&lt;p&gt;Whether you are dealing with live IoT telemetry, continuous log tailing, or real-time financial market feeds, you quickly run into the problem of "infinite" data. If you try to append an endless stream of stock ticks to a &lt;code&gt;[]float64&lt;/code&gt;, your application will inevitably consume all available memory and crash. &lt;/p&gt;

&lt;p&gt;To handle infinite data gracefully, you need to shift your architecture from batch processing to stream processing. In Go, we have the perfect built-in primitive for this: &lt;strong&gt;Channels&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Power of Channels as Data Pipes
&lt;/h2&gt;

&lt;p&gt;Go channels are often taught primarily as a way to synchronize goroutines, but they are also incredibly powerful as sequential data pipes. By treating channels as standard inputs and outputs, you can build decoupled, memory-efficient pipelines where data flows through a series of transformations continuously.&lt;/p&gt;

&lt;p&gt;When redesigning my open-source technical analysis library, &lt;strong&gt;&lt;a href="https://github.com/cinar/indicator" rel="noopener noreferrer"&gt;cinar/indicator&lt;/a&gt;&lt;/strong&gt;, for its v2 release, I faced exactly this challenge. In algorithmic trading, systems need to react instantly to live market feeds without accumulating massive memory overhead. Transitioning the library's core architecture from slice-based arrays to stream-based Go channels solved this elegantly.&lt;/p&gt;

&lt;p&gt;Let's look at how to build a continuous data pipe, and some of the tricky edge cases you'll encounter along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Pipeline Stage
&lt;/h2&gt;

&lt;p&gt;Imagine we want to calculate a Simple Moving Average (SMA) over a live stream of data. Instead of taking a slice, our function will accept a read-only channel as its input and return a read-only channel as its output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// SimpleMovingAverage acts as a pipe: it reads from 'input', processes, and writes to 'output'&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;SimpleMovingAverage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Ensure the output channel is closed when the input stream ends&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

        &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0.0&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;

            &lt;span class="c"&gt;// Keep the window size fixed&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c"&gt;// Only emit a value once we have enough data points&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the processing happens inside its own goroutine, the function returns the &lt;code&gt;output&lt;/code&gt; channel immediately. The goroutine stays alive, eagerly waiting for new data to arrive on the &lt;code&gt;input&lt;/code&gt; channel.&lt;/p&gt;
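&lt;p&gt;The same shape works for any transformation. Here is a minimal "doubling" stage (illustrative, not part of the library) that makes the mechanics easier to see in isolation:&lt;/p&gt;

```go
package main

import "fmt"

// double is a pipeline stage with the same shape as SimpleMovingAverage:
// it returns its output channel immediately, while a goroutine keeps
// pumping values until the input closes.
func double(input <-chan float64) <-chan float64 {
	output := make(chan float64)
	go func() {
		defer close(output) // propagate end-of-stream downstream
		for v := range input {
			output <- v * 2
		}
	}()
	return output
}

func main() {
	input := make(chan float64)
	doubled := double(input) // returns immediately; nothing has run yet

	go func() {
		defer close(input)
		for _, v := range []float64{1, 2, 3} {
			input <- v
		}
	}()

	for v := range doubled {
		fmt.Println(v) // 2, 4, 6
	}
}
```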

&lt;h2&gt;
  
  
  Handling Stream Complexities with Helpers
&lt;/h2&gt;

&lt;p&gt;Once you start relying heavily on channels, you run into a few structural challenges. To make working with channels just as easy as working with slices, &lt;code&gt;cinar/indicator&lt;/code&gt; includes a robust &lt;code&gt;helper&lt;/code&gt; package. &lt;/p&gt;

&lt;p&gt;If you are building your own stream-based application, you can leverage these helpers directly from the library rather than reinventing the wheel.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Branching Problem
&lt;/h3&gt;

&lt;p&gt;A major gotcha with Go channels: once a value is read from a channel, it's gone. What if you want to calculate an SMA &lt;em&gt;and&lt;/em&gt; a Relative Strength Index (RSI) from the exact same live price ticker? You can't have two consumers read from one channel without them stealing data from each other.&lt;/p&gt;

&lt;p&gt;To solve this, the library provides &lt;strong&gt;&lt;code&gt;helper.Duplicate&lt;/code&gt;&lt;/strong&gt;. This function takes one input channel and "fans it out" into multiple identical output channels. This allows you to safely branch your data stream to multiple independent technical indicators simultaneously without race conditions or data loss.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Branching one price stream into three identical streams&lt;/span&gt;
&lt;span class="n"&gt;priceStreams&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;helper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;livePrices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;smaStream&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;indicator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SMA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priceStreams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="m"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rsiStream&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;indicator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RSI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priceStreams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="m"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;macdStream&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;indicator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MACD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priceStreams&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
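&lt;p&gt;A simplified version of such a fan-out helper looks roughly like this (a sketch only; the real &lt;code&gt;helper.Duplicate&lt;/code&gt; manages buffering more carefully so that a slow branch cannot stall the others):&lt;/p&gt;

```go
package main

import "fmt"

// duplicate fans one input channel out into count output channels so
// every consumer sees every value. Outputs are buffered here so the
// naive sequential write cannot deadlock; the buffer size of 64 is an
// assumption made for this sketch, not the library's implementation.
func duplicate[T any](input <-chan T, count int) []chan T {
	outputs := make([]chan T, count)
	for i := range outputs {
		outputs[i] = make(chan T, 64)
	}
	go func() {
		for v := range input {
			for _, out := range outputs {
				out <- v // every branch sees every value
			}
		}
		for _, out := range outputs {
			close(out)
		}
	}()
	return outputs
}

func main() {
	prices := make(chan float64, 2)
	prices <- 10
	prices <- 11
	close(prices)

	for _, stream := range duplicate(prices, 2) {
		for v := range stream {
			fmt.Println(v) // both branches print 10 then 11
		}
	}
}
```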



&lt;h3&gt;
  
  
  2. Lookbacks and Sliding Windows
&lt;/h3&gt;

&lt;p&gt;Many data processing algorithms require looking back at the last N periods. Instead of managing a sliding window manually inside every single function (like we did in the basic SMA example above), the library uses &lt;strong&gt;&lt;code&gt;helper.Buffered&lt;/code&gt;&lt;/strong&gt;. This provides a clean abstraction to maintain a rolling state over a continuous channel, vastly simplifying the development of complex logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Bridging the Gap: Slices vs. Streams
&lt;/h3&gt;

&lt;p&gt;The rest of the world often still speaks in batches. You might be downloading historical CSV data for backtesting, or you might need to output an array for a charting UI. To bridge this gap, the &lt;code&gt;helper&lt;/code&gt; package includes utilities to fluidly move between paradigms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;helper.SliceToChan&lt;/code&gt;&lt;/strong&gt;: Converts a static historical array into a simulated live data stream. It spins up a goroutine, pushes every element from the slice into a channel, and closes it. It's perfect for feeding historical backtests into a live-stream architecture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;helper.ChanToSlice&lt;/code&gt;&lt;/strong&gt;: The inverse operation. It drains a stream back into an array, which is incredibly useful for writing unit tests or rendering charts.&lt;/li&gt;
&lt;/ul&gt;
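&lt;p&gt;Both helpers are small enough to sketch in a few lines; these simplified versions (not the library's exact code) show the idea:&lt;/p&gt;

```go
package main

import "fmt"

// sliceToChan turns a static slice into a stream: a goroutine pushes
// each element into a channel and closes it when the slice is exhausted.
func sliceToChan[T any](values []T) <-chan T {
	c := make(chan T)
	go func() {
		defer close(c)
		for _, v := range values {
			c <- v
		}
	}()
	return c
}

// chanToSlice is the inverse: it blocks until the stream closes and
// returns everything it received.
func chanToSlice[T any](c <-chan T) []T {
	var values []T
	for v := range c {
		values = append(values, v)
	}
	return values
}

func main() {
	stream := sliceToChan([]float64{10, 12, 14})
	fmt.Println(chanToSlice(stream)) // round-trips the original slice
}
```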

&lt;h2&gt;
  
  
  Chaining the Pipes Together
&lt;/h2&gt;

&lt;p&gt;Because all the indicators and helpers in &lt;code&gt;cinar/indicator&lt;/code&gt; take channels and return channels, they are highly composable. We can chain them together like Unix command-line pipes (&lt;code&gt;|&lt;/code&gt;). &lt;/p&gt;

&lt;p&gt;Here is what it looks like to wire up an application using these concepts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"[github.com/cinar/indicator/v2/helper](https://github.com/cinar/indicator/v2/helper)"&lt;/span&gt;
    &lt;span class="s"&gt;"[github.com/cinar/indicator/v2/trend](https://github.com/cinar/indicator/v2/trend)"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// 1. We start with a static slice of historical data&lt;/span&gt;
    &lt;span class="n"&gt;historicalPrices&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="m"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;14.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;13.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;15.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;18.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;19.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;17.0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// 2. Bridge the gap: convert the slice to a live stream&lt;/span&gt;
    &lt;span class="n"&gt;marketTicks&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;helper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SliceToChan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;historicalPrices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// 3. Pipe the ticks into a 3-period SMA processor from the library&lt;/span&gt;
    &lt;span class="n"&gt;smaStream&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;trend&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marketTicks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// 4. Drain the output stream &lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;avg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;smaStream&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"New SMA tick processed: %.2f&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Architecture Wins
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Efficiency:&lt;/strong&gt; We only store the exact amount of data needed at any given moment. The Go garbage collector easily cleans up the rest, meaning we can process a continuous websocket stream for months without memory leaks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backpressure Handling:&lt;/strong&gt; Go channels are blocking by nature. If a complex compound strategy at the end of the pipeline is too slow, the channels will naturally fill up, pausing the producers further up the chain until it catches up. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decoupling:&lt;/strong&gt; Each pipeline stage is completely isolated. The indicator doesn't know if the data is coming from a historical Tiingo repository, an Alpaca websocket, or a mock unit test. It just accepts a &lt;code&gt;&amp;lt;-chan T&lt;/code&gt; and returns another &lt;code&gt;&amp;lt;-chan T&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;By treating channels as native data streams and relying on robust helper utilities, you can build highly resilient, concurrent pipelines capable of processing truly infinite data sets. &lt;/p&gt;

&lt;p&gt;If you are building financial tools, real-time dashboards, or are just looking to explore a Go codebase that relies heavily on generics and channel-based streaming, I highly recommend leveraging the &lt;strong&gt;&lt;a href="https://github.com/cinar/indicator" rel="noopener noreferrer"&gt;cinar/indicator&lt;/a&gt;&lt;/strong&gt; library. It comes batteries-included with all the helpers and technical indicators you need to get started with stream processing in Go.&lt;/p&gt;

&lt;p&gt;How are you handling continuous data streams in your applications? Let me know in the comments!&lt;/p&gt;

</description>
      <category>go</category>
      <category>datascience</category>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Self-Healing State Machines: Resilient State Transitions in Go</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://forem.com/onurcinar/self-healing-state-machines-resilient-state-transitions-in-go-3e0</link>
      <guid>https://forem.com/onurcinar/self-healing-state-machines-resilient-state-transitions-in-go-3e0</guid>
      <description>&lt;p&gt;Distributed systems are inherently stateful. Whether you're managing a database connection pool, a multi-step payment workflow, or a complex IoT device lifecycle, you need to transition between states reliably.&lt;/p&gt;

&lt;p&gt;Standard state machines (FSMs) are great for logic, but they are often brittle. What happens if a transition involves a network call that fails? Most developers end up wrapping their &lt;code&gt;machine.Transition()&lt;/code&gt; calls in manual retry loops, cluttering their business logic and losing visibility into &lt;em&gt;why&lt;/em&gt; a transition failed.&lt;/p&gt;

&lt;p&gt;Inspired by Erlang's &lt;code&gt;gen_statem&lt;/code&gt; behavior, &lt;strong&gt;Resile&lt;/strong&gt; introduces &lt;code&gt;resile.StateMachine&lt;/code&gt;: a standardized, resilient state machine where every transition is inherently protected by resilience policies.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Resile Way: One-Line Resilience for Transitions
&lt;/h2&gt;

&lt;p&gt;With &lt;code&gt;resile.StateMachine&lt;/code&gt;, you don't just define how to move from State A to State B. You define a &lt;strong&gt;Resilient Transition&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is how you implement a self-healing connection manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/cinar/resile"&lt;/span&gt;

&lt;span class="c"&gt;// 1. Define your State, Data, and Events&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Disconnected&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Disconnected"&lt;/span&gt;
    &lt;span class="n"&gt;Connected&lt;/span&gt;    &lt;span class="n"&gt;State&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Connected"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Connect&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Connect"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// 2. Define the Transition Logic&lt;/span&gt;
&lt;span class="n"&gt;transition&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RetryState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Disconnected&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Connect&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// This transition involves a network call.&lt;/span&gt;
        &lt;span class="c"&gt;// If it fails, Resile will automatically retry it &lt;/span&gt;
        &lt;span class="c"&gt;// using the configured backoff and jitter.&lt;/span&gt;
        &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;apiClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Connected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// 3. Initialize the Resilient State Machine&lt;/span&gt;
&lt;span class="n"&gt;sm&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewStateMachine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Disconnected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;Data&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Endpoint&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"api.example.com"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; 
    &lt;span class="n"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithMaxAttempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithBaseDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// 4. Handle events safely&lt;/span&gt;
&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What happens under the hood?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;When you call &lt;code&gt;sm.Handle(ctx, event)&lt;/code&gt;, Resile enters an execution envelope.&lt;/li&gt;
&lt;li&gt;It executes your &lt;code&gt;transition&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;If the &lt;code&gt;transition&lt;/code&gt; returns an error, Resile applies your retry policy (e.g., Exponential Backoff with Jitter).&lt;/li&gt;
&lt;li&gt;Only when the &lt;code&gt;transition&lt;/code&gt; succeeds does the &lt;code&gt;StateMachine&lt;/code&gt; update its internal state and data.&lt;/li&gt;
&lt;li&gt;If the retries are exhausted, the &lt;code&gt;StateMachine&lt;/code&gt; remains in its previous state, ensuring consistency.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Why "Self-Healing"?
&lt;/h2&gt;

&lt;p&gt;Most state machine implementations are "fire and forget" or "fail and stop." A &lt;strong&gt;Self-Healing&lt;/strong&gt; state machine assumes that transitions are risky and provides the infrastructure to recover from those risks automatically.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Automatic Retries&lt;/strong&gt;: No more manual loops inside your state logic.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Circuit Breakers&lt;/strong&gt;: If a specific transition (e.g., to a "Maintenance" state) is failing repeatedly, the circuit breaker can trip to prevent overwhelming the system.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Awareness&lt;/strong&gt;: If the transition is part of a timed-out request, the state machine cancels the transition attempt immediately, preventing goroutine leaks.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Observability&lt;/strong&gt;: Every transition attempt—including retries—is tracked by Resile's telemetry hooks. You can see exactly how many times your machine "struggled" to reach the &lt;code&gt;Connected&lt;/code&gt; state.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Observability: Tracking State Success
&lt;/h2&gt;

&lt;p&gt;By using Resile's OpenTelemetry or &lt;code&gt;slog&lt;/code&gt; integrations, you get deep insights into your state machine's health:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Attempts per Transition&lt;/strong&gt;: See which events are causing the most retries.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Transition Latency&lt;/strong&gt;: Measure how long it takes to move from one state to another, including backoff time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Failure Patterns&lt;/strong&gt;: Identify if a specific state is a "dead end" due to persistent errors.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Resilience isn't just for simple API calls. By bringing resilience to the core of your stateful logic, you build systems that are not only more robust but also significantly easier to debug and monitor.&lt;/p&gt;

&lt;p&gt;Stop writing manual retry loops around your state changes. Let &lt;code&gt;resile.StateMachine&lt;/code&gt; handle the complexity of the "unreliable world" while you focus on the logic of your application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give Resile a star on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you managing state transitions in your Go microservices? Let's discuss!&lt;/p&gt;

</description>
      <category>go</category>
      <category>microservices</category>
      <category>programming</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Preventing Microservice Meltdowns: Adaptive Retries and Circuit Breakers in Go</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Sun, 15 Mar 2026 22:28:33 +0000</pubDate>
      <link>https://forem.com/onurcinar/preventing-microservice-meltdowns-adaptive-retries-and-circuit-breakers-in-go-30ho</link>
      <guid>https://forem.com/onurcinar/preventing-microservice-meltdowns-adaptive-retries-and-circuit-breakers-in-go-30ho</guid>
      <description>&lt;p&gt;We’ve all been there. A downstream database has a momentary blip. Your service instances, being "resilient," immediately start retrying their failed requests. &lt;/p&gt;

&lt;p&gt;Suddenly, the database isn't just "having a blip" anymore—it’s being hammered by a self-inflicted DDoS attack from its own clients. This is the &lt;strong&gt;Retry Storm&lt;/strong&gt; (or Thundering Herd), and it’s one of the most common ways distributed systems experience total meltdowns.&lt;/p&gt;

&lt;p&gt;Standard exponential backoff protects individual services, but it doesn't protect the &lt;em&gt;cluster&lt;/em&gt;. To do that, you need a layered defense-in-depth approach.&lt;/p&gt;

&lt;p&gt;Here is how to prevent microservice meltdowns in Go using &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Aggregate Load
&lt;/h2&gt;

&lt;p&gt;Imagine you have 100 instances of your API. Each instance is configured to retry 3 times. If the database slows down, you suddenly have &lt;strong&gt;300 extra requests&lt;/strong&gt; hitting it exactly when it's struggling to recover.&lt;/p&gt;

&lt;p&gt;Even with jitter, the aggregate load can be enough to keep the database in a "failed" state indefinitely. To solve this, we need two patterns working together: &lt;strong&gt;Adaptive Retries&lt;/strong&gt; and &lt;strong&gt;Circuit Breakers&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Adaptive Retries (The Token Bucket)
&lt;/h2&gt;

&lt;p&gt;Inspired by Google's SRE book and AWS SDKs, &lt;strong&gt;Adaptive Retries&lt;/strong&gt; use a client-side token bucket to "fail fast" locally.&lt;/p&gt;

&lt;p&gt;The logic is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every &lt;strong&gt;success&lt;/strong&gt; adds a small amount of "credit" to your bucket.&lt;/li&gt;
&lt;li&gt;Every &lt;strong&gt;retry&lt;/strong&gt; consumes a significant amount of credit.&lt;/li&gt;
&lt;li&gt;If the bucket is empty, Resile &lt;strong&gt;stops retrying immediately&lt;/strong&gt; and fails fast locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures that if a downstream service is fundamentally degraded, your fleet of clients will automatically throttle their retry pressure at the source, giving the service breathing room to recover.&lt;/p&gt;
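
&lt;p&gt;The bucket mechanics fit in a few lines of plain Go. This sketch is illustrative, not Resile's implementation, but it shows why the pattern self-throttles:&lt;/p&gt;

```go
package main

import "fmt"

// tokenBucket is a minimal sketch of the adaptive-retry bucket
// described above. Successes earn a small credit; each retry spends
// a much larger cost. An empty bucket means "fail fast".
type tokenBucket struct {
	tokens, capacity float64
	successCredit    float64 // earned per success
	retryCost        float64 // spent per retry
}

func (b *tokenBucket) recordSuccess() {
	b.tokens += b.successCredit
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
}

// allowRetry reports whether enough credit remains to retry,
// consuming the cost if so.
func (b *tokenBucket) allowRetry() bool {
	if b.tokens < b.retryCost {
		return false // fail fast locally
	}
	b.tokens -= b.retryCost
	return true
}

func main() {
	b := &tokenBucket{tokens: 10, capacity: 10, successCredit: 0.5, retryCost: 5}
	fmt.Println(b.allowRetry()) // true  (10 -> 5)
	fmt.Println(b.allowRetry()) // true  (5 -> 0)
	fmt.Println(b.allowRetry()) // false (empty: fail fast)
	b.recordSuccess()
	fmt.Println(b.tokens) // 0.5: credit recovers slowly, per success
}
```

&lt;p&gt;Because credit is earned slowly (per success) and spent quickly (per retry), a sustained outage drains the bucket and the whole fleet throttles its retry pressure at the source.&lt;/p&gt;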

&lt;h3&gt;
  
  
  Implementing with Resile:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Share this bucket across multiple executions or even your entire service&lt;/span&gt;
&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefaultAdaptiveBucket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithAdaptiveBucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. Circuit Breakers (The Kill Switch)
&lt;/h2&gt;

&lt;p&gt;While retries assume "eventual success," a &lt;strong&gt;Circuit Breaker&lt;/strong&gt; assumes "statistical failure." &lt;/p&gt;

&lt;p&gt;If a service fails 5 times in a row, the breaker "trips" (opens). For the next 30 seconds, every call to that service will fail &lt;strong&gt;instantly&lt;/strong&gt; without even trying to hit the network. This protects your downstream infrastructure from useless traffic and saves your local resources (threads, memory, sockets).&lt;/p&gt;

&lt;h3&gt;
  
  
  Layering it in Resile:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/cinar/resile/circuit"&lt;/span&gt;

&lt;span class="c"&gt;// Create a breaker: Trip after 5 failures, wait 30s to retry&lt;/span&gt;
&lt;span class="n"&gt;cb&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;circuit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;circuit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;FailureThreshold&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ResetTimeout&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="m"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithCircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Ultimate Defense: Layered Resilience
&lt;/h2&gt;

&lt;p&gt;The real power of Resile comes from combining these patterns. You can layer Retries, Circuit Breakers, and Adaptive Buckets into a single execution strategy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithMaxAttempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;           &lt;span class="c"&gt;// Layer 1: Handle random blips&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithCircuitBreaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cb&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;      &lt;span class="c"&gt;// Layer 2: Stop hitting a dead service&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithAdaptiveBucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c"&gt;// Layer 3: Prevent cluster-wide storms&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retries&lt;/strong&gt; handle the "one-off" network glitches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Circuit Breaker&lt;/strong&gt; stops you from wasting time on a service that is clearly down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Adaptive Bucket&lt;/strong&gt; ensures that even if the breaker hasn't tripped yet, you won't overwhelm the system with aggregate retry load.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Observability: Seeing the Shield in Action
&lt;/h2&gt;

&lt;p&gt;Protecting your system is great, but &lt;em&gt;knowing&lt;/em&gt; you’re being protected is better. &lt;/p&gt;

&lt;p&gt;If you use Resile's &lt;code&gt;slog&lt;/code&gt; or &lt;code&gt;OpenTelemetry&lt;/code&gt; integrations, you'll see exactly when these shields activate. Your logs will show &lt;code&gt;retry.throttled=true&lt;/code&gt; when the adaptive bucket kicks in, or your traces will show a &lt;code&gt;circuit.open&lt;/code&gt; error when the breaker prevents a call.&lt;/p&gt;


&lt;p&gt;This visibility is crucial for SREs to understand &lt;em&gt;why&lt;/em&gt; traffic is failing and &lt;em&gt;how&lt;/em&gt; the system is self-healing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building resilient microservices isn't just about making individual calls "smarter." It's about ensuring that your entire architecture can survive a storm without collapsing under its own weight.&lt;/p&gt;

&lt;p&gt;By combining opinionated retries, circuit breakers, and adaptive throttling, Resile gives you a production-grade resilience engine that scales with your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try Resile today:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How do you prevent "retry storms" in your Go clusters? Let's discuss in the comments!&lt;/p&gt;

</description>
      <category>go</category>
      <category>microservices</category>
      <category>sre</category>
      <category>devops</category>
    </item>
    <item>
      <title>Beating Tail Latency: A Guide to Request Hedging in Go Microservices</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Sat, 14 Mar 2026 03:03:35 +0000</pubDate>
      <link>https://forem.com/onurcinar/beating-tail-latency-a-guide-to-request-hedging-in-go-microservices-p81</link>
      <guid>https://forem.com/onurcinar/beating-tail-latency-a-guide-to-request-hedging-in-go-microservices-p81</guid>
      <description>&lt;p&gt;In distributed systems, we often talk about "The Long Tail." &lt;/p&gt;

&lt;p&gt;You might have a service where 99% of requests finish in under 100ms. But that last 1% (the P99 latency)? Those requests might take 2 seconds or more. In a microservice architecture where one user action triggers 10 different service calls, that one slow dependency will bottleneck the entire user experience.&lt;/p&gt;

&lt;p&gt;Standard retries don't help here. Why? Because a "Tail Latency" request hasn't failed yet—it’s just &lt;em&gt;slow&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Waiting for a 2-second timeout to trigger a retry is a waste of time. To beat the long tail, you need &lt;strong&gt;Request Hedging&lt;/strong&gt; (also known as Speculative Retries).&lt;/p&gt;

&lt;p&gt;Here is how to implement it safely in Go using &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Request Hedging?
&lt;/h2&gt;

&lt;p&gt;The concept is simple but powerful: If a request is taking longer than usual (say, longer than the P95 latency), don't kill it. Instead, &lt;strong&gt;start a second, identical request in parallel.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whichever request finishes first, you take its result and cancel the other one.&lt;/p&gt;

&lt;p&gt;This "speculative" approach drastically reduces P99 latency because the mathematical probability of &lt;em&gt;two&lt;/em&gt; identical requests hitting the "long tail" simultaneously is extremely low.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Complexity of Manual Hedging
&lt;/h2&gt;

&lt;p&gt;Implementing hedging manually in Go is a nightmare of goroutine management:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need a &lt;code&gt;select&lt;/code&gt; block with a timer.&lt;/li&gt;
&lt;li&gt;You need to coordinate between two (or more) goroutines.&lt;/li&gt;
&lt;li&gt;You must ensure that once one succeeds, the others are cancelled immediately to save resources.&lt;/li&gt;
&lt;li&gt;You have to handle race conditions where both might succeed at the exact same millisecond.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most developers end up with hundreds of lines of brittle boilerplate code to handle just one hedged call.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Resile Way: &lt;code&gt;DoHedged&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Resile&lt;/strong&gt; makes request hedging as simple as a single function call. It handles the goroutine lifecycle, context cancellation, and race conditions for you.&lt;/p&gt;

&lt;p&gt;Here is how you fetch data with a 100ms hedging delay:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/cinar/resile"&lt;/span&gt;

&lt;span class="c"&gt;// data is automatically inferred as *User&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoHedged&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;apiClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; 
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithMaxAttempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithHedgingDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Millisecond&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What happens under the hood?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Resile starts the first request.&lt;/li&gt;
&lt;li&gt;It waits for 100ms (&lt;code&gt;HedgingDelay&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;If the first request hasn't finished, it starts a &lt;strong&gt;second&lt;/strong&gt; request.&lt;/li&gt;
&lt;li&gt;As soon as one returns a successful result, Resile &lt;strong&gt;cancels the context&lt;/strong&gt; of the other request and returns the data to you.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Picking the Right Hedging Delay
&lt;/h2&gt;

&lt;p&gt;The "magic" of hedging lies in the delay. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too short:&lt;/strong&gt; You double your traffic unnecessarily, putting extra load on your downstream services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too long:&lt;/strong&gt; You don't gain much latency benefit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro-Tip:&lt;/strong&gt; A good rule of thumb is to set your &lt;code&gt;HedgingDelay&lt;/code&gt; to your &lt;strong&gt;P95 or P99 latency&lt;/strong&gt;. This ensures you only "hedge" the slowest 1-5% of requests, providing a massive latency win with minimal extra load.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability: Tracking the "Speculative" Wins
&lt;/h2&gt;

&lt;p&gt;If you're using Resile's OpenTelemetry integration (&lt;code&gt;telemetry/resileotel&lt;/code&gt;), you can actually see these wins in your distributed traces. &lt;/p&gt;

&lt;p&gt;Each hedged attempt is recorded as a sub-span. When a hedged request wins, you'll see the first span get cancelled and the second one succeed—providing clear proof that hedging saved your user from a 2-second wait.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Request hedging used to be a technique reserved for companies with massive infrastructure teams. With Resile, it’s a tool that every Go developer can use to build snappier, more resilient microservices.&lt;/p&gt;

&lt;p&gt;By moving from "Wait and Retry" to "Hedge and Win," you can turn your long-tail latency into a competitive advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give Resile a star on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you handling tail latency in your Go services? Let's discuss in the comments!&lt;/p&gt;

</description>
      <category>go</category>
      <category>microservices</category>
      <category>performance</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Python's Stamina for Go: Bringing Ergonomic Resilience to Gophers</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Tue, 10 Mar 2026 02:04:41 +0000</pubDate>
      <link>https://forem.com/onurcinar/pythons-stamina-for-go-bringing-ergonomic-resilience-to-gophers-1lf2</link>
      <guid>https://forem.com/onurcinar/pythons-stamina-for-go-bringing-ergonomic-resilience-to-gophers-1lf2</guid>
      <description>&lt;p&gt;If you've ever worked in the Python ecosystem, you've likely encountered &lt;a href="https://github.com/jd/tenacity" rel="noopener noreferrer"&gt;tenacity&lt;/a&gt; or its opinionated wrapper, &lt;a href="https://github.com/hynek/stamina" rel="noopener noreferrer"&gt;stamina&lt;/a&gt;. They make retrying transient failures feel like magic: a single decorator, sensible production defaults (exponential backoff + jitter), and built-in observability.&lt;/p&gt;

&lt;p&gt;The Go ecosystem has powerful tools, but they often require a lot of boilerplate, use reflection, or lack the "Correct by Default" philosophy that makes &lt;code&gt;stamina&lt;/code&gt; so great.&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;&lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt;&lt;/strong&gt;. It’s a love letter to Python's ergonomics, written in idiomatic, type-safe Go 1.18+.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Ergonomic Gap
&lt;/h2&gt;

&lt;p&gt;In Python, retrying an API call with &lt;code&gt;stamina&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stamina&lt;/span&gt;

&lt;span class="nd"&gt;@stamina.retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTPError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before Go 1.18, achieving this level of simplicity was nearly impossible. You either had to write verbose &lt;code&gt;for&lt;/code&gt; loops or use libraries that relied on &lt;code&gt;interface{}&lt;/code&gt; and reflection—which meant losing type safety and slowing down your code.&lt;/p&gt;

&lt;p&gt;With the arrival of &lt;strong&gt;Generics&lt;/strong&gt;, the game changed. We can now have the best of both worlds: Python-level ergonomics with Go’s compile-time safety.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Resile Way: One-Line Resilience
&lt;/h2&gt;

&lt;p&gt;Resile uses Go 1.18+ Type Parameters to wrap your logic in a resilience envelope. Here is how you fetch data with Resile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"github.com/cinar/resile"&lt;/span&gt;

&lt;span class="c"&gt;// data is automatically inferred as *User&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;apiClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithMaxAttempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks and feels like a simple function call, but under the hood, Resile is doing the heavy lifting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;AWS Full Jitter Backoff&lt;/strong&gt;: Spreading out retries to protect your database.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context-Awareness&lt;/strong&gt;: Cancelling retries immediately if the request times out.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Memory Safety&lt;/strong&gt;: Using managed timers to prevent goroutine leaks.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why "Correct by Default" Matters
&lt;/h2&gt;

&lt;p&gt;One of the best things about &lt;code&gt;stamina&lt;/code&gt; is that it makes the &lt;em&gt;right&lt;/em&gt; thing the &lt;em&gt;easy&lt;/em&gt; thing. Resile follows this philosophy strictly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Exponential Backoff is the Default&lt;/strong&gt;: You don't have to configure it; it's there from attempt one.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Jitter is Not Optional&lt;/strong&gt;: Resile forces randomization to prevent "thundering herd" outages.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Zero Dependencies&lt;/strong&gt;: The core of Resile depends only on the Go standard library. No bloated dependency graphs.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;No Reflection&lt;/strong&gt;: Unlike many older Go retry libraries, Resile uses static type parameters. This means zero runtime overhead and zero chance of a "type mismatch" panic.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Batteries Included (But Removable)
&lt;/h2&gt;

&lt;p&gt;Just like &lt;code&gt;stamina&lt;/code&gt; integrates with &lt;code&gt;structlog&lt;/code&gt; and &lt;code&gt;Prometheus&lt;/code&gt;, Resile provides optional sub-packages for modern observability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;telemetry/resileslog&lt;/code&gt;&lt;/strong&gt;: High-performance structured logging with Go 1.21’s &lt;code&gt;slog&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;telemetry/resileotel&lt;/code&gt;&lt;/strong&gt;: Full OpenTelemetry tracing. See every retry attempt as a sub-span in your Jaeger or Honeycomb dashboard.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Adaptive Retries&lt;/strong&gt;: A client-side token bucket (inspired by Google's SRE book) to prevent your fleet from killing a degraded service.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fast CI/CD: The "Stamina" Secret
&lt;/h2&gt;

&lt;p&gt;A hidden feature of &lt;code&gt;stamina&lt;/code&gt; that I absolutely loved was the ability to globally disable wait times in unit tests. &lt;/p&gt;

&lt;p&gt;Resile brings this to Go through &lt;strong&gt;Context Overrides&lt;/strong&gt;. You can make your 30-second retry loop execute in 1 millisecond during tests without changing a single line of business logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// This context tells Resile to skip all sleep timers&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTestingBypass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;myService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PerformTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Retries instantly!&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We don't have to choose between Go's performance and Python's developer experience. By leveraging modern Go features like Generics and &lt;code&gt;slog&lt;/code&gt;, we can build tools that are both powerful and a joy to use.&lt;/p&gt;

&lt;p&gt;If you’ve been missing &lt;code&gt;stamina&lt;/code&gt; in your Go projects, give &lt;strong&gt;Resile&lt;/strong&gt; a try.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check it out on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>python</category>
      <category>backend</category>
      <category>programming</category>
    </item>
    <item>
      <title>Stop Writing Manual Retry Loops in Go: Why Your Current Logic is Probably Dangerous</title>
      <dc:creator>Onur Cinar</dc:creator>
      <pubDate>Mon, 09 Mar 2026 02:55:42 +0000</pubDate>
      <link>https://forem.com/onurcinar/stop-writing-manual-retry-loops-in-go-why-your-current-logic-is-probably-dangerous-5bj5</link>
      <guid>https://forem.com/onurcinar/stop-writing-manual-retry-loops-in-go-why-your-current-logic-is-probably-dangerous-5bj5</guid>
      <description>&lt;p&gt;If you've been writing Go for more than a week, you've likely written a retry loop. It usually starts like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;doSomething&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's simple, idiomatic, and... &lt;strong&gt;a ticking time bomb in production.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a distributed system, transient failures—network blips, database locks, rate limits—are mathematical certainties. While a simple &lt;code&gt;for&lt;/code&gt; loop feels like enough, it often fails exactly when your system is under the most stress.&lt;/p&gt;

&lt;p&gt;Here is why your manual retry logic is probably dangerous, and how to fix it using &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;Resile&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The 3 Silent Killers of Manual Retries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Thundering Herd (Missing Jitter)
&lt;/h3&gt;

&lt;p&gt;If your service has 1,000 instances and the database goes down for a second, all 1,000 instances will fail at once. With a fixed &lt;code&gt;time.Sleep(1 * time.Second)&lt;/code&gt;, all 1,000 instances will then wake up at the exact same millisecond and hammer the database again. &lt;/p&gt;

&lt;p&gt;This is a self-inflicted DDoS attack. Without &lt;strong&gt;Jitter&lt;/strong&gt; (randomized delay), your retries are just synchronized waves of traffic that prevent your dependencies from ever recovering.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Context Blindness
&lt;/h3&gt;

&lt;p&gt;Does your retry loop respect &lt;code&gt;context.Context&lt;/code&gt;? Most don't. If a user cancels their request or a global timeout is reached, a &lt;code&gt;time.Sleep&lt;/code&gt; will block that goroutine until the timer expires. &lt;/p&gt;

&lt;p&gt;In a high-concurrency environment, these "hanging" goroutines pile up, leading to memory exhaustion and silent failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The &lt;code&gt;time.After&lt;/code&gt; Memory Leak
&lt;/h3&gt;

&lt;p&gt;Even "advanced" developers trying to be context-aware often use this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;After&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="c"&gt;// DANGER!&lt;/span&gt;
    &lt;span class="c"&gt;// proceed&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per the Go standard library documentation, before Go 1.23 the timer created by &lt;code&gt;time.After&lt;/code&gt; &lt;strong&gt;is not garbage collected until it fires&lt;/strong&gt;, even if the &lt;code&gt;ctx.Done()&lt;/code&gt; case is chosen. In a busy service with long retry delays, this creates a slow-motion memory leak that is incredibly hard to debug.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing Resile: Ergonomic Resilience for Go
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Resile&lt;/strong&gt; because I missed the ergonomics of Python’s &lt;code&gt;stamina&lt;/code&gt; and &lt;code&gt;tenacity&lt;/code&gt; libraries, but I wanted the uncompromising type safety of Go 1.18+ Generics.&lt;/p&gt;

&lt;p&gt;Resile is a zero-dependency execution-resilience library that makes the "Correct Way" to retry as easy as a single function call.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Hello, World" of Resile
&lt;/h3&gt;

&lt;p&gt;Instead of a manual loop, you use &lt;code&gt;DoErr&lt;/code&gt; for actions that only return an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DoErr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PingContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or &lt;code&gt;Do&lt;/code&gt; for value-yielding operations with full type safety:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// user is automatically inferred as *User&lt;/span&gt;
&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;apiClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithMaxAttempts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why Resile?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AWS Full Jitter by Default
&lt;/h3&gt;

&lt;p&gt;Resile implements the industry-standard &lt;strong&gt;Full Jitter&lt;/strong&gt; algorithm. Instead of sleeping for a fixed time, it calculates an exponential backoff ceiling and then picks a random value between 0 and that maximum. This spreads retry load evenly across your cluster instead of releasing it in synchronized waves.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Memory-Safe Timer Management
&lt;/h3&gt;

&lt;p&gt;Resile doesn't use &lt;code&gt;time.After&lt;/code&gt;. It uses a managed &lt;code&gt;time.Timer&lt;/code&gt; with explicit cleanup. Whether your retry succeeds, fails, or the context is cancelled, Resile ensures all resources are returned to the runtime immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Generic-First API
&lt;/h3&gt;

&lt;p&gt;No &lt;code&gt;interface{}&lt;/code&gt;, no reflection, and no type casting. Because it uses Go Generics, the compiler checks your types at build time. If your function returns a &lt;code&gt;*User&lt;/code&gt;, Resile returns a &lt;code&gt;*User&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Fast Unit Testing
&lt;/h3&gt;

&lt;p&gt;One of the biggest pain points of retries is that they slow down your CI/CD pipelines. Who wants to wait 10 seconds for a test to finish because of backoff?&lt;/p&gt;

&lt;p&gt;With Resile, you can use &lt;code&gt;WithTestingBypass&lt;/code&gt; to make all retries execute instantly in your tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestMyService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;resile&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTestingBypass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c"&gt;// This will retry 5 times INSTANTLY without sleeping.&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Beyond Simple Retries
&lt;/h2&gt;

&lt;p&gt;Resile isn't just a retry loop; it's a resilience toolkit. Out of the box, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Request Hedging&lt;/strong&gt;: Start a second request if the first one is taking too long, to cut tail latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive Retries&lt;/strong&gt;: A client-side token bucket to prevent "retry storms" across a cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Circuit Breaker Integration&lt;/strong&gt;: Stop retrying when a service is fundamentally down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panic Recovery&lt;/strong&gt;: Convert unexpected panics into retryable errors (the Erlang "Let It Crash" way).&lt;/li&gt;
&lt;/ul&gt;
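&lt;p&gt;Of these, panic recovery is the easiest to picture: conceptually it is just a deferred &lt;code&gt;recover&lt;/code&gt; that turns a panic into an ordinary error the retry loop can inspect. A hedged sketch of the idea; Resile's real implementation may differ:&lt;/p&gt;

```go
package main

import "fmt"

// recoverToErr runs fn and converts an unexpected panic into an ordinary
// error value, so a retry engine can treat the crash as just another
// failed attempt. Illustrative sketch, not Resile's actual code.
func recoverToErr(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

func main() {
	err := recoverToErr(func() error { panic("boom") })
	fmt.Println(err) // prints "recovered from panic: boom"
}
```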




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Retrying is a distributed systems problem, not just a loop problem. By moving away from manual loops to a dedicated resilience engine like Resile, you protect your downstream services, eliminate memory leaks, and keep your code clean and type-safe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check out Resile on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/cinar/resile" rel="noopener noreferrer"&gt;github.com/cinar/resile&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you handling transient failures in your Go services? Let's discuss in the comments!&lt;/p&gt;

</description>
      <category>go</category>
      <category>backend</category>
      <category>distributedsystems</category>
      <category>microservices</category>
    </item>
  </channel>
</rss>
