<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Anshika Jain</title>
    <description>The latest articles on Forem by Anshika Jain (@anshikakalpana).</description>
    <link>https://forem.com/anshikakalpana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3602941%2F92aed9f8-0c0c-4393-a952-6c65be494c5f.png</url>
      <title>Forem: Anshika Jain</title>
      <link>https://forem.com/anshikakalpana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/anshikakalpana"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Anshika Jain</dc:creator>
      <pubDate>Sat, 24 Jan 2026 10:07:10 +0000</pubDate>
      <link>https://forem.com/anshikakalpana/-1o05</link>
      <guid>https://forem.com/anshikakalpana/-1o05</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/anshikakalpana" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3602941%2F92aed9f8-0c0c-4393-a952-6c65be494c5f.png" alt="anshikakalpana"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/anshikakalpana/retraced-a-job-scheduler-where-retries-are-data-not-magic-b41" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;ReTraced: A Job Scheduler Where Retries Are Data (Not Magic)&lt;/h2&gt;
      &lt;h3&gt;Anshika Jain ・ Jan 24&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#distributedsystems&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#typescript&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#systemdesign&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#redis&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>distributedsystems</category>
      <category>typescript</category>
      <category>systemdesign</category>
      <category>redis</category>
    </item>
    <item>
      <title>ReTraced: A Job Scheduler Where Retries Are Data (Not Magic)</title>
      <dc:creator>Anshika Jain</dc:creator>
      <pubDate>Sat, 24 Jan 2026 10:04:44 +0000</pubDate>
      <link>https://forem.com/anshikakalpana/retraced-a-job-scheduler-where-retries-are-data-not-magic-b41</link>
      <guid>https://forem.com/anshikakalpana/retraced-a-job-scheduler-where-retries-are-data-not-magic-b41</guid>
      <description>&lt;p&gt;&lt;a href="https://re-trace-five.vercel.app/" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Anshikakalpana/ReTraced" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most job schedulers do retries… but they don’t &lt;em&gt;explain&lt;/em&gt; retries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ReTraced&lt;/strong&gt; is a transparent and extensible distributed job scheduler built to make &lt;strong&gt;retry behavior, failure handling, and job lifecycle transitions explicit and observable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Unlike many schedulers that hide retries behind config flags and internal engines, &lt;strong&gt;ReTraced treats retries as first-class data&lt;/strong&gt; — visible, auditable, and configurable &lt;em&gt;per job&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;ReTraced is not designed to hide complexity.&lt;br&gt;&lt;br&gt;
It’s designed to &lt;strong&gt;expose it clearly&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why ReTraced Exists
&lt;/h2&gt;

&lt;p&gt;Modern schedulers are powerful, but they often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hide retry decisions inside internal engines
&lt;/li&gt;
&lt;li&gt;Expose retry counts without retry &lt;em&gt;intent&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Make failure analysis opaque and indirect
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ReTraced was built to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why did this job retry at this moment?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Was the retry automatic or manually triggered?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Is this failure temporary or permanent?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When and why did retries stop?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions matter when building reliable, debuggable distributed systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Philosophy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Explicit Over Implicit
&lt;/h3&gt;

&lt;p&gt;In ReTraced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry attempts are stored as &lt;strong&gt;structured, queryable data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;failures are classified (&lt;strong&gt;temporary vs permanent&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
the &lt;strong&gt;dead-letter queue (DLQ) is first-class&lt;/strong&gt; (not an afterthought)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes execution behavior predictable, inspectable, and explainable.&lt;/p&gt;
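&lt;p&gt;The temporary-vs-permanent split decides whether another attempt is worth making at all. Here is a minimal TypeScript sketch of that decision. &lt;code&gt;FailureKind&lt;/code&gt;, &lt;code&gt;decideNextStep&lt;/code&gt;, and the &lt;code&gt;maxAttempts&lt;/code&gt; limit are hypothetical names for illustration, not ReTraced’s actual API.&lt;/p&gt;

```typescript
// Hypothetical sketch: these names are illustrative, not ReTraced's actual API.
type FailureKind = "TEMPORARY" | "PERMANENT";

interface JobFailure {
  kind: FailureKind;
  message: string;
}

type NextStep =
  | { action: "RETRY" }
  | { action: "DLQ"; reason: string };

// Decide what happens after a failed attempt.
// attemptsMade counts every attempt so far, including the one that just failed.
function decideNextStep(
  failure: JobFailure,
  attemptsMade: number,
  maxAttempts: number,
): NextStep {
  if (failure.kind === "PERMANENT") {
    // Retrying cannot fix a permanent failure, so go straight to the DLQ.
    return { action: "DLQ", reason: `permanent failure: ${failure.message}` };
  }
  if (attemptsMade >= maxAttempts) {
    // Temporary failure, but the retry budget is used up.
    return { action: "DLQ", reason: `retries exhausted after ${attemptsMade} attempts` };
  }
  return { action: "RETRY" };
}
```

&lt;p&gt;Returning the reason together with the &lt;code&gt;DLQ&lt;/code&gt; action keeps the answer to “why did retries stop?” as data instead of a log line.&lt;/p&gt;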

&lt;h3&gt;
  
  
  ✅ Practical Before Perfect
&lt;/h3&gt;

&lt;p&gt;ReTraced favors clarity and control over hidden guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;at-least-once&lt;/strong&gt; delivery semantics
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis-backed&lt;/strong&gt; state for speed + simplicity
&lt;/li&gt;
&lt;li&gt;minimal coordination logic
&lt;/li&gt;
&lt;/ul&gt;
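&lt;p&gt;At-least-once delivery means a job can occasionally run more than once, for example when a worker finishes the work but crashes before completion is recorded, so handlers should be idempotent. Below is a minimal TypeScript sketch of a duplicate-delivery guard. &lt;code&gt;IdempotencyGuard&lt;/code&gt; and &lt;code&gt;runOnce&lt;/code&gt; are hypothetical names, not part of ReTraced, and the sketch assumes every job carries a stable id.&lt;/p&gt;

```typescript
// Minimal sketch, assuming each job has a stable id that survives redelivery.
// This in-memory guard only works inside one process; a multi-worker setup
// would keep the "done" ids in shared state (for example, Redis).
class IdempotencyGuard {
  // Ids of jobs whose handler has already completed successfully.
  private readonly done = new Set([] as string[]);

  // Runs the handler unless this job id has already completed.
  // Returns true if the handler ran, false if this was a duplicate delivery.
  runOnce(jobId: string, handler: () => void): boolean {
    if (this.done.has(jobId)) {
      return false;
    }
    handler();
    // Mark done only after the handler succeeds: if it throws, the id stays
    // unmarked and a redelivery will run the handler again.
    this.done.add(jobId);
    return true;
  }
}
```

&lt;p&gt;Because the id is marked only after success, a crash between the handler’s side effects and the &lt;code&gt;add&lt;/code&gt; call can still cause a second run; that window is why the handler’s own effects should be safe to repeat.&lt;/p&gt;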




&lt;h2&gt;
  
  
  Performance Snapshot (Local)
&lt;/h2&gt;

&lt;p&gt;ReTraced prioritizes correctness + visibility while still being fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark (local, Redis-backed):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10,000 jobs in ~2.4s&lt;/strong&gt; with &lt;strong&gt;1 worker&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10,000 jobs in ~2.1s&lt;/strong&gt; with &lt;strong&gt;5 workers&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

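&lt;p&gt;This points to low per-job scheduler overhead (about 0.24 ms per job with a single worker). Going from 1 to 5 workers raised throughput only modestly, by about 14%, so this local run does not appear to be bound by worker count.&lt;/p&gt;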

&lt;blockquote&gt;
&lt;p&gt;Benchmarks are indicative, not a production SLA.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Makes ReTraced Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🔁 Retry as Data
&lt;/h3&gt;

&lt;p&gt;Every job keeps a retry history:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timestamp + error&lt;/li&gt;
&lt;li&gt;trigger: &lt;code&gt;AUTO&lt;/code&gt; or &lt;code&gt;MANUAL&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;retry result and final outcome&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables real audit trails, DLQ forensics, and safer replays.&lt;/p&gt;
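&lt;p&gt;One way to model that history is a list of attempt records on the job. The TypeScript sketch below is an assumption about the shape, not ReTraced’s actual schema; &lt;code&gt;RetryAttempt&lt;/code&gt;, &lt;code&gt;recordAttempt&lt;/code&gt;, and &lt;code&gt;countManualRetries&lt;/code&gt; are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical shape for retry history; field names are illustrative.
type RetryTrigger = "AUTO" | "MANUAL";
type AttemptResult = "SUCCEEDED" | "FAILED";

interface RetryAttempt {
  attempt: number; // 1-based attempt counter
  at: string; // ISO-8601 timestamp of the attempt
  trigger: RetryTrigger;
  result: AttemptResult;
  error?: string; // present only when the attempt failed
}

interface JobRecord {
  id: string;
  retryHistory: RetryAttempt[];
  finalOutcome?: "COMPLETED" | "DEAD";
}

// Returns a new record with one more attempt appended; the input is not mutated.
function recordAttempt(
  job: JobRecord,
  trigger: RetryTrigger,
  result: AttemptResult,
  at: Date,
  error?: string,
): JobRecord {
  const entry: RetryAttempt = {
    attempt: job.retryHistory.length + 1,
    at: at.toISOString(),
    trigger,
    result,
    ...(error === undefined ? {} : { error }),
  };
  return { ...job, retryHistory: [...job.retryHistory, entry] };
}

// Example query over the history: how many attempts were triggered manually?
function countManualRetries(job: JobRecord): number {
  return job.retryHistory.filter((a) => a.trigger === "MANUAL").length;
}
```

&lt;p&gt;Because each attempt is a plain record, a question like “was this retry automatic or manual?” becomes a filter over the list rather than a search through logs.&lt;/p&gt;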

&lt;h3&gt;
  
  
  🧠 Per-Job Retry Strategies
&lt;/h3&gt;

&lt;p&gt;Each job can define its own retry behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fixed delay&lt;/li&gt;
&lt;li&gt;linear backoff&lt;/li&gt;
&lt;li&gt;exponential backoff (with/without jitter)&lt;/li&gt;
&lt;li&gt;multi-phase retries → DLQ&lt;/li&gt;
&lt;/ul&gt;
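&lt;p&gt;These strategies can be sketched as a discriminated union plus a delay function, with multi-phase retries as an ordered list of phases. This is a hypothetical TypeScript illustration: &lt;code&gt;RetryStrategy&lt;/code&gt;, &lt;code&gt;retryDelayMs&lt;/code&gt;, &lt;code&gt;planRetry&lt;/code&gt;, and the “full jitter” variant are assumptions, not ReTraced’s documented options.&lt;/p&gt;

```typescript
// Hypothetical strategy shapes; ReTraced's real option names may differ.
type RetryStrategy =
  | { kind: "fixed"; delayMs: number }
  | { kind: "linear"; baseMs: number }
  | { kind: "exponential"; baseMs: number; maxMs: number; jitter: boolean };

// Delay before retry number `retry` (1-based) under one strategy.
// `random` is injectable so jittered delays can be tested deterministically.
function retryDelayMs(
  strategy: RetryStrategy,
  retry: number,
  random: () => number = Math.random,
): number {
  switch (strategy.kind) {
    case "fixed":
      return strategy.delayMs;
    case "linear":
      return strategy.baseMs * retry; // base, 2 * base, 3 * base, ...
    case "exponential": {
      // base * 2^(retry - 1), capped at maxMs.
      const capped = Math.min(strategy.maxMs, strategy.baseMs * 2 ** (retry - 1));
      // "Full jitter": a uniform integer delay in [0, capped) spreads retries out.
      return strategy.jitter ? Math.floor(random() * capped) : capped;
    }
    default: {
      const unhandled: never = strategy;
      throw new Error(`unknown strategy: ${JSON.stringify(unhandled)}`);
    }
  }
}

// Multi-phase plan: each phase allows some retries under its own strategy;
// once every phase is used up, the job is sent to the DLQ.
interface RetryPhase {
  retries: number;
  strategy: RetryStrategy;
}

function planRetry(phases: RetryPhase[], retry: number): { delayMs: number } | "DLQ" {
  let remaining = retry; // position of this retry, counted from the current phase
  for (const phase of phases) {
    if (phase.retries >= remaining) {
      return { delayMs: retryDelayMs(phase.strategy, remaining) };
    }
    remaining -= phase.retries;
  }
  return "DLQ";
}
```

&lt;p&gt;With &lt;code&gt;baseMs: 5000&lt;/code&gt; and no jitter, retries 1 to 4 come out as 5s, 10s, 20s, and 40s, the same schedule the backoff bug below was checked against.&lt;/p&gt;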

&lt;h3&gt;
  
  
  🧾 First-Class DLQ
&lt;/h3&gt;

&lt;p&gt;When a job goes dead, it’s not “lost”.&lt;br&gt;&lt;br&gt;
ReTraced preserves the full execution story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry history&lt;/li&gt;
&lt;li&gt;failure context&lt;/li&gt;
&lt;li&gt;poison-job identification&lt;/li&gt;
&lt;li&gt;manual retries tracked clearly&lt;/li&gt;
&lt;/ul&gt;
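&lt;p&gt;A DLQ entry that preserves this story might look like the sketch below. The shape and the poison-job rule (the same error on every attempt, at least three times) are assumptions made for illustration; &lt;code&gt;DlqEntry&lt;/code&gt;, &lt;code&gt;toDlqEntry&lt;/code&gt;, and &lt;code&gt;looksLikePoison&lt;/code&gt; are hypothetical names, not ReTraced’s API.&lt;/p&gt;

```typescript
// Hypothetical DLQ entry shape; the fields and the poison heuristic are assumptions.
interface FailedAttempt {
  at: string; // ISO-8601 timestamp
  trigger: "AUTO" | "MANUAL";
  error: string;
}

interface DlqEntry {
  jobId: string;
  payload: unknown;
  deadAt: string;
  history: FailedAttempt[]; // the full story, not just the last error
  lastError: string;
  manualRetries: number;
  suspectedPoison: boolean;
}

// Assumed heuristic: if at least `threshold` attempts all failed with the same
// error, the input itself is probably the problem (a "poison" job).
function looksLikePoison(history: FailedAttempt[], threshold: number): boolean {
  if (history.length === 0 || threshold > history.length) {
    return false;
  }
  const firstError = history[0].error;
  return history.every((a) => a.error === firstError);
}

function toDlqEntry(
  jobId: string,
  payload: unknown,
  history: FailedAttempt[],
  deadAt: Date,
  poisonThreshold = 3,
): DlqEntry {
  if (history.length === 0) {
    throw new Error("a dead job must have at least one failed attempt");
  }
  return {
    jobId,
    payload,
    deadAt: deadAt.toISOString(),
    // Copy the array so later appends to the job's history do not change this record.
    history: [...history],
    lastError: history[history.length - 1].error,
    manualRetries: history.filter((a) => a.trigger === "MANUAL").length,
    suspectedPoison: looksLikePoison(history, poisonThreshold),
  };
}
```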




&lt;h2&gt;
  
  
  A Bug ReTraced Helped Me Catch (Real Example)
&lt;/h2&gt;

&lt;p&gt;While stress testing, I found a &lt;strong&gt;backoff timing bug&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;I expected exponential delays to grow like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;5s → 10s → 20s → 40s&lt;/code&gt; (total ~75s)&lt;/p&gt;

&lt;p&gt;But the recorded retry timestamps showed the delays plateauing at about &lt;strong&gt;6s&lt;/strong&gt; (about 28s in total).&lt;/p&gt;

&lt;p&gt;This bug was only visible because ReTraced stores retry attempts as real data — not hidden scheduler state.&lt;/p&gt;

&lt;p&gt;That’s the whole point: &lt;strong&gt;make failures debuggable by design.&lt;/strong&gt;&lt;/p&gt;
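&lt;p&gt;Here is a hedged sketch of the kind of check that stored retry timestamps make possible: derive the actual gaps between attempts and compare them with the intended schedule. The function names and the tolerance parameter are hypothetical, not part of ReTraced.&lt;/p&gt;

```typescript
// Sketch of a backoff check over stored attempt timestamps (ISO-8601 strings).

// Gaps in seconds between consecutive attempts.
function gapsSeconds(timestamps: string[]): number[] {
  const ms = timestamps.map((t) => Date.parse(t));
  return ms.slice(1).map((t, i) => (t - ms[i]) / 1000);
}

// Intended exponential schedule: base, 2 * base, 4 * base, ...
function expectedExponential(baseS: number, retries: number): number[] {
  return Array.from({ length: retries }, (_, i) => baseS * 2 ** i);
}

// Lists every retry whose actual gap differs from the intended one by more than toleranceS.
function findBackoffDrift(actual: number[], expected: number[], toleranceS: number): string[] {
  const problems: string[] = [];
  expected.forEach((want, i) => {
    if (i >= actual.length) {
      problems.push(`retry ${i + 1}: missing (expected ${want}s)`);
      return;
    }
    const got = actual[i];
    if (Math.abs(got - want) > toleranceS) {
      problems.push(`retry ${i + 1}: expected ${want}s, got ${got}s`);
    }
  });
  return problems;
}
```

&lt;p&gt;For example, &lt;code&gt;expectedExponential(5, 4)&lt;/code&gt; returns &lt;code&gt;[5, 10, 20, 40]&lt;/code&gt; (75s in total); with a 1s tolerance, gaps that plateau near 6s would be flagged from retry 2 onward.&lt;/p&gt;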




&lt;h2&gt;
  
  
  Current Status + What’s Next
&lt;/h2&gt;

&lt;p&gt;ReTraced just hit &lt;strong&gt;v1.0.0&lt;/strong&gt;, meaning the core retry-as-data model, DLQ handling, and per-job strategies are stable and usable.&lt;/p&gt;

&lt;p&gt;ReTraced is usable today for experimentation and internal tools, and I’m actively improving it toward a &lt;strong&gt;production-ready, self-hostable system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s not trying to replace mature schedulers — it complements them by making &lt;strong&gt;retry intent&lt;/strong&gt; and &lt;strong&gt;failure behavior&lt;/strong&gt; visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Personal Note
&lt;/h2&gt;

&lt;p&gt;I’m a &lt;strong&gt;second-year student&lt;/strong&gt;, deeply interested in &lt;strong&gt;distributed systems&lt;/strong&gt;, and I’m building ReTraced to learn real reliability engineering.&lt;/p&gt;

&lt;p&gt;My goal is to make it &lt;strong&gt;production-ready so that developers can actually use it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you have experience with schedulers, retries, DLQ design, or Redis-based coordination — I’d love your feedback, suggestions, and PRs 🙌&lt;/p&gt;

&lt;p&gt;Thanks for reading 🚀&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>typescript</category>
      <category>systemdesign</category>
      <category>redis</category>
    </item>
  </channel>
</rss>
