<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Behnam Amiri</title>
    <description>The latest articles on Forem by Behnam Amiri (@behnamaxo).</description>
    <link>https://forem.com/behnamaxo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3799265%2F3a021a91-d9bd-4192-a22e-49a6c020db5e.jpg</url>
      <title>Forem: Behnam Amiri</title>
      <link>https://forem.com/behnamaxo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/behnamaxo"/>
    <language>en</language>
    <item>
      <title>No, DriftQ Is Not Trying to Be Temporal</title>
      <dc:creator>Behnam Amiri</dc:creator>
      <pubDate>Thu, 12 Mar 2026 01:05:27 +0000</pubDate>
      <link>https://forem.com/behnamaxo/no-driftq-is-not-trying-to-be-temporal-2cl1</link>
      <guid>https://forem.com/behnamaxo/no-driftq-is-not-trying-to-be-temporal-2cl1</guid>
      <description>&lt;p&gt;I get this question a lot. Somebody sees DriftQ-Core for the first time, reads "durable broker" and "replayable workflows," and the first thing they ask is: "so how is this different from Temporal?"&lt;/p&gt;

&lt;p&gt;Totally fair question. Here's the honest answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Temporal is incredible. Use it if you need it.
&lt;/h2&gt;

&lt;p&gt;I'm not going to sit here and pretend DriftQ is some kind of competitor to Temporal. Temporal has been worked on for over a decade. It grew out of Uber's Cadence project. Thousands of companies run it in production. Stripe, Netflix, Datadog, real stuff at real scale. The team behind it has basically been thinking about durable execution longer than most of us have been thinking about distributed systems at all.&lt;/p&gt;

&lt;p&gt;If you need a full orchestration platform with deterministic workflow replay, multi-service scaling, rich message passing (signals, queries, updates), official SDKs in seven languages, namespace-based multi-tenancy, and a managed cloud option, that's Temporal. Go use it. Seriously.&lt;/p&gt;

&lt;p&gt;DriftQ is not that. And it's not trying to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what is DriftQ actually trying to do?
&lt;/h2&gt;

&lt;p&gt;DriftQ started because I kept running into the same problem with AI and LLM pipelines. You've got these multi-step workflows (ingest, chunk, extract, summarize, classify, synthesize), and when something breaks near the end, the recovery plan in most setups is basically: run the whole thing again.&lt;/p&gt;

&lt;p&gt;In a normal backend, that's mostly friction: wasted engineer time, wasted compute, and more waiting around for work you've effectively already done. In AI pipelines, it becomes a direct cost problem. Every rerun means buying the same tokens you already paid for. And when you're iterating on prompts and the only thing that changed is the last step, rerunning five upstream LLM calls that haven't changed isn't just annoying, it's a fundamental cost leak.&lt;/p&gt;

&lt;p&gt;I wanted something that would give me the reliability basics (retries with backoff, dead-letter queue routing, idempotency, lease-based consumption) without asking me to deploy a cluster. And I wanted replay from a specific step so I could stop paying for work I'd already done.&lt;/p&gt;

&lt;p&gt;That's DriftQ. One binary. One process. No external dependencies. WAL-backed durability. A built-in dashboard. You can run it on a $5/month VPS, a local Docker host, or even a Raspberry Pi.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual differences, no fluff
&lt;/h2&gt;

&lt;p&gt;Here's how I think about it.&lt;/p&gt;

&lt;p&gt;Temporal is a &lt;strong&gt;durable execution platform&lt;/strong&gt;. You adopt its programming model. You write deterministic workflow code, separate your side effects into activities, deploy a multi-service cluster with a persistence database, and Temporal's event-sourced history drives everything forward. That's powerful. And it's a commitment. You're bound by the architecture.&lt;/p&gt;

&lt;p&gt;DriftQ is a &lt;strong&gt;reliability layer you drop in&lt;/strong&gt;. The stable v1 surface is a durable message broker: topics, consumer groups, streaming consumption over NDJSON, ack/nack with ownership, lease-based redelivery, retry policies carried with the message, strict DLQ routing, and consume-scope idempotency keys managed by the broker. The v2 layer adds replayable workflow foundations on top: an append-only run/event log, time-travel replay, and step-level artifact reuse.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If Temporal is "workflow engine first," DriftQ is "reliability plumbing first."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If Temporal is a durable distributed async call stack for business logic, DriftQ is a durable work-stream broker you can run as one process with retries, DLQ, and idempotency built in.&lt;/p&gt;
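
&lt;p&gt;To make "lease-based redelivery" concrete, here is a generic sketch of the pattern in Python. This is illustrative only, not DriftQ's actual API or data model: a consumer claims a message under a lease, and if the lease expires before an ack, the message becomes deliverable again.&lt;/p&gt;

```python
import time

def claim(queue, inflight, lease_seconds=30.0, now=None):
    """Lease sketch (illustrative, not DriftQ's API): move one message
    from the queue into the in-flight map with an expiry deadline."""
    now = time.monotonic() if now is None else now
    # Expired leases go back to the queue for redelivery.
    for msg_id in list(inflight):
        msg, deadline = inflight[msg_id]
        if now >= deadline:
            del inflight[msg_id]
            queue.append((msg_id, msg))
    if not queue:
        return None
    msg_id, msg = queue.pop(0)
    inflight[msg_id] = (msg, now + lease_seconds)
    return msg_id, msg

def ack(inflight, msg_id):
    """Acking removes the lease, so the message is never redelivered."""
    inflight.pop(msg_id, None)
```

&lt;p&gt;The point of the pattern: a crashed or stuck consumer cannot strand a message forever, and an acked message cannot be double-delivered after its lease is released.&lt;/p&gt;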

&lt;h2&gt;
  
  
  Replay means something different when tokens cost money
&lt;/h2&gt;

&lt;p&gt;This is the part that matters most for the AI and agent use case.&lt;/p&gt;

&lt;p&gt;In Temporal, replay is about &lt;strong&gt;correctness&lt;/strong&gt;. The event history lets Temporal recreate workflow state after a crash so your code can keep going. That's essential for long-running business processes.&lt;/p&gt;

&lt;p&gt;In DriftQ, &lt;strong&gt;replay is also about cost&lt;/strong&gt;. If you have a six-step LLM pipeline and you're tweaking the final synthesis prompt, you don't want to rerun the five upstream summarization calls that haven't changed. Those calls cost real money: input tokens, output tokens, and per-call API fees.&lt;/p&gt;

&lt;p&gt;With DriftQ's v2 replay, you can re-drive from the step that actually changed and reuse the intermediate outputs from earlier steps. In &lt;a href="https://dev.to/behnamaxo/how-to-cut-llm-waste-with-driftq-4g4o"&gt;How to Cut LLM Waste with DriftQ&lt;/a&gt;, I walked through a simple example where that's roughly &lt;strong&gt;66% cheaper&lt;/strong&gt; for the exact same workflow on the exact same model. The model didn't get cheaper. The workflow just stopped buying the same intermediate work over and over.&lt;/p&gt;

&lt;p&gt;At scale with dozens of workflows, hundreds of runs per day, rapid prompt iterations, flaky providers, and retry storms, that waste compounds &lt;em&gt;fast&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What DriftQ is not
&lt;/h2&gt;

&lt;p&gt;I try to be upfront about this in the docs, and I'll say it here too.&lt;/p&gt;

&lt;p&gt;DriftQ-Core is not a distributed system. Not yet. It's a single process with file-based durability. If you need multi-node replication or horizontal scaling today, use Kafka and Temporal. I literally say that in the README.&lt;/p&gt;

&lt;p&gt;The v2 replayable workflow runtime is real and it works, but it's evolving. I call it "foundations" because that's what it is. The stable surface today is the v1 broker.&lt;/p&gt;

&lt;p&gt;The TypeScript SDK is stubbed. The Go and Python SDKs work for the v1 broker API. Governance, RBAC, and tenant isolation are on the roadmap, not shipped.&lt;/p&gt;

&lt;p&gt;DriftQ Cloud doesn't exist yet. It's planned.&lt;/p&gt;

&lt;p&gt;I'm one developer. Temporal has a company behind it with years of production hardening. That's just the reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to pick which
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick Temporal when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need a mature, battle-tested workflow orchestration platform.&lt;/li&gt;
&lt;li&gt;You need multi-service scaling for millions of concurrent workflows.&lt;/li&gt;
&lt;li&gt;You need official SDKs across many languages with full support.&lt;/li&gt;
&lt;li&gt;You need well-defined security, multi-tenancy, and a managed cloud option.&lt;/li&gt;
&lt;li&gt;You're building long-running stateful business processes where correctness under partial failure is the whole point.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick DriftQ when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want a single-binary deployment with no external dependencies for durable work delivery.&lt;/li&gt;
&lt;li&gt;Your core need is safe stream consumption (leases, ack/nack, bounded retries, DLQ, idempotency), not a full workflow orchestration platform.&lt;/li&gt;
&lt;li&gt;You're building AI or agent pipelines and you care about not re-paying for upstream work when only a downstream step changed.&lt;/li&gt;
&lt;li&gt;You want step-level replay and time-travel debugging as a first-class thing, and you're okay with that being described as evolving v2 foundations.&lt;/li&gt;
&lt;li&gt;Simplicity matters more to you than multi-node HA right now.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Temporal is the planet-scale, enterprise-ready choice. If you need what it offers, nothing I'm building replaces that.&lt;/p&gt;

&lt;p&gt;DriftQ is the leaner, AI-ready solution. It's for growing teams and smaller projects that need real durability guarantees without the operational tax of distributed infrastructure. It's for AI workloads where re-execution has a dollar cost and replay from a specific step actually saves you money.&lt;/p&gt;

&lt;p&gt;When you outgrow it, that's fine. That's the &lt;a href="https://drift-q.org/docs/roadmap/" rel="noopener noreferrer"&gt;plan&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/driftq-org/DriftQ-Core" rel="noopener noreferrer"&gt;github.com/driftq-org/DriftQ-Core&lt;/a&gt;&lt;/p&gt;

</description>
      <category>driftq</category>
      <category>temporal</category>
      <category>go</category>
    </item>
    <item>
      <title>How to Cut LLM Waste with DriftQ</title>
      <dc:creator>Behnam Amiri</dc:creator>
      <pubDate>Mon, 02 Mar 2026 17:45:48 +0000</pubDate>
      <link>https://forem.com/behnamaxo/how-to-cut-llm-waste-with-driftq-4g4o</link>
      <guid>https://forem.com/behnamaxo/how-to-cut-llm-waste-with-driftq-4g4o</guid>
      <description>&lt;p&gt;I have been part of teams where we tried to cut LLM costs the obvious ways: using a cheaper model, trimming prompts, capping output tokens, adding caching, maybe routing smaller tasks to a cheaper tier. All of that helps. But a lot of avoidable spend in production isn't really about model pricing. It's workflow waste. Not the kind you notice immediately, either. The sneaky kind:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Something fails near the end, so the whole workflow has to be rerun.&lt;/li&gt;
&lt;li&gt;A flaky provider causes retries that keep redoing the same paid work.&lt;/li&gt;
&lt;li&gt;A batch job pushes past safe concurrency and starts slamming the endpoint.&lt;/li&gt;
&lt;li&gt;A "self-healing" agent loop keeps spending in the background until somebody notices.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That wasted compute adds up fast. A lot of the time, you are not paying because the model is inherently too expensive. You are paying because your system keeps buying the same work over and over again. That is the layer DriftQ is meant to help with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DriftQ-Core&lt;/strong&gt; is an open-source Go project that gives you a durable broker plus replayable workflow runtime foundations in one package. If something fails late, you do not have to restart the whole workflow. If only one downstream step changed, you can replay from that step. If a dependency is flaky, retries are bounded. If concurrency goes sideways, there are controls for that too. DriftQ does not make tokens cheaper.&lt;/p&gt;

&lt;p&gt;What it can do is reduce avoidable spend caused by reruns, retries, and repeated execution of unchanged downstream work when prior outputs can be safely reused.&lt;/p&gt;

&lt;p&gt;If you are running anything more complex than a single prompt-response call (agents, multi-step chains, batch jobs, RAG pipelines, long-running workflows), that distinction matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The expensive part is usually not the model. It is the rerun.
&lt;/h2&gt;

&lt;p&gt;Let's use a concrete example. Assume an example model priced at &lt;strong&gt;$1.75 per 1M input tokens&lt;/strong&gt; and &lt;strong&gt;$14.00 per 1M output tokens&lt;/strong&gt;. Now imagine a daily AI news workflow with six LLM steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Summarize article 1&lt;/li&gt;
&lt;li&gt;Summarize article 2&lt;/li&gt;
&lt;li&gt;Summarize article 3&lt;/li&gt;
&lt;li&gt;Summarize article 4&lt;/li&gt;
&lt;li&gt;Summarize article 5&lt;/li&gt;
&lt;li&gt;Write a final report from all five summaries&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each summary step uses about &lt;strong&gt;5,000 input tokens&lt;/strong&gt; and returns &lt;strong&gt;800 output tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Per summary step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input cost: &lt;code&gt;5,000 / 1,000,000 x $1.75 = $0.00875&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output cost: &lt;code&gt;800 / 1,000,000 x $14.00 = $0.01120&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Total per summary: &lt;strong&gt;$0.01995&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five summaries cost about &lt;strong&gt;$0.09975&lt;/strong&gt;. Now the final report step uses &lt;strong&gt;6,000 input tokens&lt;/strong&gt; and returns &lt;strong&gt;2,000 output tokens&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input cost: &lt;code&gt;6,000 / 1,000,000 x $1.75 = $0.01050&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output cost: &lt;code&gt;2,000 / 1,000,000 x $14.00 = $0.02800&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Final step total: &lt;strong&gt;$0.03850&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So one clean run costs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;$0.09975 + $0.03850 = $0.13825&lt;/strong&gt;&lt;/p&gt;
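
&lt;p&gt;The arithmetic is easy to check in a few lines of Python (the prices and token counts are this example's assumptions, not real list prices):&lt;/p&gt;

```python
# Assumed example rates: $1.75 per 1M input tokens, $14.00 per 1M output tokens.
PRICE_IN = 1.75 / 1_000_000
PRICE_OUT = 14.00 / 1_000_000

def step_cost(tokens_in, tokens_out):
    """Cost of one LLM call at the example rates."""
    return tokens_in * PRICE_IN + tokens_out * PRICE_OUT

summary = step_cost(5_000, 800)    # one summarization step
final = step_cost(6_000, 2_000)    # the final report step
clean_run = 5 * summary + final    # five summaries plus the report

print(round(summary, 5))    # 0.01995
print(round(final, 5))      # 0.0385
print(round(clean_run, 5))  # 0.13825
```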

&lt;p&gt;That sounds cheap. And that is exactly why this kind of waste slips by people.&lt;/p&gt;

&lt;h2&gt;
  
  
  Without replay
&lt;/h2&gt;

&lt;p&gt;Let's say the final report is close, but not right. So you tweak the final prompt 10 times. In a naive workflow system, every tweak reruns all six steps. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 reruns x &lt;strong&gt;$0.13825&lt;/strong&gt; = &lt;strong&gt;$1.38250&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;plus the original run = &lt;strong&gt;$1.52075&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing changed in the first five steps. You still paid for them 11 times.&lt;/p&gt;

&lt;h2&gt;
  
  
  With replay
&lt;/h2&gt;

&lt;p&gt;After the first run, DriftQ can keep the run/event history plus intermediate outputs from earlier steps.&lt;/p&gt;

&lt;p&gt;So if all you changed was the final synthesis prompt, and the upstream outputs are still valid, you replay from the final step instead of replaying the whole workflow.&lt;/p&gt;
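
&lt;p&gt;The idea behind replay-from-a-step can be sketched in a few lines. This is a simplified illustration of the concept, not DriftQ's actual runtime: every step upstream of the replay point is served from stored artifacts, and only the replay point onward re-executes.&lt;/p&gt;

```python
def run_pipeline(steps, artifacts, replay_from=None):
    """Replay sketch (illustrative, not DriftQ's API): re-execute only
    from `replay_from` onward; earlier steps reuse stored artifacts."""
    executing = replay_from is None   # a fresh run executes everything
    last = None
    for name, fn in steps:
        if name == replay_from:
            executing = True
        if executing or name not in artifacts:
            artifacts[name] = fn(artifacts)   # the paid work happens here
        last = artifacts[name]
    return last
```

&lt;p&gt;In this sketch, replaying from the final step calls only the final step's function; everything upstream is served from &lt;code&gt;artifacts&lt;/code&gt; at zero marginal token cost.&lt;/p&gt;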

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 replays x &lt;strong&gt;$0.03850&lt;/strong&gt; = &lt;strong&gt;$0.38500&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;plus the original run = &lt;strong&gt;$0.52325&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is about &lt;strong&gt;66% cheaper&lt;/strong&gt; for the exact same workflow and the exact same model. The model did not get cheaper. The workflow just stopped buying the same intermediate work over and over again.&lt;/p&gt;
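
&lt;p&gt;The same comparison in Python, using the per-run numbers from the example above:&lt;/p&gt;

```python
# Per-run costs from the worked example above.
CLEAN_RUN = 0.13825    # one full six-step run
FINAL_STEP = 0.03850   # the final report step alone

naive = CLEAN_RUN + 10 * CLEAN_RUN          # original run + 10 full reruns
with_replay = CLEAN_RUN + 10 * FINAL_STEP   # original run + 10 final-step replays

print(round(naive, 5))        # 1.52075
print(round(with_replay, 5))  # 0.52325
print(round(100 * (1 - with_replay / naive)))  # 66 (percent saved)
```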

&lt;h2&gt;
  
  
  Why this gets ugly in the real world
&lt;/h2&gt;

&lt;p&gt;The example above is small on purpose. In real systems, the waste is usually worse. Think about the shape of a typical AI pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingest&lt;/li&gt;
&lt;li&gt;chunk&lt;/li&gt;
&lt;li&gt;extract&lt;/li&gt;
&lt;li&gt;summarize&lt;/li&gt;
&lt;li&gt;classify&lt;/li&gt;
&lt;li&gt;synthesize&lt;/li&gt;
&lt;li&gt;write results somewhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now imagine the last step fails because of a formatting bug, a provider hiccup, a timeout, or a downstream API issue. In a lot of systems, the recovery plan is basically one thing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run it all again.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means paying again for every upstream LLM call even though nothing new was learned. And once you multiply that by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dozens of workflows&lt;/li&gt;
&lt;li&gt;hundreds of runs per day&lt;/li&gt;
&lt;li&gt;repeated prompt iteration&lt;/li&gt;
&lt;li&gt;flaky external dependencies&lt;/li&gt;
&lt;li&gt;weak retry discipline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the waste compounds fast. That is why replay from the failure point is not just a debugging convenience in AI systems. It changes the economics of failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What DriftQ actually gives you
&lt;/h2&gt;

&lt;p&gt;DriftQ is not trying to be magical. It gives you a set of practical primitives.&lt;/p&gt;

&lt;h3&gt;
  
  
  On the broker side
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;retries with backoff&lt;/li&gt;
&lt;li&gt;dead-letter queue routing&lt;/li&gt;
&lt;li&gt;idempotency keys&lt;/li&gt;
&lt;li&gt;consumer leases&lt;/li&gt;
&lt;li&gt;backpressure and max-inflight controls&lt;/li&gt;
&lt;li&gt;WAL-backed durability&lt;/li&gt;
&lt;/ul&gt;
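
&lt;p&gt;As a taste of what broker-managed idempotency buys you, here is the general pattern in Python (a generic sketch, not DriftQ's actual API or wire format): a message whose idempotency key has already been processed is skipped instead of handled twice.&lt;/p&gt;

```python
def consume(messages, processed_keys, handle):
    """Idempotent consumption sketch (illustrative, not DriftQ's API):
    deduplicate deliveries by idempotency key."""
    for msg in messages:
        key = msg.get("idempotency_key")
        if key is not None and key in processed_keys:
            continue    # duplicate delivery: already handled, skip it
        handle(msg)
        if key is not None:
            processed_keys.add(key)
```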

&lt;h3&gt;
  
  
  On the workflow side
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;an append-only run/event log&lt;/li&gt;
&lt;li&gt;time-travel replay&lt;/li&gt;
&lt;li&gt;artifact storage and reuse&lt;/li&gt;
&lt;li&gt;budget controls for tokens, dollars, attempts, and wall-clock time&lt;/li&gt;
&lt;li&gt;inspectable timelines for debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination matters because LLM cost problems usually do not come from one mistake. They come from a chain reaction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry logic is sloppy&lt;/li&gt;
&lt;li&gt;replay is missing&lt;/li&gt;
&lt;li&gt;concurrency is too loose&lt;/li&gt;
&lt;li&gt;debugging means rerunning everything&lt;/li&gt;
&lt;li&gt;budgets exist only in somebody's head&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DriftQ gives you concrete controls for those failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to see what the repo actually offers today, this is the fastest way to do it.&lt;/p&gt;

&lt;p&gt;Important: the built-in demo in this repo is a minimal two-step workflow with nodes &lt;code&gt;A -&amp;gt; B&lt;/code&gt;. It proves replay, timelines, and artifact reuse, but it is not the same thing as the six-step AI-news example above.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run DriftQ-Core
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;-v&lt;/span&gt; driftq_data:/data ghcr.io/driftq-org/driftq-core:1.2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Confirm the server is up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://127.0.0.1:8080/v1/healthz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Build the CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go build &lt;span class="nt"&gt;-o&lt;/span&gt; driftqctl ./cmd/driftqctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run the demo workflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./driftqctl &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://127.0.0.1:8080 runs demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy the &lt;code&gt;run_id&lt;/code&gt; from that command's output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Check run status
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./driftqctl &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://127.0.0.1:8080 runs status &lt;span class="nt"&gt;--run-id&lt;/span&gt; &amp;lt;RUN_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inspect the timeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./driftqctl &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://127.0.0.1:8080 runs timeline &lt;span class="nt"&gt;--run-id&lt;/span&gt; &amp;lt;RUN_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Replay from the downstream demo step
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./driftqctl &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://127.0.0.1:8080 runs replay &lt;span class="nt"&gt;--run-id&lt;/span&gt; &amp;lt;RUN_ID&amp;gt; &lt;span class="nt"&gt;--from-step&lt;/span&gt; B &lt;span class="nt"&gt;--mode&lt;/span&gt; time-travel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  View stored artifacts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./driftqctl &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://127.0.0.1:8080 runs artifacts &lt;span class="nt"&gt;--run-id&lt;/span&gt; &amp;lt;RUN_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sequence is the point. In the built-in demo, replaying from &lt;code&gt;B&lt;/code&gt; lets you re-drive the downstream step without replaying &lt;code&gt;A&lt;/code&gt;. In a real AI workflow, the same idea applies to whatever your actual downstream node is called: reuse the outputs you already paid for and only re-execute the part that actually changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replay is the headline, but it is not the only savings
&lt;/h2&gt;

&lt;p&gt;Replay is the easiest benefit to explain, but it is not the only place the bill goes down.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Retry storms stop turning outages into invoices
&lt;/h3&gt;

&lt;p&gt;When a dependency starts failing, bad retry logic can get expensive very quickly.&lt;/p&gt;

&lt;p&gt;A lot of systems basically do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fail&lt;/li&gt;
&lt;li&gt;retry immediately&lt;/li&gt;
&lt;li&gt;fail again&lt;/li&gt;
&lt;li&gt;retry harder&lt;/li&gt;
&lt;li&gt;repeat until everybody is sad&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DriftQ gives you bounded attempts, backoff, and DLQ routing. So instead of "keep burning money until the provider recovers," you get "fail sanely, preserve state, and quarantine the bad work."&lt;/p&gt;
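
&lt;p&gt;The pattern looks roughly like this. It is a generic sketch of bounded retries with jittered backoff and DLQ routing, not DriftQ's actual API:&lt;/p&gt;

```python
import random
import time

def handle_with_retries(msg, handler, dlq, max_attempts=5, base_delay=0.5):
    """Bounded-retry sketch (illustrative, not DriftQ's API): exponential
    backoff with jitter, then dead-letter the message instead of
    retrying forever."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(msg)
        except Exception:
            if attempt == max_attempts:
                dlq.append(msg)   # quarantine the bad work
                return None
            delay = min(base_delay * 2 ** (attempt - 1), 30.0)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jittered backoff
```

&lt;p&gt;The cap on attempts is what turns "keep burning money until the provider recovers" into "fail sanely and quarantine."&lt;/p&gt;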

&lt;h3&gt;
  
  
  2. Concurrency mistakes stop multiplying damage
&lt;/h3&gt;

&lt;p&gt;One of the fastest ways to waste money is to fan out too aggressively, hit rate limits, and then combine successful requests, failed requests, and retries into one giant mess.&lt;/p&gt;

&lt;p&gt;DriftQ's backpressure and max-inflight controls help stop overload from turning into paid chaos.&lt;/p&gt;
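
&lt;p&gt;The core idea is just a hard cap on concurrent requests, something like this generic sketch (not DriftQ's actual API):&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def bounded_fan_out(items, call, max_inflight=4):
    """Max-inflight sketch (illustrative, not DriftQ's API): never have
    more than `max_inflight` requests outstanding at once, so a batch
    job cannot slam a rate-limited endpoint."""
    with ThreadPoolExecutor(max_workers=max_inflight) as pool:
        # map preserves input order even though calls run concurrently
        return list(pool.map(call, items))
```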

&lt;h3&gt;
  
  
  3. Runaway agents get budget fences
&lt;/h3&gt;

&lt;p&gt;If you have ever had an agent loop longer than expected, you know how ugly this can get.&lt;/p&gt;

&lt;p&gt;DriftQ includes budget controls for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;max attempts&lt;/li&gt;
&lt;li&gt;token budgets&lt;/li&gt;
&lt;li&gt;dollar budgets&lt;/li&gt;
&lt;li&gt;wall-clock time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when something goes off the rails, the system can stop itself before your monthly bill makes the decision for you.&lt;/p&gt;
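
&lt;p&gt;Conceptually, a budget fence is just a guard that every iteration of the loop has to pass. A minimal sketch of the idea (illustrative, not DriftQ's actual API):&lt;/p&gt;

```python
import time

class BudgetGuard:
    """Budget-fence sketch (illustrative, not DriftQ's API): stop a loop
    once any budget is exhausted: attempts, tokens, dollars, or
    wall-clock seconds."""
    def __init__(self, max_attempts, max_tokens, max_dollars, max_seconds):
        self.max_attempts = max_attempts
        self.max_tokens = max_tokens
        self.max_dollars = max_dollars
        self.max_seconds = max_seconds
        self.attempts = 0
        self.tokens = 0
        self.dollars = 0.0
        self.started = time.monotonic()

    def charge(self, tokens, dollars):
        """Record one attempt's spend."""
        self.attempts += 1
        self.tokens += tokens
        self.dollars += dollars

    def exhausted(self):
        """True once any one budget is spent."""
        return (self.attempts >= self.max_attempts
                or self.tokens >= self.max_tokens
                or self.dollars >= self.max_dollars
                or time.monotonic() - self.started >= self.max_seconds)
```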

&lt;h2&gt;
  
  
  The honest caveats
&lt;/h2&gt;

&lt;p&gt;DriftQ is not a magic trick. It will &lt;strong&gt;not&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shorten your prompts for you&lt;/li&gt;
&lt;li&gt;improve bad prompts automatically&lt;/li&gt;
&lt;li&gt;make a weak model smarter&lt;/li&gt;
&lt;li&gt;turn a bad agent design into a good one&lt;/li&gt;
&lt;li&gt;replace careful application design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it is also &lt;strong&gt;not&lt;/strong&gt; pretending to be Kafka at massive distributed scale. What it does is narrower, and for a lot of teams, more useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;durable execution&lt;/li&gt;
&lt;li&gt;replay&lt;/li&gt;
&lt;li&gt;artifact reuse&lt;/li&gt;
&lt;li&gt;disciplined retries&lt;/li&gt;
&lt;li&gt;budget controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means fewer avoidable reruns, less accidental spend, and much better visibility into where your AI workflow is actually wasting money.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should care about this right now
&lt;/h2&gt;

&lt;p&gt;If any of these sound familiar, this is probably worth a look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Why did this workflow restart from step 1 again?"&lt;/li&gt;
&lt;li&gt;"Why are we calling the model when nothing changed?"&lt;/li&gt;
&lt;li&gt;"Why did the batch worker hammer the provider like that?"&lt;/li&gt;
&lt;li&gt;"Why did this agent keep spending money all night?"&lt;/li&gt;
&lt;li&gt;"Why does debugging this workflow cost money every single time?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the class of problem DriftQ is built for.&lt;/p&gt;

&lt;p&gt;It is one Go binary with file-backed durability. No extra fleet, no giant platform tax, no stack of dependencies just to get reliable workflows, replay, retries, and control.&lt;/p&gt;

&lt;p&gt;For small teams, startups, and solo builders building AI systems, that tradeoff makes a lot of sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;I think a lot of teams are trying to cut LLM costs at the wrong layer. Yes, model pricing matters. But if your workflow keeps rerunning expensive work, handling retries badly, or letting agents wander without guardrails, you are going to burn money no matter how much you trim prompts. &lt;strong&gt;DriftQ&lt;/strong&gt; goes after that waste layer, and in a lot of real production systems, that's where the biggest savings are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/driftq-org/DriftQ-Core" rel="noopener noreferrer"&gt;github.com/driftq-org/DriftQ-Core&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>driftq</category>
      <category>ai</category>
      <category>go</category>
    </item>
  </channel>
</rss>
