<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Akash Melavanki</title>
    <description>The latest articles on Forem by Akash Melavanki (@thsky21).</description>
    <link>https://forem.com/thsky21</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866017%2Fbe85f26b-1253-41a7-a3f8-a912d5d47e89.png</url>
      <title>Forem: Akash Melavanki</title>
      <link>https://forem.com/thsky21</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thsky21"/>
    <language>en</language>
    <item>
      <title>How I built a real-time LLM "Kill-Switch" for Vercel Edge using Atomic Redis</title>
      <dc:creator>Akash Melavanki</dc:creator>
      <pubDate>Tue, 07 Apr 2026 14:39:17 +0000</pubDate>
      <link>https://forem.com/thsky21/how-i-built-a-real-time-llm-kill-switch-for-vercel-edge-using-atomic-redis-3njm</link>
      <guid>https://forem.com/thsky21/how-i-built-a-real-time-llm-kill-switch-for-vercel-edge-using-atomic-redis-3njm</guid>
      <description>&lt;p&gt;Last week, the Axios supply chain attack compromised over 100 million weekly downloads. A week before that, it was LiteLLM.&lt;/p&gt;

&lt;p&gt;In both cases, the goal was simple: Exfiltrate API keys. As developers, we are taught to rotate our keys immediately. But there’s a massive gap in that advice. If an attacker gets your OpenAI key at 2 AM, they don't wait for you to wake up. They loop your endpoints, drain your credits, and leave you with a $1,000+ bill by sunrise.&lt;/p&gt;

&lt;p&gt;This is what OWASP calls LLM10:2025 – Unbounded Consumption (or "Denial of Wallet"). I spent the last two weeks building a way to stop it at the Edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: Why Rate Limiting Fails LLMs&lt;/strong&gt;&lt;br&gt;
Standard rate-limiting (e.g., 10 requests per minute) is useless for LLMs.&lt;/p&gt;

&lt;p&gt;Request 1: "Hi" (10 tokens) — Cost: $0.0001&lt;/p&gt;

&lt;p&gt;Request 2: "Summarize this 50-page PDF" (30,000 tokens) — Cost: $0.45&lt;/p&gt;

&lt;p&gt;An attacker doesn't need a high volume of requests to ruin you; they just need expensive requests. We need Budget Limiting, not Rate Limiting.&lt;/p&gt;
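
&lt;p&gt;To make that concrete, a budget limiter prices each request before gating it. Here's a minimal sketch — the per-1K-token price is a made-up placeholder, not real OpenAI pricing:&lt;/p&gt;

```typescript
// Hypothetical price per 1,000 input tokens; real pricing varies by model
// and changes over time.
const PRICE_PER_1K_TOKENS_USD = 0.005;

// Budget limiting gates on estimated dollars, not on request count.
function estimateCostUSD(tokens: number): number {
  return (tokens / 1000) * PRICE_PER_1K_TOKENS_USD;
}

const cheap = estimateCostUSD(10);        // the "Hi" request
const expensive = estimateCostUSD(30000); // the 50-page PDF summary
// Same two requests, a 3000x difference in cost.
```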

&lt;p&gt;&lt;strong&gt;The Technical Challenge: The Stateless Race Condition&lt;/strong&gt;&lt;br&gt;
I’m building this for Next.js on Vercel Edge.&lt;/p&gt;

&lt;p&gt;Vercel Edge functions are stateless. If you try to track a user's spend in a local variable, it vanishes. If you use a standard database, the latency kills your UX.&lt;/p&gt;

&lt;p&gt;But the real "final boss" is the Race Condition.&lt;/p&gt;

&lt;p&gt;Imagine a user fires 10 concurrent requests.&lt;/p&gt;

&lt;p&gt;Instance A checks the budget: "Remaining: $0.05. Proceed."&lt;/p&gt;

&lt;p&gt;Instance B checks the budget: "Remaining: $0.05. Proceed."&lt;/p&gt;

&lt;p&gt;Both fire $1.00 requests.&lt;/p&gt;

&lt;p&gt;Result: You are now -$1.95 in the hole.&lt;/p&gt;
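
&lt;p&gt;That interleaving is easy to reproduce locally. Here's a sketch of the naive "check then update" pattern, with a simulated delay standing in for the stateless-instance latency window — the numbers mirror the example above, and none of this is the actual Thskyshield code:&lt;/p&gt;

```typescript
const LIMIT = 1.0; // hard budget cap in USD
let spent = 0.95;  // remaining budget: $0.05

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Naive guard: read the balance, then write it back after a delay.
async function naiveGuard(cost: number) {
  const remaining = LIMIT - spent; // step 1: check
  await sleep(10);                 // the other instance runs in this window
  if (remaining > 0) {
    spent += cost;                 // step 2: update, based on a stale read
    return true;
  }
  return false;
}

// Two concurrent $1.00 requests both see "Remaining: $0.05. Proceed."
async function demo() {
  await Promise.all([naiveGuard(1.0), naiveGuard(1.0)]);
  return spent; // ~2.95, i.e. $1.95 past the $1.00 limit
}
```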

&lt;p&gt;&lt;strong&gt;The Solution: Atomic Lua Scripts on Redis&lt;/strong&gt;&lt;br&gt;
To solve this, I moved the logic into an Atomic Lua Script on Upstash Redis. Instead of "Check then Update" (two steps), the logic happens in one single, uninterruptible step inside the database memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The "Kill-Switch" Logic&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;-- user_budget_key&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;-- e.g., 1.00 USD&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;-- estimated cost&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="c1"&gt;-- BLOCK&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'INCRBYFLOAT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;-- ALLOW&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs in ~10ms. If Instance A and B hit the script at the exact same millisecond, Redis queues them. One passes, the second fails. No race condition. No $1,000 surprises.&lt;/p&gt;
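
&lt;p&gt;You can model the same fix locally: collapse check and update into one synchronous function with no await between read and write. This is only an in-memory stand-in for the Lua script — in production that body runs inside Redis via EVAL, where single-threaded execution serializes concurrent callers:&lt;/p&gt;

```typescript
// In-memory stand-in for the atomic Lua script above.
const budgets: { [key: string]: number } = {};

function atomicGuard(key: string, limit: number, cost: number): boolean {
  const current = budgets[key] || 0;
  if (current + cost > limit) return false; // BLOCK
  budgets[key] = current + cost;            // INCRBYFLOAT
  return true;                              // ALLOW
}

// Ten $1.00 attempts against a $1.00 budget: exactly one passes.
const results = Array.from({ length: 10 }, () => atomicGuard("user:1", 1.0, 1.0));
const allowed = results.filter(Boolean).length; // 1
```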

&lt;p&gt;&lt;strong&gt;The Benchmark: A Controlled Stress Test&lt;/strong&gt;&lt;br&gt;
To quantify the risk, I ran a simulated Denial of Wallet (DoW) attack against a standard Next.js API route.&lt;/p&gt;

&lt;p&gt;The Setup:&lt;/p&gt;

&lt;p&gt;Attacker: A simple recursive script firing concurrent requests with high-token payloads (800+ tokens/request).&lt;/p&gt;

&lt;p&gt;Target: A GPT-4o endpoint.&lt;/p&gt;

&lt;p&gt;The Result (Unprotected): The script ran for 47 seconds. Total simulated cost reached $847.00 before manual intervention.&lt;/p&gt;

&lt;p&gt;The Result (Thskyshield): Using the same script, the governance layer triggered a 429 (Too Many Requests) at the 3rd call. Total spend: $0.08.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.thskyshield.com/simulator" rel="noopener noreferrer"&gt;Watch the Live Simulation →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Two-Phase" Protocol&lt;/strong&gt;&lt;br&gt;
The hardest part was handling the fact that you don't know the exact cost of an LLM call until it's finished. I settled on a two-phase approach:&lt;/p&gt;

&lt;p&gt;Phase 1 (Pre-flight): Check the budget based on the max possible tokens. "Lock" that amount.&lt;/p&gt;

&lt;p&gt;Phase 2 (Post-flight): Once the LLM returns, reconcile the actual usage and "Refund" the difference to the user's budget.&lt;/p&gt;
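
&lt;p&gt;A minimal sketch of those two phases against an in-memory model — the names reserve/reconcile and the dollar figures are my own illustration, not the published SDK API:&lt;/p&gt;

```typescript
const ledger: { [key: string]: number } = {};

// Phase 1 (pre-flight): lock the worst-case cost before calling the LLM.
function reserve(key: string, limit: number, maxCost: number): boolean {
  const current = ledger[key] || 0;
  if (current + maxCost > limit) return false;
  ledger[key] = current + maxCost;
  return true;
}

// Phase 2 (post-flight): refund the unused portion after the response.
function reconcile(key: string, maxCost: number, actualCost: number) {
  ledger[key] = (ledger[key] || 0) - (maxCost - actualCost);
}

// Reserve $0.45 for a worst-case call; the call actually cost $0.12:
reserve("user:1", 1.0, 0.45);
reconcile("user:1", 0.45, 0.12);
// ledger["user:1"] is now ~0.12
```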

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Supply chain attacks like the Axios one are the "new normal." We can't stop every key from being stolen, but we can stop a stolen key from being a business-ending event.&lt;/p&gt;

&lt;p&gt;I’ve open-sourced the SDK for this under Thskyshield. If you're building with Next.js and want to stop worrying about your OpenAI bill, it's free for founders.&lt;/p&gt;

&lt;p&gt;SDK: &lt;a href="https://www.npmjs.com/package/@thsky-21/thskyshield" rel="noopener noreferrer"&gt;@thsky-21/thskyshield&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.thskyshield.com/" rel="noopener noreferrer"&gt;thskyshield.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to hear how others are handling "Denial of Wallet" risks. Are you just relying on OpenAI's hard limits, or are you building your own governance layer?&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>ai</category>
      <category>security</category>
      <category>api</category>
    </item>
  </channel>
</rss>
