<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Devansh</title>
    <description>The latest articles on Forem by Devansh (@devansh365).</description>
    <link>https://forem.com/devansh365</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F679755%2F9dc6ebfe-a1d9-4613-8192-f2854324ea75.png</url>
      <title>Forem: Devansh</title>
      <link>https://forem.com/devansh365</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/devansh365"/>
    <language>en</language>
    <item>
      <title>LiteLLM got hacked. I built a simpler LLM gateway you can actually audit.</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Tue, 14 Apr 2026 00:10:45 +0000</pubDate>
      <link>https://forem.com/devansh365/litellm-got-hacked-i-built-a-simpler-llm-gateway-you-can-actually-audit-3hia</link>
      <guid>https://forem.com/devansh365/litellm-got-hacked-i-built-a-simpler-llm-gateway-you-can-actually-audit-3hia</guid>
      <description>&lt;p&gt;On March 24, 2026, LiteLLM versions 1.82.7 and 1.82.8 were uploaded to PyPI with a credential harvester, a Kubernetes lateral-movement toolkit, and a persistent remote code execution backdoor baked in.&lt;/p&gt;

&lt;p&gt;The malicious package was live for about 40 minutes before PyPI quarantined it.&lt;/p&gt;

&lt;p&gt;40 minutes doesn't sound like much. But LiteLLM gets 95 million downloads a month. It's the default multi-provider routing library for anyone building on LLMs. Teams running &lt;code&gt;pip install litellm&lt;/code&gt; during that window got compromised automatically. No explicit import needed. The payload triggered on Python interpreter startup via a &lt;code&gt;.pth&lt;/code&gt; file.&lt;/p&gt;
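The .pth trigger deserves a closer look, because it's a legitimate Python mechanism, not an exploit. A minimal, benign sketch of how it works (the variable names here are illustrative, not the actual payload):

```python
import os
import site
import tempfile

# Python's site module exec's any line in a .pth file that starts with
# "import" when it processes a site directory -- which happens at
# interpreter startup for everything in site-packages. Nobody has to
# import the package for its code to run.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write("import os; os.environ['PTH_HOOK_RAN'] = '1'\n")

site.addsitedir(d)  # simulates what startup does for site-packages
print(os.environ["PTH_HOOK_RAN"])  # -> '1'
```

That's why `pip install litellm` alone was enough to get compromised: the hook fires the next time any Python process starts.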

&lt;p&gt;Google brought in Mandiant for the investigation. Snyk, Kaspersky, and Trend Micro all published breakdowns. The attack vector: a compromised Trivy security scanner leaked CircleCI credentials, including the PyPI publishing token and a GitHub PAT.&lt;/p&gt;

&lt;p&gt;This is not a theoretical risk. This happened.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn61cm58mhjeckf0ttnf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn61cm58mhjeckf0ttnf8.png" alt="LiteLLM Attack Timeline" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The real problem is not one attack&lt;/h2&gt;

&lt;p&gt;LiteLLM does a lot. 2,000+ models across 100+ providers. Proxy server, load balancing, spend tracking, A/B testing, caching, logging, guardrails, prompt management.&lt;/p&gt;

&lt;p&gt;That scope is the problem.&lt;/p&gt;

&lt;p&gt;A developer on HN described the codebase as having a 7,000+ line &lt;code&gt;utils.py&lt;/code&gt;. An engineer with 30 years of experience called it "the worst code I have ever read in my life." Even before the supply chain attack, a DEV Community post titled "5 Real Issues With LiteLLM That Are Pushing Teams Away in 2026" was documenting the erosion of trust.&lt;/p&gt;

&lt;p&gt;The supply chain attack was the tipping point, not the root cause. The root cause is depending on a massive, opaque library for critical routing infrastructure.&lt;/p&gt;

&lt;h2&gt;What a simpler design looks like&lt;/h2&gt;

&lt;p&gt;I ran into the same multi-provider routing problem last year while building Metis, an AI stock analysis tool. Kept burning through Groq's free tier in 20 minutes, switching to Gemini manually, hitting their cap, switching again.&lt;/p&gt;

&lt;p&gt;Built FreeLLM to stop doing that manually. It solves a narrower problem than LiteLLM, and that's the point.&lt;/p&gt;

&lt;p&gt;FreeLLM is an OpenAI-compatible gateway that routes across Groq, Gemini, Mistral, Cerebras, NVIDIA NIM, and Ollama. When one provider rate-limits, the next one answers. That's the core of it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv5zmkmnwy550olr0tz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqv5zmkmnwy550olr0tz4.png" alt="LiteLLM vs FreeLLM" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;What it does&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "free-fast", "messages": [{"role": "user", "content": "Hello!"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your existing OpenAI SDK code works. Swap the base URL. Keep your code.&lt;/p&gt;

&lt;p&gt;Three meta-models handle routing: &lt;code&gt;free-fast&lt;/code&gt; (lowest latency, usually Groq/Cerebras), &lt;code&gt;free-smart&lt;/code&gt; (best reasoning, usually Gemini 2.5 Pro), and &lt;code&gt;free&lt;/code&gt; (max availability).&lt;/p&gt;

&lt;h3&gt;What it fixes that LiteLLM doesn't&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.5 reasoning tokens eating your output.&lt;/strong&gt; This is one of the most reported Gemini bugs right now. Gemini 2.5 Flash and Pro are reasoning models. They burn 90-98% of your &lt;code&gt;max_tokens&lt;/code&gt; on internal thinking before producing visible text. Ask for 1,000 tokens and you get back 37. There are 15+ open GitHub issues about this across multiple SDKs.&lt;/p&gt;

&lt;p&gt;FreeLLM fixes it at the gateway. Flash gets &lt;code&gt;reasoning_effort: "none"&lt;/code&gt; by default. Pro gets &lt;code&gt;"low"&lt;/code&gt;. Your full token budget goes to the actual answer. Override per-request if you want the reasoning back.&lt;/p&gt;
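The defaulting logic is roughly this shape (a sketch only -- the field names and model IDs are assumptions based on the description above, not FreeLLM's actual code):

```python
# Hypothetical gateway-side defaults: the caller's reasoning_effort wins,
# otherwise the per-model default is injected before the provider call.
DEFAULT_REASONING = {
    "gemini-2.5-flash": "none",  # full token budget goes to the answer
    "gemini-2.5-pro": "low",
}

def apply_reasoning_default(payload: dict, resolved_model: str) -> dict:
    out = dict(payload)
    out.setdefault("reasoning_effort", DEFAULT_REASONING.get(resolved_model))
    return out

req = {"model": "free-smart", "messages": [{"role": "user", "content": "hi"}]}
print(apply_reasoning_default(req, "gemini-2.5-pro")["reasoning_effort"])  # -> low

# A per-request override passes straight through:
req["reasoning_effort"] = "high"
print(apply_reasoning_default(req, "gemini-2.5-pro")["reasoning_effort"])  # -> high
```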

&lt;p&gt;&lt;strong&gt;Provider outages don't break your app.&lt;/strong&gt; Claude went down for three consecutive days in early April. 8,000+ Downdetector reports. If your app depends on one provider, that's three days of broken service. FreeLLM's circuit breakers pull failing providers from rotation and test for recovery automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response caching without a separate layer.&lt;/strong&gt; Identical prompts return in ~23ms with zero quota burn. The cache refuses to store truncated responses (another Gemini bug: reasoning models returning cut-off output that then poisons your cache for an hour).&lt;/p&gt;
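The truncation guard is the interesting part of that cache. A minimal sketch of the rule, with illustrative names (FreeLLM's real implementation is TypeScript and differs in detail):

```python
import hashlib
import time

# Cache by (model, prompt) hash, but refuse to store responses the
# provider truncated -- so a cut-off answer can't be served from cache
# for the next hour.
class ResponseCache:
    def __init__(self, ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self.store = {}

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def put(self, model, prompt, text, finish_reason):
        if finish_reason != "stop":  # "length" means truncated: reject
            return False
        self.store[self._key(model, prompt)] = (time.monotonic(), text)
        return True

    def get(self, model, prompt):
        hit = self.store.get(self._key(model, prompt))
        if hit is None:
            return None
        ts, text = hit
        if time.monotonic() - ts >= self.ttl:
            return None  # expired
        return text

cache = ResponseCache()
cache.put("free-fast", "Hello!", "Hi there!", finish_reason="stop")
cache.put("free-fast", "Bye", "partial ans", finish_reason="length")  # rejected
print(cache.get("free-fast", "Hello!"), cache.get("free-fast", "Bye"))  # -> Hi there! None
```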

&lt;p&gt;&lt;strong&gt;Browser-safe tokens for static sites.&lt;/strong&gt; Mint a short-lived HMAC-signed token from a serverless function, pass it to the browser, call the gateway directly from client-side JavaScript. No auth backend. No session store.&lt;/p&gt;
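The token scheme is the standard stateless HMAC pattern. A sketch of the idea (the token format here is an assumption for illustration, not FreeLLM's actual wire format):

```python
import hashlib
import hmac
import time

# Sign an expiry timestamp with a server-side secret in the serverless
# function; the gateway verifies the signature and expiry statelessly,
# so no session store is needed.
SECRET = b"server-side-secret"  # illustrative; keep the real one in env config

def mint_token(ttl_seconds=300):
    expiry = str(int(time.time()) + ttl_seconds)
    sig = hmac.new(SECRET, expiry.encode(), hashlib.sha256).hexdigest()
    return f"{expiry}.{sig}"

def verify_token(token):
    expiry, _, sig = token.partition(".")
    expected = hmac.new(SECRET, expiry.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and int(expiry) > time.time()

tok = mint_token()
print(verify_token(tok))        # valid within its TTL -> True
print(verify_token(tok + "x"))  # tampered signature -> False
```

`compare_digest` matters here: a plain `==` on the signature would leak timing information.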

&lt;h3&gt;Key stacking: 360 free requests per minute&lt;/h3&gt;

&lt;p&gt;Every provider env var accepts a comma-separated list. FreeLLM rotates through the keys round-robin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GROQ_API_KEY=gsk_key1,gsk_key2,gsk_key3
GEMINI_API_KEY=AI_key1,AI_key2,AI_key3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
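The rotation itself is simple. A sketch of the mechanic (illustrative, not FreeLLM's code): each comma-separated key takes its own turn, so three keys triple the effective per-minute budget for that provider.

```python
import os
from itertools import cycle

os.environ["GROQ_API_KEY"] = "gsk_key1,gsk_key2,gsk_key3"

# Round-robin over the configured keys: each request uses the next key,
# wrapping back to the first after the last.
keys = cycle(os.environ["GROQ_API_KEY"].split(","))
picked = [next(keys) for _ in range(4)]
print(picked)  # -> ['gsk_key1', 'gsk_key2', 'gsk_key3', 'gsk_key1']
```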



&lt;p&gt;Stack 3 keys across 5 cloud providers: ~360 req/min. All free. Enough to prototype an entire product without spending anything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrn8oxy5n9bvgvnpbjze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrn8oxy5n9bvgvnpbjze.png" alt="Key Stacking Math" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Get it running&lt;/h2&gt;

&lt;p&gt;Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gsk_... &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AI... &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/devansh-365/freellm:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or one-click deploy on Railway or Render (buttons in the README).&lt;/p&gt;

&lt;p&gt;Use it from Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unused&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;free-smart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain circuit breakers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TypeScript, Go, Ruby, anything that speaks OpenAI. Same pattern.&lt;/p&gt;

&lt;h2&gt;Why this matters beyond FreeLLM&lt;/h2&gt;

&lt;p&gt;The LiteLLM attack exposed something the community already suspected: critical AI infrastructure is running on libraries nobody audits.&lt;/p&gt;

&lt;p&gt;The fix is not "use my tool instead." The fix is smaller dependencies, pinned versions, codebases you can read in an afternoon. FreeLLM is 262 tests across 22 files. TypeScript, not Python. Docker images with pinned deps. MIT licensed.&lt;/p&gt;

&lt;p&gt;If you don't use FreeLLM, build something similarly scoped. The era of "install this 100-provider mega-library and trust it with your API keys" should be over.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl47ise5n7ao6pfjztu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl47ise5n7ao6pfjztu3.png" alt="Request Flow" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;262 tests. 6 providers. One endpoint. Zero cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Devansh-365/freellm" rel="noopener noreferrer"&gt;github.com/devansh-365/freellm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>openai</category>
    </item>
    <item>
      <title>I built an OpenAI-compatible gateway that routes across 5 free LLM providers</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:22:07 +0000</pubDate>
      <link>https://forem.com/devansh365/i-built-an-openai-compatible-gateway-that-routes-across-5-free-llm-providers-6jo</link>
      <guid>https://forem.com/devansh365/i-built-an-openai-compatible-gateway-that-routes-across-5-free-llm-providers-6jo</guid>
      <description>&lt;p&gt;Every LLM provider has a free tier.&lt;/p&gt;

&lt;p&gt;Groq gives you 30 requests per minute. Gemini gives you 15. Cerebras gives you 30. Mistral gives you 5.&lt;/p&gt;

&lt;p&gt;Combined, that's about 80 requests per minute. Enough for prototyping, internal tools, and side projects where you don't want to pay for API access yet.&lt;/p&gt;

&lt;p&gt;The problem: each provider has its own SDK, its own rate limits, its own auth, and its own downtime. You end up writing provider-switching logic, catching 429 errors, and managing API keys across five different dashboards.&lt;/p&gt;

&lt;p&gt;I got tired of this while building &lt;a href="https://trymetis.app" rel="noopener noreferrer"&gt;Metis&lt;/a&gt;, an AI stock analysis tool. Kept hitting Groq's limits while Gemini had capacity sitting idle. So I built FreeLLM.&lt;/p&gt;

&lt;h2&gt;What FreeLLM does&lt;/h2&gt;

&lt;p&gt;One endpoint. Five providers. Twenty models. All free.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "free-fast", "messages": [{"role": "user", "content": "Hello!"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your existing OpenAI SDK code works. Just change the base URL. That's the whole migration.&lt;/p&gt;

&lt;h2&gt;How the routing works&lt;/h2&gt;

&lt;p&gt;When a request comes in, FreeLLM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks which providers are healthy (circuit breakers track this automatically)&lt;/li&gt;
&lt;li&gt;Picks the best available provider based on your model choice&lt;/li&gt;
&lt;li&gt;If that provider returns a 429 or fails, it tries the next one&lt;/li&gt;
&lt;li&gt;You get a response&lt;/li&gt;
&lt;/ol&gt;
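The four steps above can be sketched as a failover loop (illustrative only -- provider names and error types are stand-ins, not FreeLLM's actual implementation):

```python
class RateLimited(Exception):
    """Stand-in for a provider 429 response."""

def route(providers, prompt):
    errors = []
    for p in providers:
        if not p["healthy"]:          # step 1: skip tripped circuit breakers
            continue
        try:
            return p["call"](prompt)  # step 2: try the best available provider
        except RateLimited as e:      # step 3: on 429/failure, try the next one
            errors.append((p["name"], str(e)))
    raise RuntimeError(f"all providers failed: {errors}")

def groq(_):
    raise RateLimited("429")

providers = [
    {"name": "groq", "healthy": True, "call": groq},            # rate-limited
    {"name": "gemini", "healthy": False, "call": lambda _: ""}, # breaker open
    {"name": "cerebras", "healthy": True, "call": lambda _: "Hello!"},
]
print(route(providers, "Hello!"))  # step 4: you get a response -> Hello!
```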

&lt;p&gt;Three meta-models handle routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;free-fast   → lowest latency (usually Groq or Cerebras)
free-smart  → most capable model (usually Gemini 2.5)
free        → maximum availability across all providers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Providers and their free tiers&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;th&gt;Free Tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B, Llama 4 Scout, Qwen3 32B&lt;/td&gt;
&lt;td&gt;~30 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;2.5 Flash, 2.5 Pro, 2.0 Flash&lt;/td&gt;
&lt;td&gt;~15 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cerebras&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B&lt;/td&gt;
&lt;td&gt;~30 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Small, Medium, Nemo&lt;/td&gt;
&lt;td&gt;~5 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Any local model&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwx65ruk33vv4zqkl0q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwx65ruk33vv4zqkl0q5.png" alt="Provider free-tier comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What's under the hood&lt;/h2&gt;

&lt;p&gt;This isn't a simple round-robin proxy. The routing layer handles real production concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sliding-window rate limiter.&lt;/strong&gt; Each provider's limits are tracked independently. FreeLLM knows how many requests you've sent to Groq in the last 60 seconds and won't send another if you're near the cap.&lt;/p&gt;
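A sliding-window limiter is small enough to show in full. A sketch of the per-provider tracking described above (numbers and names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, max_requests, window_seconds=60.0):
        self.max = max_requests
        self.window = window_seconds
        self.sent = deque()  # timestamps of recent requests to this provider

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the 60-second window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.max:
            return False  # near the cap: router picks another provider
        self.sent.append(now)
        return True

# Groq's ~30 req/min tracked independently of every other provider:
groq = SlidingWindowLimiter(max_requests=30)
results = [groq.allow(now=float(i)) for i in range(31)]
print(results.count(True))  # -> 30; the 31st request is routed elsewhere
```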

&lt;p&gt;&lt;strong&gt;Circuit breakers.&lt;/strong&gt; If Gemini starts returning 500s, FreeLLM pulls it from rotation. Every 30 seconds, it sends a test request. When the provider recovers, it goes back in.&lt;/p&gt;
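The breaker's state machine is roughly this (a sketch: the 30-second probe interval comes from the description above, the failure threshold is an assumed illustration):

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, probe_interval=30.0):
        self.failures = 0
        self.threshold = failure_threshold
        self.probe_interval = probe_interval
        self.opened_at = None  # None means the provider is in rotation

    def record(self, success, now=None):
        now = time.monotonic() if now is None else now
        if success:
            self.failures, self.opened_at = 0, None  # recovered: back in
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now  # pull provider out of rotation

    def should_try(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        # Open: only allow a periodic probe request to test recovery.
        return now - self.opened_at >= self.probe_interval

gemini = CircuitBreaker()
for t in (0, 1, 2):
    gemini.record(success=False, now=t)  # three 500s -> breaker opens
print(gemini.should_try(now=10), gemini.should_try(now=40))  # -> False True
```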

&lt;p&gt;&lt;strong&gt;Per-client rate limiting.&lt;/strong&gt; If you expose this to a team, each client gets their own limit. Admin auth protects the config endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zod validation.&lt;/strong&gt; Every request is validated before it hits any provider. Bad payloads fail fast with clear error messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time dashboard.&lt;/strong&gt; React frontend showing provider health, request logs, and latency. You can see which providers are healthy at a glance.&lt;/p&gt;

&lt;h2&gt;Get it running in 30 seconds&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/devansh-365/freellm.git
&lt;span class="nb"&gt;cd &lt;/span&gt;freellm
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env   &lt;span class="c"&gt;# add your free API keys&lt;/span&gt;
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API on &lt;code&gt;localhost:3000&lt;/code&gt;. Dashboard on &lt;code&gt;localhost:3000/dashboard&lt;/code&gt;. Done.&lt;/p&gt;

&lt;h2&gt;Using it with the OpenAI SDK&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:3000/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;not-needed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;free-fast&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain circuit breakers in 2 sentences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No new SDK to learn. No migration effort.&lt;/p&gt;

&lt;h2&gt;Why I built this&lt;/h2&gt;

&lt;p&gt;I was building Metis and kept running into the same pattern: burn through Groq's free tier in 20 minutes of testing, switch to Gemini manually, hit their limit, switch to Mistral. Repeat.&lt;/p&gt;

&lt;p&gt;Wrote a quick proxy to automate the switching. Added failover because providers go down randomly. Added circuit breakers because I didn't want to wait for timeouts. Added a dashboard because I wanted to see what was happening.&lt;/p&gt;

&lt;p&gt;It grew into a proper tool. Open-sourced it because every developer prototyping with LLMs has this exact problem.&lt;/p&gt;

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;p&gt;TypeScript, Express 5, React 19, Zod, Docker. MIT licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/devansh-365/freellm" rel="noopener noreferrer"&gt;github.com/devansh-365/freellm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>react native animation</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Wed, 10 Sep 2025 16:11:18 +0000</pubDate>
      <link>https://forem.com/devansh365/react-native-animation-46f2</link>
      <guid>https://forem.com/devansh365/react-native-animation-46f2</guid>
      <description></description>
      <category>reactnative</category>
      <category>react</category>
      <category>animation</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
