<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Pranav Chandra</title>
    <description>The latest articles on Forem by Pranav Chandra (@pcpranav).</description>
    <link>https://forem.com/pcpranav</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F583635%2F68bbbbb9-3946-4268-8879-34f132926d91.jpg</url>
      <title>Forem: Pranav Chandra</title>
      <link>https://forem.com/pcpranav</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pcpranav"/>
    <language>en</language>
    <item>
      <title>I tested 4 free 70B-class LLM endpoints for real production work — here's what each is actually good at</title>
      <dc:creator>Pranav Chandra</dc:creator>
      <pubDate>Sat, 02 May 2026 09:35:16 +0000</pubDate>
      <link>https://forem.com/pcpranav/i-tested-4-free-70b-class-llm-endpoints-for-real-production-work-heres-what-each-is-actually-1if9</link>
      <guid>https://forem.com/pcpranav/i-tested-4-free-70b-class-llm-endpoints-for-real-production-work-heres-what-each-is-actually-1if9</guid>
      <description>&lt;h2&gt;The question&lt;/h2&gt;

&lt;p&gt;Most "production-grade" AI tools ship on paid endpoints — OpenAI, Anthropic, Gemini Pro. That's the safe choice. It's also the expensive one.&lt;/p&gt;

&lt;p&gt;I wanted to know: in mid-2026, can free 70B-class open-source endpoints actually carry a real product workload? Not a toy chatbot — a tool that generates working HTML/CSS/JS for arbitrary user prompts.&lt;/p&gt;

&lt;p&gt;So I built one. The result is &lt;strong&gt;Sitecraft&lt;/strong&gt; (&lt;a href="https://wiz-craft.vercel.app/" rel="noopener noreferrer"&gt;wiz-craft.vercel.app&lt;/a&gt;) — a free, open-source AI website builder. It runs across &lt;strong&gt;four different free endpoints&lt;/strong&gt; and lets users switch between them mid-conversation.&lt;/p&gt;

&lt;p&gt;This post is what I learned about each one. Not a benchmark — a working engineer's notes.&lt;/p&gt;

&lt;h2&gt;The 4 endpoints I shipped&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why it's on the list&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cerebras&lt;/td&gt;
&lt;td&gt;Qwen 3 235B&lt;/td&gt;
&lt;td&gt;Reasoning depth — strongest at "think before generating"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;td&gt;Llama 4 Scout&lt;/td&gt;
&lt;td&gt;Throughput king — fastest token rate I've measured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenRouter&lt;/td&gt;
&lt;td&gt;Ling-2.6 Flash&lt;/td&gt;
&lt;td&gt;Generalist; best fallback when the others rate-limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare&lt;/td&gt;
&lt;td&gt;GPT-OSS 120B&lt;/td&gt;
&lt;td&gt;Edge inference; lowest latency from a Workers backend&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All four have free tiers that are actually usable for shipping (not just "free for 100 tokens then $50/M"). The free-tier shapes differ, but all four are genuinely free.&lt;/p&gt;

&lt;h2&gt;What each one is actually good at&lt;/h2&gt;

&lt;h3&gt;Cerebras · Qwen 3 235B — the reasoner&lt;/h3&gt;

&lt;p&gt;When the prompt requires &lt;em&gt;planning&lt;/em&gt; (e.g. "build a 3-page site with consistent design language across all pages, plus a working contact form"), Qwen on Cerebras consistently produces the most coherent output. It thinks about the whole problem before emitting code.&lt;/p&gt;

&lt;p&gt;Trade-off: it's slower per token than Groq, and the free tier rate-limits aggressively when you go beyond a few requests per minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; The prompt is open-ended and the model needs to invent structure.&lt;/p&gt;

&lt;h3&gt;Groq · Llama 4 Scout — the speed demon&lt;/h3&gt;

&lt;p&gt;Groq's LPUs are the fastest inference I've ever benchmarked. Llama 4 Scout sustains roughly 500 tokens/second, which means a full single-page site (3,000-5,000 tokens) lands in about 6-10 seconds — it feels near-instant in the UI.&lt;/p&gt;

&lt;p&gt;Trade-off: Llama 4 Scout is smaller and less "thoughtful" than Qwen 3 235B. For complex prompts it sometimes generates plausible but incorrect code (wrong CSS selectors, hallucinated APIs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; Iteration speed matters more than first-shot correctness.&lt;/p&gt;

&lt;h3&gt;OpenRouter · Ling-2.6 Flash — the generalist fallback&lt;/h3&gt;

&lt;p&gt;OpenRouter's free Ling-2.6 Flash isn't the best at any single thing, but it's &lt;em&gt;consistently okay&lt;/em&gt; across prompt types and almost never rate-limits. That makes it the perfect fallback target when one of the other three is throttled.&lt;/p&gt;

&lt;p&gt;Trade-off: code output quality is noticeably lower than Qwen — more boilerplate, less elegant HTML structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; You need a fallback that won't fail. Paired with quality-checks downstream, it's the safety net.&lt;/p&gt;

&lt;h3&gt;Cloudflare · GPT-OSS 120B — the edge play&lt;/h3&gt;

&lt;p&gt;If your backend already runs on Cloudflare Workers, calling Cloudflare AI from the same edge node is hands-down the lowest-latency setup. No cross-region hop, no cold-start penalty.&lt;/p&gt;

&lt;p&gt;Trade-off: GPT-OSS 120B is older architecture than the others. Output quality sits between Llama 4 Scout and Ling — fine for short outputs, weaker for long-form generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; Your stack is already on Cloudflare and you want to keep the inference call inside the edge.&lt;/p&gt;

&lt;h2&gt;The orchestration strategy&lt;/h2&gt;

&lt;p&gt;You don't pick one. You orchestrate.&lt;/p&gt;

&lt;p&gt;For Sitecraft I shipped a simple router:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Default:&lt;/strong&gt; Groq Llama 4 (fastest perceived response)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If user explicitly toggles "high-quality":&lt;/strong&gt; Cerebras Qwen 3 235B&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If primary is rate-limited (429):&lt;/strong&gt; fall back to OpenRouter Ling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare GPT-OSS:&lt;/strong&gt; offered as a manual switch for users on slow connections (edge-routed)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The user can also switch mid-conversation — useful when, say, Qwen produced a great structural draft but you want Groq to iterate fast on the styling.&lt;/p&gt;

&lt;p&gt;The whole router is ~80 lines of plain JS. No LangChain, no framework. Just an &lt;code&gt;if/else&lt;/code&gt; over which endpoint to hit, plus a small wrapper that normalises response formats (each provider returns slightly different envelope JSON).&lt;/p&gt;
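
&lt;p&gt;The routing logic above can be sketched as one pure function. This is an illustrative sketch, not the actual Sitecraft source; the option names here are made up for the example:&lt;/p&gt;

```javascript
// Illustrative sketch of the router's decision step. Provider names match
// the table above; the option names are hypothetical.
function pickEndpoint({ highQuality = false, primaryStatus = 200, manualEdge = false } = {}) {
  if (manualEdge) return "cloudflare";            // user opted into edge routing
  if (primaryStatus === 429) return "openrouter"; // primary got rate-limited
  if (highQuality) return "cerebras";             // user toggled high-quality
  return "groq";                                  // default: fastest perceived response
}
```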

&lt;h2&gt;The honest trade-offs&lt;/h2&gt;

&lt;p&gt;If you're considering this approach, here's what I wish someone had told me upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Response format quirks.&lt;/strong&gt; Every provider returns slightly different JSON. Cerebras gives &lt;code&gt;choices[0].message.content&lt;/code&gt;, Cloudflare gives &lt;code&gt;result.response&lt;/code&gt;, OpenRouter mostly mirrors OpenAI but with quirks on streaming. &lt;strong&gt;Build a normaliser early&lt;/strong&gt; or you'll regret it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Free tiers are real but bounded.&lt;/strong&gt; "Free" usually means rate-limited per minute and per day. For a side project with low traffic, you'll never notice. For a viral hit, you'll need a paid tier or a queue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality variance is real.&lt;/strong&gt; Even within the same prompt, Qwen and Llama 4 will produce noticeably different code. If your UX expects consistency, normalise via post-processing or pick one provider per request type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No streaming compatibility guarantees.&lt;/strong&gt; SSE format varies subtly. If you stream tokens to a UI, expect to write provider-specific stream parsers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency varies wildly by region.&lt;/strong&gt; Groq is fast from US-East. From APAC, the round-trip dominates. Cloudflare's edge play wins here.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
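
&lt;p&gt;The envelope quirk in the first bullet is cheap to neutralise. A minimal sketch, using the field paths named above (treat the exact JSON shapes as assumptions and verify against each provider's docs):&lt;/p&gt;

```javascript
// Minimal envelope normaliser, sketched from the field paths in the post.
// The exact shapes are assumptions; check each provider's current docs.
function extractText(provider, json) {
  if (provider === "cloudflare") {
    return json.result.response;            // Cloudflare Workers AI envelope
  }
  // Cerebras, Groq, and OpenRouter follow the OpenAI chat-completions shape
  return json.choices[0].message.content;
}
```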

&lt;h2&gt;When this approach is right (and when it isn't)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use free open-source endpoints if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're shipping a side project / proof-of-concept and don't want a credit-card dependency&lt;/li&gt;
&lt;li&gt;Your users are okay with "good enough" code (and you can iterate)&lt;/li&gt;
&lt;li&gt;You want flexibility to swap models without lock-in&lt;/li&gt;
&lt;li&gt;You're learning about LLM orchestration and want hands-on experience with the trade-offs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stick with paid (Claude / GPT-5 / Gemini Pro) if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need single-shot correctness (no iteration loop)&lt;/li&gt;
&lt;li&gt;You're building agentic workflows with deep reasoning (multi-tool, long context)&lt;/li&gt;
&lt;li&gt;You have a real budget and uptime SLA matters&lt;/li&gt;
&lt;li&gt;Your prompts genuinely need 200K+ context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Sitecraft, the 4-endpoint orchestration was the right call — generating a website is iterative anyway, the user is in the loop, and "free forever" is part of the product pitch.&lt;/p&gt;

&lt;p&gt;For my other side project, &lt;a href="https://promptcraft-io.vercel.app/" rel="noopener noreferrer"&gt;App Architect&lt;/a&gt; (a 5-phase design workflow that turns app ideas into TDD prompts), I went the opposite direction and built it on Claude Artifacts — because &lt;em&gt;that&lt;/em&gt; tool needs deep reasoning over a long structured conversation, and the paid frontier model is worth it.&lt;/p&gt;

&lt;p&gt;Right tool for the right job.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://wiz-craft.vercel.app/" rel="noopener noreferrer"&gt;wiz-craft.vercel.app&lt;/a&gt;&lt;/strong&gt; — switch between the 4 endpoints mid-conversation and feel the difference for yourself.&lt;/p&gt;

&lt;p&gt;Source: &lt;strong&gt;&lt;a href="https://github.com/pcpranav/sitecraft" rel="noopener noreferrer"&gt;github.com/pcpranav/sitecraft&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've shipped on free open-source endpoints, I'd love to hear which providers you settled on and why. Drop a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a 5-phase design system that turns "I have an app idea" into a TDD prompt</title>
      <dc:creator>Pranav Chandra</dc:creator>
      <pubDate>Fri, 01 May 2026 21:14:27 +0000</pubDate>
      <link>https://forem.com/pcpranav/i-built-a-5-phase-design-system-that-turns-i-have-an-app-idea-into-a-tdd-prompt-3do</link>
      <guid>https://forem.com/pcpranav/i-built-a-5-phase-design-system-that-turns-i-have-an-app-idea-into-a-tdd-prompt-3do</guid>
      <description>&lt;p&gt;App Architect walks you from a one-line idea to a production-ready prompt — user flow, page map, system design, spec, and tests. Here's why I built it and how it works.&lt;/p&gt;

&lt;h2&gt;The problem with "build me an app" prompts&lt;/h2&gt;

&lt;p&gt;Every developer has done this at least once: you have a vague idea, you open Claude or ChatGPT, you type &lt;em&gt;"build me a habit tracker with social features"&lt;/em&gt;, and forty seconds later you're staring at 600 lines of code that compiles but doesn't actually solve the problem you had in your head.&lt;/p&gt;

&lt;p&gt;The model isn't wrong. Your prompt is.&lt;/p&gt;

&lt;p&gt;There's a missing layer between &lt;strong&gt;"I have an idea"&lt;/strong&gt; and &lt;strong&gt;"here is the code"&lt;/strong&gt; — the layer where a real engineering team would normally spend a week: user flows, page maps, schemas, edge cases, the boring stuff that decides whether the thing actually ships.&lt;/p&gt;

&lt;p&gt;I kept skipping that layer. So I built a tool that refuses to let me skip it.&lt;/p&gt;

&lt;h2&gt;What App Architect does&lt;/h2&gt;

&lt;p&gt;App Architect is a Claude artifact that takes you through five structured phases. You describe your idea in one sentence; it asks the right follow-up questions; at the end you get a single prompt you can paste into Claude Code (or any capable LLM) to actually build the thing.&lt;/p&gt;

&lt;p&gt;The five phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;App Flow&lt;/strong&gt; — user journey, tech stack, integrations, edge cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Page Map&lt;/strong&gt; — routes, wireframes, layout zones, key interactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Design&lt;/strong&gt; — DB schema, API routes, architecture diagram&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executive Summary&lt;/strong&gt; — full spec: features, stack, risks, open questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TDD Prompt&lt;/strong&gt; — a production-ready prompt with unit tests, E2E tests, and CI config baked in&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output of phase 5 is the artifact. Everything before it exists to make that final prompt non-garbage.&lt;/p&gt;

&lt;h2&gt;Why TDD as the final output&lt;/h2&gt;

&lt;p&gt;Because the failure mode of AI-generated code isn't "it doesn't compile." It's "it compiles, looks reasonable, and silently does the wrong thing in three places."&lt;/p&gt;

&lt;p&gt;If the final prompt asks the LLM to write tests &lt;strong&gt;first&lt;/strong&gt;, then implementation, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clear, executable definition of "done"&lt;/li&gt;
&lt;li&gt;A regression net the moment something else changes&lt;/li&gt;
&lt;li&gt;A way to spot hallucinated APIs early (the test fails before you ship)&lt;/li&gt;
&lt;/ul&gt;
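
&lt;p&gt;A toy example makes this concrete. The test is written first and defines "done"; the implementation exists only to satisfy it (the function is hypothetical, not real App Architect output):&lt;/p&gt;

```javascript
// Toy test-first example. Step 1: the test pins down "done" before any
// implementation exists, so a misremembered API fails here, not in prod.
function testSlugify() {
  console.assert(slugify("Habit Tracker!") === "habit-tracker");
  console.assert(slugify("  AI + TDD  ") === "ai-tdd");
}

// Step 2: the implementation is written to make the test pass.
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

testSlugify();
```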

&lt;p&gt;You don't need to be a TDD purist to want that.&lt;/p&gt;

&lt;h2&gt;How to use it&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open the &lt;a href="https://promptcraft-io.vercel.app" rel="noopener noreferrer"&gt;App Architect&lt;/a&gt; landing page&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Launch&lt;/strong&gt; — it opens as a Claude artifact in your own Claude account (free tier works fine)&lt;/li&gt;
&lt;li&gt;Describe your idea in one or two sentences&lt;/li&gt;
&lt;li&gt;Answer the phase prompts honestly — "I don't know yet" is a valid answer and the tool handles it&lt;/li&gt;
&lt;li&gt;Copy the final TDD prompt and paste it into Claude Code, Cursor, or any agentic dev tool&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There's no sign-up on my side and no data collection. Usage runs on your own Claude account, against your own quota; I never see any of it.&lt;/p&gt;

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;p&gt;A few things I'd change if I rebuilt it tomorrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persist drafts.&lt;/strong&gt; Right now if you close the tab mid-flow you start over. Local storage would fix it cheaply.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branching flows.&lt;/strong&gt; Some apps don't have a "page map" (CLIs, MCP servers, libraries). The current flow nudges everything toward web apps. A branch at phase 1 would help.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack templates.&lt;/strong&gt; A "Next.js + Postgres + Drizzle" preset would skip three rounds of follow-up questions for the 60% of users who already know their stack.&lt;/li&gt;
&lt;/ul&gt;
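
&lt;p&gt;The draft-persistence fix really is cheap. A hypothetical sketch (the key name and draft shape are made up), with the storage object injected so it's testable outside a browser:&lt;/p&gt;

```javascript
// Hypothetical draft persistence for a phase-based flow. In the browser
// you'd pass window.localStorage; the parameter is injected so the pair
// can be exercised without one.
const DRAFT_KEY = "app-architect-draft";

function saveDraft(draft, storage) {
  storage.setItem(DRAFT_KEY, JSON.stringify(draft));
}

function loadDraft(storage) {
  const raw = storage.getItem(DRAFT_KEY);
  return raw === null ? null : JSON.parse(raw);
}
```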

&lt;p&gt;I'll get there. For now, the v1 covers the painful part: stopping you from prompting before you've thought.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://promptcraft-io.vercel.app" rel="noopener noreferrer"&gt;promptcraft-io.vercel.app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you build something with it, I'd genuinely love to see what comes out the other side. Drop a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
