<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Pierre</title>
    <description>The latest articles on Forem by Pierre (@perror44).</description>
    <link>https://forem.com/perror44</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3095844%2F82a6b71d-6f5e-4886-8434-4e13c6e95e08.png</url>
      <title>Forem: Pierre</title>
      <link>https://forem.com/perror44</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/perror44"/>
    <language>en</language>
    <item>
      <title>How I built an AI podcast generator that turns any content into audio conversations</title>
      <dc:creator>Pierre</dc:creator>
      <pubDate>Wed, 06 May 2026 08:15:52 +0000</pubDate>
      <link>https://forem.com/perror44/how-i-built-an-ai-podcast-generator-that-turns-any-content-into-audio-conversations-276p</link>
      <guid>https://forem.com/perror44/how-i-built-an-ai-podcast-generator-that-turns-any-content-into-audio-conversations-276p</guid>
      <description>&lt;p&gt;I read too much. PDFs, newsletters, long articles - my reading list is a graveyard of good intentions. At some point I stopped fighting it and just built a tool to listen to it all instead.&lt;/p&gt;

&lt;p&gt;That's Podcastify: paste a URL, upload a PDF, drop some text, and get back a podcast-style audio conversation between two AI hosts discussing your content. Six weeks after launching, we have 3 paying subscribers. Not a hockey stick, but real people handing over real money for a thing I shipped. Here's how it works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;The core loop is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You submit any content - a URL, a PDF, raw text, or an image&lt;/li&gt;
&lt;li&gt;Gemini reads it and writes a Q&amp;amp;A-style conversation between two hosts&lt;/li&gt;
&lt;li&gt;A TTS provider converts that transcript to audio, per speaker&lt;/li&gt;
&lt;li&gt;The segments are merged into a single MP3, stored, and served back to you&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The output feels like a podcast episode where two people actually discuss the content, not just read it aloud.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture: why two phases
&lt;/h2&gt;

&lt;p&gt;The pipeline is split into two distinct phases, and this wasn't just a design preference - it's a practical necessity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 - Transcript generation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input → ContentParser → Gemini (LLM) → Transcript → Supabase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 2 - Audio generation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Transcript → TTS (per speaker) → Audio segments → Merge → MP3 → Supabase Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Separating them means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can regenerate audio without re-running the LLM (cheaper)&lt;/li&gt;
&lt;li&gt;You can inspect and even edit the transcript before rendering audio&lt;/li&gt;
&lt;li&gt;Failures are isolated, so a TTS hiccup doesn't waste a Gemini call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both phases run as &lt;strong&gt;Celery tasks&lt;/strong&gt; behind a &lt;strong&gt;FastAPI&lt;/strong&gt; backend, with &lt;strong&gt;Redis&lt;/strong&gt; as the broker. Long-running jobs simply don't belong in an HTTP request/response cycle. A typical generation takes 30–90 seconds depending on content length and TTS provider.&lt;/p&gt;
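&lt;p&gt;Here is a minimal sketch of that split, not the production code. It uses plain functions where the real system uses Celery tasks and an in-memory dict where the real system uses Supabase. The names &lt;code&gt;Turn&lt;/code&gt;, &lt;code&gt;generate_transcript&lt;/code&gt;, &lt;code&gt;generate_audio&lt;/code&gt; and the fake LLM/TTS backends are assumptions made up for illustration.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, asdict

# In-memory stand-in for the transcript store (Supabase in the article).
TRANSCRIPTS = {}


@dataclass
class Turn:
    speaker: str  # hypothetical labels such as "host_a" / "host_b"
    text: str


def generate_transcript(job_id, content, llm):
    """Phase 1: call the LLM once and persist the transcript as JSON."""
    turns = llm(content)
    TRANSCRIPTS[job_id] = json.dumps([asdict(t) for t in turns])
    return job_id


def generate_audio(job_id, tts):
    """Phase 2: load the stored transcript and render one segment per turn."""
    turns = [Turn(**t) for t in json.loads(TRANSCRIPTS[job_id])]
    segments = [tts(t.text, t.speaker) for t in turns]
    # Naive merge: plain concatenation in transcript order.
    # Real MP3 merging may need an audio library; this is only a placeholder.
    return b"".join(segments)


# Fake backends so the sketch runs without network access.
llm_calls = []


def fake_llm(content):
    llm_calls.append(content)
    return [Turn("host_a", "What is this about?"), Turn("host_b", content[:40])]


def fake_tts(text, voice):
    return f"[{voice}:{text}]".encode("utf-8")


generate_transcript("job-1", "Celery keeps slow work out of HTTP requests.", fake_llm)
first = generate_audio("job-1", fake_tts)
second = generate_audio("job-1", fake_tts)  # re-render without a second LLM call
```

&lt;p&gt;Rendering the audio twice only touches the stored transcript, which is the property that makes regeneration cheap and TTS failures retryable on their own.&lt;/p&gt;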




&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;Next.js 16 (App Router) + React 19 + TypeScript + Tailwind CSS &lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;
&lt;span class="na"&gt;Backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;FastAPI + Celery + Redis&lt;/span&gt;
&lt;span class="na"&gt;Database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;Supabase (PostgreSQL + Auth + file storage)&lt;/span&gt;
&lt;span class="na"&gt;LLM&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="s"&gt;Google Gemini&lt;/span&gt;
&lt;span class="na"&gt;TTS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="s"&gt;Factory pattern → ElevenLabs / OpenAI / Gemini TTS / Edge TTS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend calls &lt;strong&gt;Next.js API routes&lt;/strong&gt;, which proxy to the FastAPI backend. This keeps secrets server-side and gives us a clean separation between the Next.js layer (auth, UX, billing) and the Python layer (AI, heavy lifting).&lt;/p&gt;

&lt;p&gt;For storage, a single Supabase bucket (&lt;code&gt;audios_n_transcripts&lt;/code&gt;) holds both transcripts (JSON) and final audio (MP3). Row-level security keeps everything scoped to the generating user.&lt;/p&gt;




&lt;h2&gt;
  
  
  The hardest part: parsing anything
&lt;/h2&gt;

&lt;p&gt;The promise, "submit any content", is easy to say and painful to implement.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ContentParser&lt;/code&gt; service has to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Web pages&lt;/strong&gt;: rendered via Playwright (headless Chromium), because half the modern web is JavaScript-rendered and can't be scraped with a simple HTTP fetch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDFs&lt;/strong&gt;: text extraction, with layout awareness to avoid garbled column ordering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Images&lt;/strong&gt;: sent directly to Gemini's multimodal endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raw text&lt;/strong&gt;: trivial, but still needs cleaning and length normalization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Playwright in particular adds real overhead: it's a full browser. We run it in the Celery worker rather than the API process, and cache aggressively to avoid re-fetching the same URL.&lt;/p&gt;
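&lt;p&gt;As a rough sketch of the dispatch-and-cache idea (not the actual &lt;code&gt;ContentParser&lt;/code&gt;), the snippet below classifies an input, cleans raw text, and caches URL fetches. The Playwright render is replaced by a placeholder function, and &lt;code&gt;MAX_CHARS&lt;/code&gt;, &lt;code&gt;detect_kind&lt;/code&gt;, &lt;code&gt;fetch_url&lt;/code&gt; and &lt;code&gt;parse&lt;/code&gt; are hypothetical names chosen for the example.&lt;/p&gt;

```python
from functools import lru_cache

MAX_CHARS = 20_000  # hypothetical cap on text passed to the LLM


def detect_kind(source):
    """Classify an input as 'url', 'pdf', 'image', or 'text'."""
    if source.startswith(("http://", "https://")):
        return "url"
    lowered = source.lower()
    if lowered.endswith(".pdf"):
        return "pdf"
    if lowered.endswith((".png", ".jpg", ".jpeg", ".webp")):
        return "image"
    return "text"


def clean_text(raw):
    """Collapse runs of whitespace and truncate to MAX_CHARS."""
    return " ".join(raw.split())[:MAX_CHARS]


fetch_count = 0


@lru_cache(maxsize=256)
def fetch_url(url):
    """Placeholder for the headless-browser render; results are cached per URL."""
    global fetch_count
    fetch_count += 1
    return f"rendered text of {url}"


def parse(source):
    """Route an input to the right handler and return cleaned text."""
    kind = detect_kind(source)
    if kind == "url":
        return clean_text(fetch_url(source))
    if kind == "text":
        return clean_text(source)
    # PDF and image handling are left out of this sketch on purpose.
    raise NotImplementedError(f"{kind} parsing is out of scope for this sketch")


parse("https://example.com/post")
parse("https://example.com/post")  # second call is served from the cache
```

&lt;p&gt;An in-process &lt;code&gt;lru_cache&lt;/code&gt; only helps within one worker; a shared cache (Redis, for instance) would be the natural next step when several workers fetch the same URLs.&lt;/p&gt;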




&lt;h2&gt;
  
  
  TTS: the factory pattern
&lt;/h2&gt;

&lt;p&gt;Different TTS providers have very different tradeoffs - latency, voice quality, cost, language support. Rather than hardcoding one, we use a factory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tts/factory.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tts_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;BaseTTSProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;providers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GeminiTTSProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OpenAITTSProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;elevenlabs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ElevenLabsTTSProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;edge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EdgeTTSProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each provider implements the same interface: &lt;code&gt;synthesize(text, voice, language) -&amp;gt; bytes&lt;/code&gt;. Swapping providers is a config change, not a code change. This matters because TTS pricing and quality move fast, and we've already switched defaults once.&lt;/p&gt;
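&lt;p&gt;To make the shared interface concrete, here is one way it could look as an abstract base class, with a factory that fails with a readable error on an unknown name. This is a sketch under assumptions: &lt;code&gt;EchoTTSProvider&lt;/code&gt; is a made-up stand-in that returns tagged bytes rather than audio, and the real providers are not shown.&lt;/p&gt;

```python
from abc import ABC, abstractmethod


class BaseTTSProvider(ABC):
    """Common contract: text in, audio bytes out."""

    @abstractmethod
    def synthesize(self, text: str, voice: str, language: str) -> bytes:
        ...


class EchoTTSProvider(BaseTTSProvider):
    """Hypothetical stand-in that returns tagged bytes instead of real audio."""

    def synthesize(self, text: str, voice: str, language: str) -> bytes:
        return f"{language}/{voice}: {text}".encode("utf-8")


# Registry of provider classes; real entries would map names to API-backed classes.
PROVIDERS = {"echo": EchoTTSProvider}


def get_tts_provider(name: str) -> BaseTTSProvider:
    """Instantiate a provider by name, with a clear error for unknown names."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"unknown TTS provider {name!r}; expected one of: {known}") from None


audio = get_tts_provider("echo").synthesize("Hello", voice="host_a", language="en")
```

&lt;p&gt;Raising &lt;code&gt;ValueError&lt;/code&gt; instead of a bare &lt;code&gt;KeyError&lt;/code&gt; means a typo in the config shows up as a message listing the valid provider names.&lt;/p&gt;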




&lt;h2&gt;
  
  
  Monetization: card-first reverse trial
&lt;/h2&gt;

&lt;p&gt;The billing model went through a few iterations. Here's where we landed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No free tier&lt;/strong&gt; - new signups must enter a credit card to unlock generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7-day Hobby trial&lt;/strong&gt; managed by Stripe, with &lt;code&gt;trial_period_days: 7&lt;/code&gt; and &lt;code&gt;payment_method_collection: "always"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;After the trial: auto-charge unless cancelled&lt;/li&gt;
&lt;li&gt;Quota is enforced in &lt;strong&gt;audio characters&lt;/strong&gt; (TTS character count), not generation count, which is fairer for users with varying content lengths&lt;/li&gt;
&lt;/ul&gt;
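&lt;p&gt;The character-based quota can be sketched roughly as follows. The &lt;code&gt;Subscription&lt;/code&gt; fields, the &lt;code&gt;QuotaExceeded&lt;/code&gt; exception and the limit values are assumptions for illustration; the real check lives behind Stripe and the database.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    active: bool
    char_limit: int      # characters allowed per billing period (hypothetical field)
    chars_used: int = 0  # characters already rendered this period


class QuotaExceeded(Exception):
    """Raised when a render would go past the character limit."""


def charge_characters(sub, transcript_turns):
    """Reserve quota for a render and return the number of characters charged."""
    if sub is None or not sub.active:
        raise PermissionError("no active subscription")
    needed = sum(len(text) for text in transcript_turns)
    if sub.chars_used + needed > sub.char_limit:
        remaining = sub.char_limit - sub.chars_used
        raise QuotaExceeded(f"need {needed} characters, {remaining} remaining")
    sub.chars_used += needed
    return needed


sub = Subscription(active=True, char_limit=100)
charged = charge_characters(sub, ["Hello there", "Hi!"])
```

&lt;p&gt;Because the charge is computed from the transcript text, a short article costs less quota than a long PDF, regardless of how many generations the user runs.&lt;/p&gt;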

&lt;p&gt;The middleware (&lt;code&gt;proxy.ts&lt;/code&gt;) enforces the paywall: any generation attempt without an active subscription returns a 401 with a redirect to the checkout page. No subscription row in the DB = no generation, full stop.&lt;/p&gt;

&lt;p&gt;This "reverse trial" approach (card first, trial after) filters out people who were never going to pay, and converts the ones who get value quickly. Three paying users in six weeks from a technical product with zero marketing spend. Not viral, but validated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ship the infra first.&lt;/strong&gt; The async job pipeline (FastAPI + Celery + Redis) was the most painful part to set up, but getting it right early meant every feature after was just another task type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two-phase pipelines are worth it.&lt;/strong&gt; The ability to inspect and replay individual phases saved hours of debugging and reduced AI API costs significantly during development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TTS quality is a product differentiator.&lt;/strong&gt; Users notice voice quality immediately. The factory abstraction let us tune this without touching business logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quota in output units, not input actions.&lt;/strong&gt; Charging per generation sounds simple but punishes users who feed short content. Characters generated is a much better proxy for actual resource consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Card-first converts.&lt;/strong&gt; Adding the reverse trial (vs. a freemium model) was uncomfortable to ship but immediately filtered signal from noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Multi-language support (the TTS layer is ready; the LLM prompts need work)&lt;/li&gt;
&lt;li&gt;Transcript editing UI before audio render&lt;/li&gt;
&lt;li&gt;Podcast RSS feeds so you can subscribe to your own generated shows in any app&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you're building something similar or want to try it out: &lt;a href="https://podcastify.io" rel="noopener noreferrer"&gt;podcastify.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stack questions, architecture feedback, code roasts: I'm happy to discuss all of it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>podcast</category>
      <category>saas</category>
    </item>
    <item>
      <title>I Built SMSimple-API: A Pay-As-You-Go SMS API for 2FA, Marketing, and Reminders</title>
      <dc:creator>Pierre</dc:creator>
      <pubDate>Sun, 27 Apr 2025 09:12:29 +0000</pubDate>
      <link>https://forem.com/perror44/i-built-smsimple-api-a-pay-as-you-go-sms-api-for-2fa-marketing-and-reminders-303h</link>
      <guid>https://forem.com/perror44/i-built-smsimple-api-a-pay-as-you-go-sms-api-for-2fa-marketing-and-reminders-303h</guid>
      <description>&lt;p&gt;Hey everyone,&lt;/p&gt;

&lt;p&gt;I'm a solo developer and recently launched &lt;a href="https://smsimple-api.vercel.app/" rel="noopener noreferrer"&gt;SMSimple-API&lt;/a&gt;, a lightweight SMS API designed to make sending text messages as straightforward as possible.&lt;/p&gt;

&lt;p&gt;🚀 What Is SMSimple-API?&lt;br&gt;
SMSimple is an SMS API built with simplicity and reliability at its core. Whether you're implementing two-factor authentication (2FA), sending marketing messages, or setting up appointment reminders, SMSimple aims to streamline the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  ✅ Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pay-As-You-Go Pricing: No subscriptions or hidden fees. You only pay for what you use.&lt;/li&gt;
&lt;li&gt;Developer-Friendly: Clean, intuitive API with straightforward documentation.&lt;/li&gt;
&lt;li&gt;Free Trial: Sign up and receive 4 free credits to test the service.&lt;/li&gt;
&lt;li&gt;Dashboard Monitoring: Keep track of your usage and manage your account with an easy-to-use dashboard.&lt;/li&gt;
&lt;li&gt;Responsive Support: As a solopreneur, I'm directly available to address your suggestions and support requests promptly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💡 Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Two-Factor Authentication (2FA): Enhance security by sending one-time codes via SMS.&lt;/li&gt;
&lt;li&gt;Marketing Campaigns: Reach out to customers with promotional messages.&lt;/li&gt;
&lt;li&gt;Appointment Reminders: Send timely reminders to reduce no-shows.&lt;/li&gt;
&lt;li&gt;Transactional Notifications: Keep users informed about their account activities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🛠️ How It Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sign Up: Create an account at SMSimple-API and get 4 free credits.&lt;/li&gt;
&lt;li&gt;Integrate: Use the provided API key to send SMS messages from your application.&lt;/li&gt;
&lt;li&gt;Monitor: Use the dashboard to track your usage and manage your credits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm continuously working to improve &lt;a href="https://smsimple-api.vercel.app/" rel="noopener noreferrer"&gt;SMSimple&lt;/a&gt; and would love to hear your feedback. If you have any suggestions or need assistance, feel free to reach out.&lt;/p&gt;

&lt;p&gt;Looking forward to your thoughts!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>api</category>
      <category>sms</category>
    </item>
  </channel>
</rss>
