<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sameer Shah</title>
    <description>The latest articles on Forem by Sameer Shah (@sameershahh).</description>
    <link>https://forem.com/sameershahh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3406145%2Fd4575899-caee-4fea-a905-e67d6ed671b6.png</url>
      <title>Forem: Sameer Shah</title>
      <link>https://forem.com/sameershahh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sameershahh"/>
    <language>en</language>
    <item>
      <title>Building an Enterprise-Grade AI Voice Agent with Twilio, Deepgram, and Groq Llama-3.3 (Real-Time Telephony Automation)</title>
      <dc:creator>Sameer Shah</dc:creator>
      <pubDate>Mon, 13 Apr 2026 17:38:30 +0000</pubDate>
      <link>https://forem.com/sameershahh/building-an-enterprise-grade-ai-voice-agent-with-twilio-deepgram-and-groq-llama-33-real-time-3ckd</link>
      <guid>https://forem.com/sameershahh/building-an-enterprise-grade-ai-voice-agent-with-twilio-deepgram-and-groq-llama-33-real-time-3ckd</guid>
      <description>&lt;p&gt;Building real-time AI voice agents over actual phone calls is one of the hardest engineering problems you can take on. The latency requirements are brutal (humans notice any delay over ~500ms), the audio pipeline is full of edge cases, and coordinating three different external services — telephony, speech recognition, and LLM — in real time requires careful architectural thinking.&lt;/p&gt;

&lt;p&gt;I built a production-ready, low-latency AI telephony agent from scratch. Here's the full technical breakdown — architecture, implementation details, and the lessons learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This System Does
&lt;/h2&gt;

&lt;p&gt;When someone calls your Twilio phone number, this system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Captures the incoming audio stream in real time via Twilio Media Streams (WebSockets)&lt;/li&gt;
&lt;li&gt;Streams the audio to Deepgram Nova-2 for sub-second speech-to-text transcription&lt;/li&gt;
&lt;li&gt;Sends the transcript to Groq's Llama-3.3-70b for contextually aware response generation&lt;/li&gt;
&lt;li&gt;Converts the LLM response to natural-sounding speech using Deepgram Aura TTS&lt;/li&gt;
&lt;li&gt;Streams the audio back to the caller in 20ms frames&lt;/li&gt;
&lt;li&gt;Monitors LLM output for emergency trigger phrases — and redirects the call instantly if detected&lt;/li&gt;
&lt;/ol&gt;
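&lt;p&gt;Step 1 reduces to a small amount of decoding logic: Twilio delivers each WebSocket message as JSON with a base64-encoded mu-law payload. Here is a minimal sketch of the receive path (the endpoint route and Deepgram wiring are placeholders, not the repo's actual names):&lt;/p&gt;

```python
import base64
import json

def decode_media_frame(message: str):
    """Decode one Twilio Media Streams WebSocket message into raw mu-law
    bytes. Returns None for control events like 'start', 'mark', 'stop'."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None
    return base64.b64decode(data["media"]["payload"])

# Hypothetical FastAPI receive loop (route name assumed for illustration):
# @app.websocket("/media-stream")
# async def media_stream(ws: WebSocket):
#     await ws.accept()
#     while True:
#         frame = decode_media_frame(await ws.receive_text())
#         if frame is not None:
#             await deepgram_socket.send(frame)  # forward straight to STT
```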

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ze9wgkypzsggs2wiz8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ze9wgkypzsggs2wiz8v.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Telephony&lt;/td&gt;
&lt;td&gt;Twilio Media Streams&lt;/td&gt;
&lt;td&gt;Battle-tested, global infrastructure with WSS support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;STT&lt;/td&gt;
&lt;td&gt;Deepgram Nova-2&lt;/td&gt;
&lt;td&gt;Best-in-class accuracy + sub-second latency on 8kHz audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Groq Llama-3.3-70b&lt;/td&gt;
&lt;td&gt;Fastest inference available — critical for real-time voice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTS&lt;/td&gt;
&lt;td&gt;Deepgram Aura&lt;/td&gt;
&lt;td&gt;Low-latency, natural-sounding speech synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;FastAPI + WebSockets&lt;/td&gt;
&lt;td&gt;Async-first, handles concurrent connections cleanly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why Groq? The Latency Problem
&lt;/h2&gt;

&lt;p&gt;This is the most important architectural decision in the whole system. In a voice conversation, you have maybe 300–400ms of budget for the entire round-trip from when speech ends to when the response starts playing. Breaking that down:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5mvrnhyk9foqv0wtcgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5mvrnhyk9foqv0wtcgu.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's already 310ms with zero slack. Standard LLM APIs would blow this budget entirely. Groq's purpose-built LPU (Language Processing Unit) hardware is what makes real-time voice agents feasible — it's genuinely 10–20x faster than GPU-based inference at token generation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; For voice AI, LLM inference speed matters more than raw model size. A model served with faster inference (Llama-3.3-70b on Groq) will consistently outperform a slower, larger model for real-time telephony.&lt;/p&gt;
&lt;/blockquote&gt;
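&lt;p&gt;Inference speed pays off most when paired with streaming: rather than waiting for the full completion, you flush each finished sentence to TTS as soon as it closes. A sketch of that pattern against Groq's OpenAI-compatible streaming API (the model id and callback shape are illustrative assumptions, not the repo's code):&lt;/p&gt;

```python
def stream_reply(client, history, on_sentence):
    """Stream tokens from Groq and flush each completed sentence to TTS
    immediately, instead of waiting for the full completion."""
    stream = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # assumed model id
        messages=history,
        stream=True,
    )
    buffer = ""
    for chunk in stream:
        buffer += chunk.choices[0].delta.content or ""
        # Flush on sentence boundaries so TTS can begin within one sentence.
        while any(p in buffer for p in ".!?"):
            cut = min(i for i in (buffer.find(p) for p in ".!?") if i >= 0)
            on_sentence(buffer[: cut + 1].strip())
            buffer = buffer[cut + 1 :]
    if buffer.strip():
        on_sentence(buffer.strip())
```

&lt;p&gt;This turns perceived latency into time-to-first-sentence rather than time-to-full-response, which is what the caller actually experiences.&lt;/p&gt;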

&lt;h2&gt;
  
  
  The Audio Pipeline: Technical Specifications
&lt;/h2&gt;

&lt;p&gt;Twilio's Media Streams deliver audio in a very specific format that the entire pipeline is built around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encoding&lt;/strong&gt;: 8-bit PCMU (G.711 mu-law) — the standard for telephony&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sample rate&lt;/strong&gt;: 8000 Hz — lower than modern audio, but universal across phone networks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channel&lt;/strong&gt;: Mono&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame size&lt;/strong&gt;: 160 bytes = 20ms of audio per WebSocket message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deepgram Nova-2 handles 8kHz mu-law natively — resampling on the fly would add latency. The TTS output from Deepgram Aura is similarly fragmented into 20ms frames for smooth playback through the telephony channel.&lt;/p&gt;
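&lt;p&gt;Frame handling is simple but unforgiving: the outbound channel expects exactly 160 bytes of 8kHz mu-law per 20ms message, so the TTS byte stream has to be re-chunked before it goes back out. A minimal sketch (0xFF encodes mu-law digital silence, used here to pad the final partial frame):&lt;/p&gt;

```python
FRAME_BYTES = 160  # 160 bytes of 8 kHz mu-law = exactly 20 ms of audio

def frame_audio(mulaw: bytes, frame_bytes: int = FRAME_BYTES):
    """Re-chunk a TTS byte stream into fixed 20 ms telephony frames,
    padding the final frame with mu-law silence (0xFF) to keep timing exact."""
    for i in range(0, len(mulaw), frame_bytes):
        frame = mulaw[i : i + frame_bytes]
        if len(frame) != frame_bytes:
            frame = frame + b"\xff" * (frame_bytes - len(frame))
        yield frame
```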

&lt;h2&gt;
  
  
  Emergency Triage Logic
&lt;/h2&gt;

&lt;p&gt;One of the most critical features for production deployment is the emergency fallback system. The LLM output monitor runs concurrently with response generation and watches for a configurable set of trigger phrases.&lt;/p&gt;

&lt;p&gt;When a trigger is detected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Current audio playback is interrupted&lt;/li&gt;
&lt;li&gt;The system calls the Twilio REST API immediately&lt;/li&gt;
&lt;li&gt;The call is redirected to the configured &lt;code&gt;EMERGENCY_FALLBACK_NUMBER&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The event is logged for auditing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is essential for any real-world deployment — medical triage, mental health lines, technical support escalation — where certain situations require immediate human intervention rather than continued AI interaction.&lt;/p&gt;
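&lt;p&gt;The triage flow reduces to a phrase scan plus one Twilio REST call. In twilio-python, redirecting a live call means updating it to fetch new TwiML. A hedged sketch (the trigger list and handoff URL are illustrative, not the repo's actual configuration):&lt;/p&gt;

```python
EMERGENCY_TRIGGERS = ["chest pain", "can't breathe", "unconscious"]  # example set

def detect_trigger(llm_output: str, triggers=EMERGENCY_TRIGGERS):
    """Return the first trigger phrase found in the LLM output, else None."""
    lowered = llm_output.lower()
    return next((t for t in triggers if t in lowered), None)

def redirect_call(client, call_sid: str, handoff_url: str):
    """twilio-python: point the live call at new TwiML that dials the
    configured EMERGENCY_FALLBACK_NUMBER (served at handoff_url)."""
    client.calls(call_sid).update(url=handoff_url, method="POST")
```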

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI-voice-agent/
├── app/
│   └── core/
│       └── config.py    # SYSTEM_PROMPT lives here — customize behavior
├── .env.example          # All required environment variables documented
├── requirements.txt
└── run.py                # Single entry point — manages full lifecycle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The unified &lt;code&gt;run.py&lt;/code&gt; entry point is a deliberate design decision: it manages ngrok tunnel setup, Twilio webhook synchronization, and FastAPI startup in the correct order — deployment is a single command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Customizing the Agent's Behavior
&lt;/h2&gt;

&lt;p&gt;The agent's entire personality and domain expertise are controlled by a single system prompt in &lt;code&gt;app/core/config.py&lt;/code&gt;. This makes it trivially easy to redeploy the same infrastructure for completely different use cases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Medical triage agent
&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a medical intake assistant...
Emergency triggers: [&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chest pain&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t breathe&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unconscious&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Technical support agent
&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a tier-1 technical support agent...
Escalation triggers: [&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;billing issue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;security breach&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Appointment scheduling agent
&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a scheduling assistant...&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Sameershahh/AI-voice-agent
&lt;span class="nb"&gt;cd &lt;/span&gt;AI-voice-agent
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate  &lt;span class="c"&gt;# Windows: .venv\Scripts\activate&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Configure credentials&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Fill in GROQ_API_KEY, DEEPGRAM_API_KEY, TWILIO_ACCOUNT_SID,&lt;/span&gt;
&lt;span class="c"&gt;# TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER, EMERGENCY_FALLBACK_NUMBER, PUBLIC_URL&lt;/span&gt;

&lt;span class="c"&gt;# Start everything&lt;/span&gt;
python run.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Session Logging
&lt;/h2&gt;

&lt;p&gt;All session interactions and transcripts are automatically persisted to the &lt;code&gt;logs/&lt;/code&gt; directory. This is non-optional for production — you need a complete audit trail for compliance, debugging, and performance analysis. Logs include full call transcripts, LLM responses, latency measurements, and any emergency triage events.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Medical triage&lt;/strong&gt; — AI handles initial intake, escalates critical cases to on-call staff&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical support&lt;/strong&gt; — tier-1 resolution with intelligent escalation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Appointment scheduling&lt;/strong&gt; — natural conversation flow for booking and rescheduling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lead qualification&lt;/strong&gt; — automated inbound sales calls with CRM integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emergency hotlines&lt;/strong&gt; — AI-assisted triage with guaranteed human escalation path&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Sameershahh/AI-voice-agent" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.twilio.com/docs/voice/media-streams" rel="noopener noreferrer"&gt;Twilio Media Streams Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.deepgram.com/docs/nova-2" rel="noopener noreferrer"&gt;Deepgram Nova-2 Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://console.groq.com/docs/llama" rel="noopener noreferrer"&gt;Groq Llama-3.3 Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building voice AI infrastructure or have questions about latency optimization, WebSocket audio pipelines, or the emergency triage implementation — drop a comment. There aren't many publicly documented implementations of this full stack yet, and I'm happy to go deeper on any part of it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Sameer Shah — AI &amp;amp; Full-Stack Developer | &lt;a href="https://sameershah-portfolio.vercel.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building an AI-Powered Event Feedback System: Gemini 2.5 + FastAPI + Supabase + Automated PDF Reports</title>
      <dc:creator>Sameer Shah</dc:creator>
      <pubDate>Mon, 13 Apr 2026 17:16:16 +0000</pubDate>
      <link>https://forem.com/sameershahh/building-an-ai-powered-event-feedback-system-gemini-25-fastapi-supabase-automated-pdf-eii</link>
      <guid>https://forem.com/sameershahh/building-an-ai-powered-event-feedback-system-gemini-25-fastapi-supabase-automated-pdf-eii</guid>
      <description>&lt;p&gt;After running an event, the feedback collection process is almost always broken. You get a spreadsheet of raw responses, manually tally the ratings, write a summary email to the organizer, and hope you didn't miss anything important. It's tedious, slow, and doesn't scale.&lt;/p&gt;

&lt;p&gt;I built a full-stack AI-powered event feedback system that automates this entire pipeline end-to-end — from the moment an attendee submits feedback to a branded PDF report landing in the organizer's inbox, complete with AI-generated sentiment analysis and improvement suggestions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the System Does
&lt;/h2&gt;

&lt;p&gt;Every feedback submission triggers a fully automated chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Form data is cleaned and validated by the FastAPI backend&lt;/li&gt;
&lt;li&gt;Gemini 2.5 analyzes the text for sentiment, urgency, highlights, and suggestions&lt;/li&gt;
&lt;li&gt;A branded HTML report is generated and converted to PDF&lt;/li&gt;
&lt;li&gt;The PDF is emailed to the event organizer via Gmail SMTP&lt;/li&gt;
&lt;li&gt;All data (including AI output) is persisted to a Supabase PostgreSQL database&lt;/li&gt;
&lt;li&gt;The analytics dashboard reflects the new submission in real time&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Form Submission
      │
      ▼
Data Cleaning &amp;amp; Normalization
      │
      ▼
Gemini 2.5 AI Analysis
      │
      ▼
HTML Report Generation (Branded)
      │
      ▼
PDF Conversion
      │
      ├──▶ SMTP Email to Organizer
      │
      └──▶ Supabase Database Storage
                │
                ▼
      Dashboard &amp;amp; Analytics
                │
                ▼
      Date-Range Bulk Report (PDF + Email)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js + Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI (Python)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Analysis&lt;/td&gt;
&lt;td&gt;Google Gemini 2.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Email&lt;/td&gt;
&lt;td&gt;Gmail SMTP (TLS, Port 587)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF Generation&lt;/td&gt;
&lt;td&gt;HTML-to-PDF conversion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Supabase (PostgreSQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The AI Analysis Layer: Gemini 2.5
&lt;/h2&gt;

&lt;p&gt;The most interesting part of this system is the AI pipeline. Rather than using a generic sentiment library, I integrated Google Gemini 2.5 directly to generate structured, actionable intelligence from each feedback submission.&lt;/p&gt;

&lt;p&gt;The prompt is designed to return a consistent JSON structure every time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Positive | Neutral | Negative"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2-3 sentence summary of the feedback"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"highlights"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"key point 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"key point 2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"improvementSuggestions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"suggestion 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"suggestion 2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"urgency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Low | Medium | High"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;urgency&lt;/code&gt; field is particularly useful — it lets organizers immediately identify submissions that need follow-up action, without reading every comment manually.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; When prompting Gemini for structured JSON output, always specify the expected schema explicitly in your prompt and validate the response server-side. LLMs can occasionally deviate from the schema in edge cases.&lt;/p&gt;
&lt;/blockquote&gt;
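&lt;p&gt;That server-side check can be a thin gate in front of the rest of the pipeline. Shown here with the standard library for brevity (the project's FastAPI layer uses Pydantic; the allowed values come from the schema above):&lt;/p&gt;

```python
import json

ALLOWED_SENTIMENT = {"Positive", "Neutral", "Negative"}
ALLOWED_URGENCY = {"Low", "Medium", "High"}

def parse_analysis(raw: str):
    """Parse Gemini's response and enforce the schema contract.
    Returns None on any deviation so the caller can retry or fall back."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if data.get("sentiment") not in ALLOWED_SENTIMENT:
        return None
    if data.get("urgency") not in ALLOWED_URGENCY:
        return None
    if not isinstance(data.get("highlights"), list):
        return None
    return data
```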

&lt;h2&gt;
  
  
  Automated PDF Report Generation
&lt;/h2&gt;

&lt;p&gt;After AI analysis, the backend generates a branded HTML report and converts it to PDF. The report includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event metadata (name, date, organizer)&lt;/li&gt;
&lt;li&gt;Attendee information&lt;/li&gt;
&lt;li&gt;Star rating visualization&lt;/li&gt;
&lt;li&gt;AI-generated summary and highlights&lt;/li&gt;
&lt;li&gt;Improvement suggestions with urgency indicator&lt;/li&gt;
&lt;li&gt;Submission ID and timestamp for traceability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using HTML-to-PDF conversion rather than a PDF library gives you full control over visual design — CSS, fonts, colors, layout — without being constrained by a library's API.&lt;/p&gt;

&lt;h2&gt;
  
  
  SMTP Email Delivery
&lt;/h2&gt;

&lt;p&gt;The PDF report is automatically emailed to the event organizer via Gmail SMTP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Email config
&lt;/span&gt;&lt;span class="n"&gt;SMTP_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;smtp.gmail.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SMTP_PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;587&lt;/span&gt;
&lt;span class="n"&gt;SMTP_SECURITY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TLS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# The SMTP password is stored in .env — never hardcoded
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Gmail requires an App Password (not your account password) when authenticating via SMTP with 2FA enabled. Store it exclusively in environment secrets.&lt;/p&gt;
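&lt;p&gt;With the App Password in place, attaching and sending the PDF is pure standard library. A sketch (sender, recipient, and filename conventions are placeholders):&lt;/p&gt;

```python
import smtplib
from email.message import EmailMessage

def build_report_email(sender: str, recipient: str, pdf_bytes: bytes, event_name: str):
    """Assemble the organizer email with the PDF report attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Feedback report: {event_name}"
    msg.set_content("The AI-generated feedback report for your event is attached.")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=f"{event_name}-report.pdf",
    )
    return msg

def send_report(msg, app_password: str):
    """Gmail SMTP with STARTTLS on port 587; the App Password comes from .env."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(msg["From"], app_password)
        server.send_message(msg)
```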

&lt;h2&gt;
  
  
  Database Schema: Supabase
&lt;/h2&gt;

&lt;p&gt;All submissions and AI outputs are persisted to a Supabase PostgreSQL table (&lt;code&gt;feedback_reports&lt;/code&gt;). The schema captures everything — raw input, AI analysis fields, PDF URL, and email delivery status — in a single row per submission, making dashboard queries simple and fast.&lt;/p&gt;

&lt;p&gt;Key fields: &lt;code&gt;submissionId&lt;/code&gt;, &lt;code&gt;eventName&lt;/code&gt;, &lt;code&gt;eventDate&lt;/code&gt;, &lt;code&gt;rating&lt;/code&gt;, &lt;code&gt;sentiment&lt;/code&gt;, &lt;code&gt;summary&lt;/code&gt;, &lt;code&gt;urgency&lt;/code&gt;, &lt;code&gt;highlights&lt;/code&gt;, &lt;code&gt;improvementSuggestions&lt;/code&gt;, &lt;code&gt;pdfUrl&lt;/code&gt;, &lt;code&gt;emailSent&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Analytics Dashboard
&lt;/h2&gt;

&lt;p&gt;The Next.js dashboard connects to the API and provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time submission counts and average ratings&lt;/li&gt;
&lt;li&gt;Sentiment breakdown (Positive / Neutral / Negative)&lt;/li&gt;
&lt;li&gt;High-urgency submission highlighting&lt;/li&gt;
&lt;li&gt;Filterable table with date range, event name, and sentiment filters&lt;/li&gt;
&lt;li&gt;Per-submission PDF access links&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;bulk report feature&lt;/strong&gt; is especially powerful: select a date range, and the system retrieves all submissions in that window, passes the full dataset to Gemini for a consolidated analysis, generates a combined PDF, and emails it to the admin. A multi-hour manual reporting task reduced to a single button click.&lt;/p&gt;
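&lt;p&gt;The aggregation step before the consolidated Gemini call isn't shown in the repo excerpt, but it amounts to collapsing the date-range rows into a compact summary the model can reason over. A plausible sketch, using the field names from the schema above:&lt;/p&gt;

```python
def consolidate(rows):
    """Aggregate a date-range batch of feedback rows into the compact
    summary passed to Gemini for the bulk report (hypothetical shape)."""
    total = len(rows)
    by_sentiment = {"Positive": 0, "Neutral": 0, "Negative": 0}
    ratings = []
    urgent = []
    for row in rows:
        by_sentiment[row["sentiment"]] += 1
        ratings.append(row["rating"])
        if row["urgency"] == "High":
            urgent.append(row["submissionId"])
    return {
        "submissions": total,
        "avg_rating": round(sum(ratings) / total, 2) if total else None,
        "sentiment_breakdown": by_sentiment,
        "high_urgency_ids": urgent,
    }
```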

&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;SMTP credentials stored exclusively in environment variables — never in the codebase&lt;/li&gt;
&lt;li&gt;Supabase connection credentials managed via env secrets&lt;/li&gt;
&lt;li&gt;Input validation at the FastAPI layer using Pydantic before data reaches AI or storage&lt;/li&gt;
&lt;li&gt;Gemini API key scoped to the backend only — never exposed to the client&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Sameershahh/feedback-form-system
&lt;span class="nb"&gt;cd &lt;/span&gt;feedback-form-system

&lt;span class="c"&gt;# Backend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;api
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="c"&gt;# Add .env with Supabase, Gmail SMTP, and Gemini credentials&lt;/span&gt;
uvicorn main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;span class="c"&gt;# Frontend&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../frontend
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Conference and workshop organizers collecting post-event feedback&lt;/li&gt;
&lt;li&gt;Corporate training teams tracking session quality&lt;/li&gt;
&lt;li&gt;SaaS products embedding feedback collection into onboarding flows&lt;/li&gt;
&lt;li&gt;Event agencies needing automated client-ready reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Sameershahh/feedback-form-system" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs" rel="noopener noreferrer"&gt;Google Gemini API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://supabase.com/docs" rel="noopener noreferrer"&gt;Supabase Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Curious about the AI prompt engineering behind the Gemini integration, or how the bulk report aggregation works? Drop a comment — happy to go deeper on any part of the stack.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Sameer Shah — AI &amp;amp; Full-Stack Developer | &lt;a href="https://sameershah-portfolio.vercel.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>fastapi</category>
      <category>nextjs</category>
      <category>automation</category>
    </item>
    <item>
      <title>I Built a Full-Stack Page Generation Engine with FastAPI + Next.js (And Here's the Architecture)</title>
      <dc:creator>Sameer Shah</dc:creator>
      <pubDate>Mon, 13 Apr 2026 09:15:35 +0000</pubDate>
      <link>https://forem.com/sameershahh/i-built-a-full-stack-page-generation-engine-with-fastapi-nextjs-and-heres-the-architecture-37p6</link>
      <guid>https://forem.com/sameershahh/i-built-a-full-stack-page-generation-engine-with-fastapi-nextjs-and-heres-the-architecture-37p6</guid>
      <description>&lt;p&gt;There's a common problem in modern web development: you have structured data — JSON from a CMS, a database, or a configuration file — and you need to turn it into fully-rendered, styled web pages dynamically. Most solutions either lock you into a specific CMS or require a lot of glue code across disconnected systems.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;PageForge API&lt;/strong&gt; — a full-stack page generation engine that takes structured JSON input and forges dynamic, styled web pages through a clean API-first architecture. Here's how it works and why I made the architectural choices I did.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Structured Data → Web Pages
&lt;/h2&gt;

&lt;p&gt;Imagine you're building a platform where clients need unique landing pages, product pages, or documentation pages generated from configuration data. Hard-coding templates doesn't scale. A headless CMS is overkill. What you need is a programmable, API-driven page forge.&lt;/p&gt;

&lt;p&gt;PageForge API solves this by acting as a bridge: you send it JSON, it returns rendered pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The project is split into two distinct layers that communicate cleanly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI (Python)&lt;/td&gt;
&lt;td&gt;Data ingestion, processing, template logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js (TypeScript)&lt;/td&gt;
&lt;td&gt;Page rendering, routing, client-side display&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data&lt;/td&gt;
&lt;td&gt;JSON schemas&lt;/td&gt;
&lt;td&gt;Input contracts for page structures&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The backend handles all the heavy lifting — validation, data normalization, template selection, and response construction. The frontend consumes the processed output and renders it via Next.js's hybrid rendering capabilities (SSR + CSR depending on the use case).&lt;/p&gt;

&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;page-forge-api/
├── fastapi/          # Python backend — core API logic
├── nextjs/           # TypeScript frontend — rendering layer
├── forge-data.json   # Sample forged page data
├── input-data.json   # Example input schema
├── new-sample.json   # Additional test fixtures
└── sample-data.json  # Reference data structures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The multiple JSON fixture files aren't just test data — they demonstrate the range of input schemas the engine is designed to handle. Real-world page generators need to be resilient to varied input shapes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The FastAPI Backend
&lt;/h2&gt;

&lt;p&gt;FastAPI was the natural choice here for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic OpenAPI docs&lt;/strong&gt; — as soon as you define your Pydantic models, you get interactive API documentation for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type safety from day one&lt;/strong&gt; — Pydantic models enforce your input schema contract, so malformed data is rejected at the boundary, not deep inside your logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async-first&lt;/strong&gt; — if the engine needs to fetch external resources or call downstream services, async handlers ensure the server stays non-blocking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend receives JSON input representing a page definition, validates it against a schema, applies transformation logic, and returns a structured response that the frontend can render.&lt;/p&gt;
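&lt;p&gt;As a concrete example of that boundary, the input contract can be expressed as nested Pydantic models mirroring the conceptual JSON structure shown later in this post (field names follow the sample fixtures; Pydantic v2 is assumed):&lt;/p&gt;

```python
from typing import List, Literal
from pydantic import BaseModel

class PageMeta(BaseModel):
    title: str
    description: str
    slug: str

class Section(BaseModel):
    # In the full model, each section type would carry its own optional fields.
    type: Literal["hero", "features", "cta", "content"]

class PageDefinition(BaseModel):
    pageType: Literal["landing", "product", "docs"]
    meta: PageMeta
    sections: List[Section]
```

&lt;p&gt;Anything that fails these models is rejected at the API boundary with a 422, before any template logic runs.&lt;/p&gt;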

&lt;h2&gt;
  
  
  The Next.js Frontend
&lt;/h2&gt;

&lt;p&gt;Next.js sits at the frontend layer and handles rendering. The key design decision here was keeping the frontend &lt;em&gt;dumb&lt;/em&gt; — it doesn't make business logic decisions. It receives processed data from the API and maps it to components.&lt;/p&gt;

&lt;p&gt;This decoupling is powerful because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can swap the rendering layer without touching backend logic&lt;/li&gt;
&lt;li&gt;The API can serve multiple frontends or even mobile clients&lt;/li&gt;
&lt;li&gt;Testing is clean — you can unit test the API independently of rendering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TypeScript (93.4% of the codebase) ensures that data contracts between the API response and frontend components are enforced at compile time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Layer: JSON Contracts
&lt;/h2&gt;

&lt;p&gt;The most interesting design problem in a page generation engine is the input schema. Too rigid and it's unusable. Too flexible and validation becomes a nightmare.&lt;/p&gt;

&lt;p&gt;PageForge solves this with layered JSON schemas — a base contract all inputs must satisfy, with optional extension fields for more complex page types. Here's a simplified conceptual structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pageType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"landing | product | docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sections"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hero | features | cta | content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend validates this structure, resolves any references, applies defaults, and returns a fully-resolved page object ready to render.&lt;/p&gt;
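&lt;p&gt;The resolve-and-apply-defaults pass can be sketched in plain Python. The default values below are invented for illustration; the real defaults live in the PageForge backend:&lt;/p&gt;

```python
# Illustrative sketch of the "apply defaults" pass -- these default
# values are made up, not PageForge's actual defaults.
SECTION_DEFAULTS = {
    "hero": {"headline": "", "subheadline": "", "alignment": "center"},
    "features": {"columns": 3, "items": []},
    "cta": {"label": "Learn more", "href": "#"},
    "content": {"body": ""},
}

def resolve_page(page: dict) -> dict:
    """Return a fully-resolved copy: every section carries its defaults,
    overridden by whatever the caller supplied."""
    resolved = {**page, "sections": []}
    for section in page.get("sections", []):
        defaults = SECTION_DEFAULTS.get(section["type"], {})
        merged = {**defaults, **section.get("data", {})}
        resolved["sections"].append({"type": section["type"], "data": merged})
    return resolved

page = resolve_page({
    "pageType": "landing",
    "meta": {"title": "Home", "description": "", "slug": "home"},
    "sections": [{"type": "features", "data": {"columns": 2}}],
})
# The features section keeps columns=2 (caller) and gains items=[] (default).
```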

&lt;h2&gt;
  
  
  Why API-First Page Generation Matters
&lt;/h2&gt;

&lt;p&gt;The API-first approach to page generation is increasingly relevant as teams move toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Programmatic content generation at scale&lt;/li&gt;
&lt;li&gt;AI-driven page creation pipelines&lt;/li&gt;
&lt;li&gt;Multi-channel publishing from a single data source&lt;/li&gt;
&lt;li&gt;Headless architectures where the rendering layer is replaceable&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;PageForge is a great foundation if you're building a white-label page generator, an AI-assisted website builder, or any system where page structure is driven by data rather than manual design.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Running It Locally
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/candidateconnectt/page-forge-api

&lt;span class="c"&gt;# Backend setup&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;fastapi
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
uvicorn main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;

&lt;span class="c"&gt;# Frontend setup (new terminal)&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../nextjs
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Some natural extensions I'm considering for this engine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-powered section generation — pass a prompt, get a page section back&lt;/li&gt;
&lt;li&gt;Template versioning and A/B testing support&lt;/li&gt;
&lt;li&gt;Webhook support for triggering page regeneration on data changes&lt;/li&gt;
&lt;li&gt;Export to static HTML for edge deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/candidateconnectt/page-forge-api" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nextjs.org/docs" rel="noopener noreferrer"&gt;Next.js Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building anything similar — headless page builders, dynamic content engines, or API-first web tools — I'd love to hear about your approach in the comments. And if this was useful, a ⭐ on GitHub goes a long way!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Sameer Shah — AI &amp;amp; Full-Stack Developer | &lt;a href="https://sameershah-portfolio.vercel.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>fastapi</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Built a Facial-Expression Recognition Model with PyTorch (FER-2013, 72% Val Acc)</title>
      <dc:creator>Sameer Shah</dc:creator>
      <pubDate>Tue, 12 Aug 2025 21:30:24 +0000</pubDate>
      <link>https://forem.com/sameershahh/how-i-built-a-facial-expression-recognition-model-with-pytorch-fer-2013-72-val-acc-2oc3</link>
      <guid>https://forem.com/sameershahh/how-i-built-a-facial-expression-recognition-model-with-pytorch-fer-2013-72-val-acc-2oc3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkxm9qcjimu4uzao48y7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkxm9qcjimu4uzao48y7.png" alt=" " width="586" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I trained a 3-block CNN in PyTorch on the FER-2013 dataset to classify 7 emotions. This post explains the dataset challenges, preprocessing and augmentation, exact model architecture, training recipe, evaluation (confusion matrix + per-class F1), and next steps for deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Emotion recognition enables richer human–computer interactions. I chose FER-2013 because it’s realistic: low-resolution (48×48), grayscale, and class-imbalanced. The goal: produce a reproducible, deployment-ready CNN pipeline that balances accuracy and efficiency for real-time inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem statement
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Input: 48×48 grayscale faces.&lt;/li&gt;
&lt;li&gt;Task: 7-class classification — Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral.&lt;/li&gt;
&lt;li&gt;Challenges: small images → limited features, class imbalance, noisy labels, and intra-class variation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dataset &amp;amp; preprocessing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Source: FER-2013 (Kaggle). Split into train/val/test following the Usage column in the original CSV, or define your own split.&lt;/li&gt;
&lt;li&gt;Preprocessing pipeline (PyTorch transforms):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),     # sensible for faces
    transforms.RandomRotation(10),         # small rotations
    transforms.RandomResizedCrop(48, scale=(0.9,1.0)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

val_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((48,48)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Model architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikuuv837jg3ojwncgddl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikuuv837jg3ojwncgddl.png" alt=" " width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 48 × 48 × 1 (grayscale)&lt;/li&gt;
&lt;li&gt;Block 1: Conv2d(1 → 64, 3×3, pad=1) → BatchNorm2d(64) → ReLU → MaxPool2d(2×2) → OUTPUT 24×24×64&lt;/li&gt;
&lt;li&gt;Block 2: Conv2d(64 → 128, 3×3, pad=1) → BatchNorm2d(128) → ReLU → MaxPool2d(2×2) → OUTPUT 12×12×128&lt;/li&gt;
&lt;li&gt;Block 3: Conv2d(128 → 256, 3×3, pad=1) → BatchNorm2d(256) → ReLU → MaxPool2d(2×2) → OUTPUT 6×6×256&lt;/li&gt;
&lt;li&gt;Dropout2d(p=0.25) → Flatten (9216) → FC(9216 → 512) → ReLU → Dropout(p=0.5) → FC(512 → 7) → Softmax (inference)&lt;/li&gt;
&lt;/ul&gt;
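&lt;p&gt;The blocks above translate almost line-for-line into a PyTorch module. This is my reading of the listed layers as a sketch, not necessarily the repo's exact code:&lt;/p&gt;

```python
# Sketch of the 3-block CNN described above for 48x48 grayscale FER-2013
# faces -- my reading of the layer list, not the repo's exact code.
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()

        def block(c_in, c_out):
            # Conv -> BatchNorm -> ReLU -> MaxPool, halving spatial size.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(1, 64),     # 48x48 -> 24x24
            block(64, 128),   # 24x24 -> 12x12
            block(128, 256),  # 12x12 -> 6x6
            nn.Dropout2d(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                 # 6 * 6 * 256 = 9216
            nn.Linear(9216, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),  # raw logits; softmax only at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = EmotionCNN()
logits = model(torch.randn(2, 1, 48, 48))  # logits.shape == (2, 7)
```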

&lt;h2&gt;
  
  
  Training recipe
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Loss: CrossEntropyLoss()&lt;/li&gt;
&lt;li&gt;Optimizer: AdamW(lr=1e-3, weight_decay=1e-4)&lt;/li&gt;
&lt;li&gt;Scheduler: ReduceLROnPlateau or CosineAnnealingLR (I used ReduceLROnPlateau on val loss)&lt;/li&gt;
&lt;li&gt;Batch size: 64 (adjust by GPU memory)&lt;/li&gt;
&lt;li&gt;Epochs: 30–60 with early stopping (patience 7 on val loss)&lt;/li&gt;
&lt;li&gt;Checkpoint: save best_model.pt by val F1 (or loss)&lt;/li&gt;
&lt;/ul&gt;
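&lt;p&gt;One thing the recipe doesn't show is handling FER-2013's class imbalance (Disgust is tiny compared to Happy). A common fix is weighting the loss by inverse class frequency. A sketch, using the widely quoted training-split counts (double-check them against your own split):&lt;/p&gt;

```python
# Sketch of class-weighted loss for FER-2013's imbalance. The counts
# below are the widely quoted training-split numbers in the order
# [Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral] -- verify
# against your own split before relying on them.
import torch
import torch.nn as nn

class_counts = torch.tensor(
    [3995, 436, 4097, 7215, 4830, 3171, 4965], dtype=torch.float
)

# Inverse-frequency weights: rarer classes get larger weights.
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```

&lt;p&gt;A &lt;code&gt;WeightedRandomSampler&lt;/code&gt; on the DataLoader is the other standard option; weighting the loss is simpler and changes nothing else in the pipeline.&lt;/p&gt;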

&lt;h2&gt;
  
  
  Minimal training loop snippet
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3)

best_f1 = 0.0
for epoch in range(1, epochs + 1):
    train_one_epoch(model, train_loader, optimizer, criterion)
    val_loss, val_metrics = validate(model, val_loader, criterion)
    scheduler.step(val_loss)
    if val_metrics['f1_macro'] &amp;gt; best_f1:
        best_f1 = val_metrics['f1_macro']
        torch.save(model.state_dict(), 'best_model.pt')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random, numpy as np, torch
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
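&lt;p&gt;The snippet above pins the main process, but &lt;code&gt;DataLoader&lt;/code&gt; workers spawn with their own RNG state, so if you use &lt;code&gt;num_workers&lt;/code&gt; above zero it's worth seeding them too. A sketch following the PyTorch reproducibility notes:&lt;/p&gt;

```python
# Seeding DataLoader workers, per the PyTorch reproducibility notes.
import random
import numpy as np
import torch

def seed_worker(worker_id):
    # Each worker derives its seed deterministically from the base seed.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

# Pass both to your DataLoader:
# DataLoader(train_dataset, batch_size=64, shuffle=True,
#            worker_init_fn=seed_worker, generator=g)
```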



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The full code is on &lt;a href="https://github.com/Sameershahh/Facial_Expression_Recognizer" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. If you'd like this adapted for real-time webcam inference or a Django web deployment, reach out.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
