<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vansh Jangwal</title>
    <description>The latest articles on Forem by Vansh Jangwal (@vansh_jangwal).</description>
    <link>https://forem.com/vansh_jangwal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873839%2F56a5d162-9568-4156-a2d3-5cd9eb7d7c0e.png</url>
      <title>Forem: Vansh Jangwal</title>
      <link>https://forem.com/vansh_jangwal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vansh_jangwal"/>
    <language>en</language>
    <item>
      <title>AI VOICE AGENT USING GROQ API</title>
      <dc:creator>Vansh Jangwal</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:59:15 +0000</pubDate>
      <link>https://forem.com/vansh_jangwal/ai-voice-agent-using-groq-api-5apf</link>
      <guid>https://forem.com/vansh_jangwal/ai-voice-agent-using-groq-api-5apf</guid>
      <description>&lt;p&gt;🎙️ VoiceAgent AI — Local AI Agent with Voice Control &lt;/p&gt;

&lt;p&gt;A fully functioning, voice-controlled local AI agent built for the Mem0 AI/ML &amp;amp; Generative AI Developer Intern assignment. The system accepts audio input, transcribes it, classifies intent with an LLM, and runs local tools, all presented in a sleek, dark-themed Streamlit UI. &lt;/p&gt;

&lt;p&gt;🏗️ Architecture &lt;/p&gt;

&lt;p&gt;The architecture comprises four main components: audio input, speech-to-text, intent classification, and a tool dispatcher.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│                     VoiceAgent AI                        │
│                                                          │
│  ┌──────────┐   ┌──────────┐   ┌───────────┐            │
│  │  Audio   │──▶│  STT     │──▶│  Intent   │            │
│  │  Input   │   │ (Whisper │   │ Classify  │            │
│  │  .wav    │   │  via     │   │ (LLaMA    │            │
│  │  .mp3    │   │  Groq)   │   │  3.3 70B  │            │
│  └──────────┘   └──────────┘   │  via Groq)│            │
│                                └─────┬─────┘            │
│                                      │                   │
│              ┌───────────────────────┼────────────────┐  │
│              │      Tool Dispatcher  │                │  │
│              │                       ▼                │  │
│              │  create_file │ write_code │ summarize  │  │
│              │              │  general_chat           │  │
│              └─────────────────────────────────────── ┘  │
│                                      │                   │
│                                      ▼                   │
│                            output/ folder                │
│                          (sandboxed, safe)               │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the diagram shows, each stage hands its output to the next: the transcript feeds the classifier, and the classified intent selects which tool runs. &lt;/p&gt;

&lt;p&gt;Module Breakdown &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;app.py&lt;/code&gt;: Streamlit UI with pipeline display, session history, and human-in-the-loop confirmations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;intent_classifier.py&lt;/code&gt;: LLaMA 3.3 70B prompt, JSON parsing, and graceful fallback&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tools.py&lt;/code&gt;: tool handlers for create_file, write_code, summarize, general_chat&lt;/li&gt;
&lt;li&gt;&lt;code&gt;requirements.txt&lt;/code&gt;: minimal dependencies (Streamlit + Groq SDK)&lt;/li&gt;
&lt;/ul&gt;
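&lt;p&gt;To make the "JSON parsing + graceful fallback" step concrete, here is a minimal sketch of how the classifier's raw LLM reply could be validated. The function name, argument shape, and intent set are assumptions for illustration, not the project's actual code. &lt;/p&gt;

```python
import json

# Hypothetical sketch of the JSON-parsing-with-fallback step described for
# intent_classifier.py. Names and shapes are illustrative assumptions.
VALID_INTENTS = {"create_file", "write_code", "summarize", "general_chat"}

def parse_intent(llm_output: str) -> dict:
    """Parse the LLM's JSON reply; degrade gracefully to general_chat."""
    try:
        data = json.loads(llm_output)
        if data.get("intent") in VALID_INTENTS:
            return data
    except (json.JSONDecodeError, AttributeError):
        pass  # malformed JSON, or JSON that is not an object
    # Graceful fallback: unknown or malformed output becomes general_chat.
    return {"intent": "general_chat", "args": {}}
```

&lt;p&gt;The point of the fallback is that a bad model reply never crashes the pipeline; it just routes to a conversational response. &lt;/p&gt;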

&lt;p&gt;🔑 Hardware Note &amp;amp; Workaround &lt;/p&gt;

&lt;p&gt;Why Groq API instead of local models? &lt;/p&gt;

&lt;p&gt;My local machine has no dedicated GPU. Running Whisper Large v3 or LLaMA 3.3 70B locally would require at minimum 8 GB of VRAM (Whisper) and 40 GB+ of RAM/VRAM (quantized LLaMA 70B). Squeezing them onto this hardware with aggressive quantization would make inference prohibitively slow and degrade accuracy. &lt;/p&gt;

&lt;p&gt;Solution: Groq’s free API tier offers: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whisper Large v3 for STT: state-of-the-art accuracy, ~2-3 seconds per audio file&lt;/li&gt;
&lt;li&gt;LLaMA 3.3 70B Versatile for intent classification and code generation: extremely fast (~200 tokens/sec on Groq hardware)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fully complies with the assignment’s hardware workaround policy, and the whole pipeline runs at API speed (3-6 seconds end-to-end). &lt;/p&gt;
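&lt;p&gt;A minimal sketch of the two Groq calls behind this pipeline, assuming the Groq Python SDK (&lt;code&gt;pip install groq&lt;/code&gt;) and a &lt;code&gt;GROQ_API_KEY&lt;/code&gt; environment variable. The model ids follow the post; the helper names and prompt wording are illustrative, not the project's actual code. &lt;/p&gt;

```python
import os

def transcribe(audio_path: str) -> str:
    """STT via Groq-hosted Whisper Large v3 (needs GROQ_API_KEY)."""
    from groq import Groq  # deferred import so the sketch loads without the SDK
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="whisper-large-v3", file=f
        )
    return result.text

def intent_messages(transcript: str) -> list:
    """Build the chat payload for the LLaMA 3.3 70B intent call."""
    system = ("Classify the request as one of: create_file, write_code, "
              "summarize, general_chat. Reply with JSON only.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": transcript}]
```

&lt;p&gt;Keeping the prompt construction in a pure function makes it easy to test without spending API calls. &lt;/p&gt;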

&lt;p&gt;✨ Bonus Features Implemented &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compound Commands: “Summarize and save to file” handles more than one action in a single utterance&lt;/li&gt;
&lt;li&gt;Human-in-the-Loop: checkbox confirmations before any file-writing command runs&lt;/li&gt;
&lt;li&gt;Graceful Degradation: when JSON parsing fails, no intent matches the request, or the audio makes no sense, the agent falls back to &lt;code&gt;general_chat&lt;/code&gt; with a helpful message&lt;/li&gt;
&lt;li&gt;Session Memory: the full history of actions for the session is shown in the UI&lt;/li&gt;
&lt;li&gt;Safe Sandbox: all file operations are limited to the &lt;code&gt;output/&lt;/code&gt; folder, with a path-traversal safeguard&lt;/li&gt;
&lt;/ul&gt;
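&lt;p&gt;The dispatcher-with-fallback idea can be sketched in a few lines. Handler bodies here are placeholders, not the project's actual &lt;code&gt;tools.py&lt;/code&gt; implementations; the point is that an unknown intent routes to &lt;code&gt;general_chat&lt;/code&gt; instead of raising. &lt;/p&gt;

```python
# Minimal sketch of a tool dispatcher with graceful degradation.
# Handler bodies are illustrative stand-ins for the real tool logic.
def create_file(args): return "created " + args.get("name", "untitled.txt")
def write_code(args): return "wrote code"
def summarize(args): return "summary"
def general_chat(args): return "chat reply"

TOOLS = {"create_file": create_file, "write_code": write_code,
         "summarize": summarize, "general_chat": general_chat}

def dispatch(intent: str, args: dict) -> str:
    handler = TOOLS.get(intent, general_chat)  # unknown intent falls back
    return handler(args)
```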

&lt;p&gt;📁 Project Structure &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;voice-agent-ai/
├── app.py                  # Main Streamlit UI
├── intent_classifier.py    # LLM-based intent classification
├── tools.py                # Tool execution handlers
├── requirements.txt        # Dependencies
├── output/                 # All generated files (gitignored)
└── README.md
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;🎬 Demo Video &lt;/p&gt;

&lt;p&gt;YouTube Unlisted Link — Demonstrates: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Voice input → “Create a python file with a retry decorator” → &lt;code&gt;write_code&lt;/code&gt; intent → file saved&lt;/li&gt;
&lt;li&gt;Voice input → “What is the difference between RAM and ROM?” → &lt;code&gt;general_chat&lt;/code&gt; intent → response returned&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;📝 Technical Article &lt;/p&gt;

&lt;p&gt;Medium / Dev.to Link — Architecture, model selection, Groq’s speed advantage, challenges. &lt;/p&gt;

&lt;p&gt;🛡️ Safety &lt;/p&gt;

&lt;p&gt;All file writes are confined to the &lt;code&gt;output/&lt;/code&gt; directory: &lt;code&gt;os.path.basename()&lt;/code&gt; strips path-traversal components (“../”) from filenames, and human-in-the-loop confirmation is required before any destructive file operation. &lt;/p&gt;
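&lt;p&gt;The &lt;code&gt;os.path.basename()&lt;/code&gt; safeguard can be sketched as follows; the function and constant names are assumptions for illustration. &lt;/p&gt;

```python
import os

# Sketch of the output/ sandboxing described above: basename discards any
# directory components, so "../" traversal cannot escape the folder.
OUTPUT_DIR = "output"

def safe_path(filename: str) -> str:
    safe_name = os.path.basename(filename)  # "../../etc/passwd" -> "passwd"
    return os.path.join(OUTPUT_DIR, safe_name)
```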

&lt;p&gt;📦 Dependencies &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit&amp;gt;=1.35.0   # UI framework
groq&amp;gt;=0.9.0         # Groq SDK (STT + LLM)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;No heavy ML libraries required; the app runs on any machine with Python 3.9+. &lt;/p&gt;

&lt;p&gt;You can check my work: &lt;/p&gt;

&lt;p&gt;YouTube: &lt;a href="https://youtu.be/GyBar8-K7Wk" rel="noopener noreferrer"&gt;https://youtu.be/GyBar8-K7Wk&lt;/a&gt; &lt;/p&gt;

</description>
      <category>ai</category>
      <category>groq</category>
      <category>api</category>
      <category>aiapi</category>
    </item>
  </channel>
</rss>
