How I Built a Voice-Controlled AI Agent with Groq & Streamlit

Nishtha Darji — Wed, 15 Apr 2026 12:10:45 +0000

I built a fully working voice-controlled AI agent that transcribes speech, classifies intent, and executes local tools — all powered by Groq's free AI APIs.

🎯 What It Does

You speak → It listens → It understands → It acts.

Say "Write a Python retry function" → generates code and saves to file
Say "Summarize this text" → returns a clean summary
Say "What is machine learning?" → responds conversationally
Say "Create a file called notes.txt" → creates the file safely

🏗️ Architecture

Audio Input → Groq Whisper (STT) → Groq LLaMA 3.3 70B (Intent) → Tool Execution → Streamlit UI

🧠 Models I Chose & Why

Speech-to-Text: Groq Whisper large-v3

Transcribes audio in under 2 seconds
Free tier available, no GPU needed

LLM: Groq LLaMA 3.3 70B Versatile

Accurately classifies intent from natural speech
Handles compound commands like "write X and save to Y.py"

⚙️ Tech Stack

Streamlit — Web UI
Groq API — STT + LLM
Python — Backend logic

🚧 Challenges I Faced

1. Model Deprecation
During development, llama3-8b-8192 was decommissioned by Groq. I switched to llama-3.3-70b-versatile which is more powerful and still free.

2. Compound Commands
Handling commands like "Write a bubble sort and save it to sort.py" required careful prompt engineering to extract both the intent and filename simultaneously.

3. Safe File Operations
All file writes are sandboxed to an output/ folder with path traversal protection so no system files can be accidentally overwritten.

✨ Bonus Features

✅ Human-in-the-loop confirmation before file operations
✅ Session memory — last 4 turns passed as context
✅ Auto fallback if API fails
✅ Compound command support

🔗 Links

GitHub:(https://github.com/nishtha-3011/voice-ai-agent)

Thanks for reading! Feel free to star the repo if you found it useful ⭐

Forem: Nishtha Darji