<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ivan Parasochenko</title>
    <description>The latest articles on Forem by Ivan Parasochenko (@dotradepro).</description>
    <link>https://forem.com/dotradepro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3880232%2Fff5aec8d-eb22-4f8d-868b-a77cf2e774e6.jpeg</url>
      <title>Forem: Ivan Parasochenko</title>
      <link>https://forem.com/dotradepro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dotradepro"/>
    <language>en</language>
    <item>
      <title>How I Built an Offline Voice Assistant for Smart Home on Raspberry Pi — and Why I Ditched the Cloud</title>
      <dc:creator>Ivan Parasochenko</dc:creator>
      <pubDate>Wed, 15 Apr 2026 10:21:06 +0000</pubDate>
      <link>https://forem.com/dotradepro/how-i-built-an-offline-voice-assistant-for-smart-home-on-raspberry-pi-and-why-i-ditched-the-cloud-3aei</link>
      <guid>https://forem.com/dotradepro/how-i-built-an-offline-voice-assistant-for-smart-home-on-raspberry-pi-and-why-i-ditched-the-cloud-3aei</guid>
      <description>&lt;p&gt;My name is Ivan, I'm a solo developer and for over a year I've been building SelenaCore — an open-source smart home hub that works completely offline, supports Ukrainian language, and sends zero data to the cloud.&lt;/p&gt;

&lt;h2&gt;Why I started&lt;/h2&gt;

&lt;p&gt;It started with a simple question: why doesn't my smart home understand me in Ukrainian?&lt;/p&gt;

&lt;p&gt;Google Home and Amazon Alexa are fine products, but they require a constant internet connection, send voice requests to company servers, and don't handle Ukrainian well. Home Assistant solves some of this, but voice control there is a separate, complex topic — and it mostly relies on cloud services too.&lt;/p&gt;

&lt;p&gt;I started thinking: what if I did everything locally? STT on device, LLM on device, TTS on device. No cloud at all. Then everything would keep working even with no internet connection.&lt;/p&gt;

&lt;p&gt;That's how SelenaCore was born.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;The system has several layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offline voice pipeline&lt;/strong&gt; — wake-word detection → STT → processing → TTS response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pluggable LLM layer&lt;/strong&gt; — supports Ollama (local models) and cloud providers as optional fallback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Module system&lt;/strong&gt; — 21 built-in modules for devices, timers, weather, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React SPA frontend&lt;/strong&gt; — web interface for management and monitoring&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker-based deployment&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
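
&lt;p&gt;To make the layering concrete, here's a minimal sketch of how the voice pipeline chains together. Every function here is a stand-in for the real component (wake-word detector, Vosk, the intent pipeline, Piper), not SelenaCore's actual API:&lt;/p&gt;

```python
# Illustrative sketch of the offline voice pipeline; each function is a
# stand-in for the real wake-word, STT, intent, and TTS components.
def detect_wake_word(audio):
    return "selena" in audio             # placeholder for a real detector

def speech_to_text(audio):
    return audio.replace("selena ", "")  # placeholder for Vosk

def handle_intent(text):
    return f"ok, doing: {text}"          # placeholder for the intent pipeline

def text_to_speech(text):
    return f"[audio] {text}"             # placeholder for Piper

def on_audio_chunk(audio):
    """Wake-word → STT → processing → TTS, as in the layer list above."""
    if not detect_wake_word(audio):
        return None
    command = speech_to_text(audio)
    reply = handle_intent(command)
    return text_to_speech(reply)

print(on_audio_chunk("selena turn on the lamp"))
```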

&lt;p&gt;Hardware target: Raspberry Pi 4/5 or NVIDIA Jetson Orin Nano.&lt;/p&gt;

&lt;h2&gt;STT: why Vosk over Whisper&lt;/h2&gt;

&lt;p&gt;Whisper gives better accuracy, especially on unusual phrases, but on a Raspberry Pi 4 it's too slow for a real-time interface. Vosk with &lt;code&gt;vosk-model-uk&lt;/code&gt; gives acceptable accuracy and responds in 300–500ms, which is comfortable for a voice assistant.&lt;/p&gt;

&lt;p&gt;One important caveat: STT is imperfect. Vosk might recognize "turn on the strip in the office" as "turn on the tape in the office". This is a real problem that shaped the entire Intent Recognition architecture.&lt;/p&gt;

&lt;h2&gt;Intent Recognition: from simple to three-tier pipeline&lt;/h2&gt;

&lt;p&gt;This turned out to be the hardest part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 1&lt;/strong&gt; — regex matching. If the phrase contains "turn on" and "lamp" — it's &lt;code&gt;device.on&lt;/code&gt;. Worked for simple cases, broke completely on any variation.&lt;/p&gt;
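
&lt;p&gt;A miniature of that first version (the patterns are illustrative, not the actual rules):&lt;/p&gt;

```python
import re

# Version 1 in miniature: one regex per intent.
RULES = [
    (re.compile(r"turn on .*(lamp|light)"), "device.on"),
    (re.compile(r"turn off .*(lamp|light)"), "device.off"),
]

def match_intent(phrase):
    for pattern, intent in RULES:
        if pattern.search(phrase):
            return intent
    return None  # any unanticipated wording falls through

print(match_intent("turn on the lamp"))             # matches device.on
print(match_intent("could you light the lamp up"))  # same meaning, no match
```

The second phrase means the same thing but falls straight through, exactly the brittleness that motivated Version 2.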

&lt;p&gt;&lt;strong&gt;Version 2&lt;/strong&gt; — LLM for everything. Great accuracy, but 2–4 second latency even with Ollama + llama3.1:8b on Pi. Unacceptable for voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current version&lt;/strong&gt; — three tiers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tier 0: Fuzzy phrase cache&lt;/strong&gt; — if the command was already seen and resolved — instant response (~5ms)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1: Embedding classifier&lt;/strong&gt; — &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; via ONNX, CPU-only, 22MB. Classifies intent by vector similarity. p50 ~155ms, p95 ~374ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2: LLM fallback&lt;/strong&gt; — only for complex or ambiguous requests where embedding isn't confident (~600ms)&lt;/li&gt;
&lt;/ol&gt;
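
&lt;p&gt;The tier order can be sketched as a simple dispatcher. The threshold, cache, and classifier stubs below are illustrative stand-ins, not the real implementation:&lt;/p&gt;

```python
import operator

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, not the real tuned value

phrase_cache = {}  # Tier 0: previously resolved commands

def embed_classify(text):
    # Stand-in for the ONNX all-MiniLM-L6-v2 classifier: returns
    # (intent, confidence). Hardcoded here for illustration.
    known = {"turn on the lamp": ("device.on", 0.93)}
    return known.get(text, ("unknown", 0.2))

def llm_fallback(text):
    # Stand-in for the Ollama / cloud LLM call.
    return "device.on" if "turn on" in text else "unknown"

def resolve_intent(text):
    if text in phrase_cache:                   # Tier 0: ~5ms cache hit
        return phrase_cache[text]
    intent, confidence = embed_classify(text)  # Tier 1: embedding classifier
    if operator.lt(confidence, CONFIDENCE_THRESHOLD):  # confidence below cutoff
        intent = llm_fallback(text)            # Tier 2: LLM only when unsure
    phrase_cache[text] = intent                # warm the cache for next time
    return intent
```

Because resolved phrases are written back into the cache, repeated commands skip Tiers 1 and 2 entirely.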

&lt;p&gt;Results on a 70-command Ukrainian voice benchmark: &lt;strong&gt;92.9% intent accuracy&lt;/strong&gt;, 98.6% params accuracy.&lt;/p&gt;

&lt;h2&gt;Multilingual architecture&lt;/h2&gt;

&lt;p&gt;All internal processing happens in &lt;strong&gt;English only&lt;/strong&gt;. Translation occurs at two boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vosk output (any language → English) via Helsinki-NLP &lt;code&gt;opus-mt-mul-en&lt;/code&gt; with CTranslate2 int8 quantization (~300MB)&lt;/li&gt;
&lt;li&gt;Before Piper TTS (English → user's language)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Because LLM prompts, intent catalog, and matching logic are all stable and predictable in English. Adding a new language = adding an STT model + TTS voice. Intent Recognition stays unchanged.&lt;/p&gt;
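
&lt;p&gt;In sketch form, the two boundaries sit around an English-only core. The translate helpers below are dictionary stubs standing in for the Helsinki-NLP / CTranslate2 models:&lt;/p&gt;

```python
# Dictionary stubs for the two translation boundaries; the real system
# uses Helsinki-NLP opus-mt models via CTranslate2.
UK_TO_EN = {"увімкни лампу": "turn on the lamp"}
EN_TO_UK = {"lamp is on": "лампу увімкнено"}

def translate_to_english(text):
    return UK_TO_EN.get(text, text)

def translate_from_english(text, lang):
    return EN_TO_UK.get(text, text) if lang == "uk" else text

def process_in_english(text):
    # The English-only core: intent catalog, LLM prompts, matching.
    return "lamp is on" if text == "turn on the lamp" else text

def handle_utterance(stt_output, user_lang):
    english = translate_to_english(stt_output)       # boundary 1: after STT
    reply = process_in_english(english)
    return translate_from_english(reply, user_lang)  # boundary 2: before TTS

print(handle_utterance("увімкни лампу", "uk"))
```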

&lt;h2&gt;Real problems I hit&lt;/h2&gt;

&lt;h3&gt;Problem 1: "turn the tape on"&lt;/h3&gt;

&lt;p&gt;Helsinki-NLP translates "увімкни стрічку в кабінеті" (turn on the LED strip in the office) as "Turn the tape on in your office". The word "tape" instead of "LED strip" confuses the embedding classifier between &lt;code&gt;device.on&lt;/code&gt; and &lt;code&gt;device.off&lt;/code&gt; with a margin of just 0.003.&lt;/p&gt;
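
&lt;p&gt;One way to catch such near-ties is to compare the top two scores and escalate to the LLM fallback when the margin is too thin. The cutoff below is illustrative; the 0.003 case would clearly fall under it:&lt;/p&gt;

```python
import operator

MIN_MARGIN = 0.05  # illustrative; below this the classifier is "not sure"

def pick_intent(scores):
    """Return the top intent, or None to signal an LLM escalation."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    margin = scores[ranked[0]] - scores[ranked[1]]
    if operator.lt(margin, MIN_MARGIN):  # margin below the cutoff
        return None  # too close to call; hand off to Tier 2
    return ranked[0]

# A "tape"-style mistranslation produces a near-tie like this:
print(pick_intent({"device.on": 0.512, "device.off": 0.509}))  # escalates
print(pick_intent({"device.on": 0.91, "device.off": 0.40}))
```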

&lt;h3&gt;Problem 2: RAM on Raspberry Pi 4&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vosk STT model&lt;/td&gt;
&lt;td&gt;~200MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX embedding&lt;/td&gt;
&lt;td&gt;~90MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Helsinki translator&lt;/td&gt;
&lt;td&gt;~300MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Piper TTS&lt;/td&gt;
&lt;td&gt;~80MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI server&lt;/td&gt;
&lt;td&gt;~150MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React SPA&lt;/td&gt;
&lt;td&gt;~50MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~870MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ollama with llama3.1:8b adds another ~5GB, so on a Pi 4 it runs only with swap and 3+ second inference. The LLM fallback can be cloud-based (Claude API) — an option for those who want better quality without powerful hardware.&lt;/p&gt;

&lt;h3&gt;Problem 3: display stack&lt;/h3&gt;

&lt;p&gt;Chromium kiosk mode consumes ~300MB of RAM. I'm currently migrating to WPE WebKit via &lt;code&gt;cog&lt;/code&gt;, which has a ~50MB footprint. On Jetson, the stack falls back to Xorg + Chromium because Tegra's DRM driver is incompatible with wlroots.&lt;/p&gt;

&lt;h2&gt;What it can do now&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Voice control: on/off, brightness, color, temperature&lt;/li&gt;
&lt;li&gt;Tuya local protocol (no cloud)&lt;/li&gt;
&lt;li&gt;Home Assistant integration via WebSocket&lt;/li&gt;
&lt;li&gt;Timers, alarms, reminders&lt;/li&gt;
&lt;li&gt;Temperature queries from sensors&lt;/li&gt;
&lt;li&gt;Presence detection&lt;/li&gt;
&lt;li&gt;Radio streaming&lt;/li&gt;
&lt;li&gt;Fully offline mode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;SelenaCore shows that a fully offline smart home voice assistant with Ukrainian language support is achievable on affordable hardware. It's not a commercial product and not a Home Assistant replacement — it's a tool for people who care about privacy and want to understand what's happening inside.&lt;/p&gt;

&lt;p&gt;Open source, MIT license.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/dotradepro/SelenaCore" rel="noopener noreferrer"&gt;github.com/dotradepro/SelenaCore&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://selenehome.tech" rel="noopener noreferrer"&gt;selenehome.tech&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>raspberrypi</category>
      <category>iot</category>
      <category>homeautomation</category>
      <category>unoplatformchallenge</category>
    </item>
  </channel>
</rss>
