<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ivan Parasochenko</title>
    <description>The latest articles on Forem by Ivan Parasochenko (@dotradepro).</description>
    <link>https://forem.com/dotradepro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3880232%2Fff5aec8d-eb22-4f8d-868b-a77cf2e774e6.jpeg</url>
      <title>Forem: Ivan Parasochenko</title>
      <link>https://forem.com/dotradepro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dotradepro"/>
    <language>en</language>
    <item>
      <title>How I Built an Offline Voice Assistant for Smart Home on Raspberry Pi — and Why I Ditched the Cloud</title>
      <dc:creator>Ivan Parasochenko</dc:creator>
      <pubDate>Wed, 15 Apr 2026 10:21:06 +0000</pubDate>
      <link>https://forem.com/dotradepro/how-i-built-an-offline-voice-assistant-for-smart-home-on-raspberry-pi-and-why-i-ditched-the-cloud-3aei</link>
      <guid>https://forem.com/dotradepro/how-i-built-an-offline-voice-assistant-for-smart-home-on-raspberry-pi-and-why-i-ditched-the-cloud-3aei</guid>
      <description>&lt;p&gt;My name is Ivan, I'm a solo developer and for over a year I've been building SelenaCore — an open-source smart home hub that works completely offline, supports Ukrainian language, and sends zero data to the cloud.&lt;/p&gt;

&lt;h2&gt;Why I started&lt;/h2&gt;

&lt;p&gt;It started with a simple question: why doesn't my smart home understand me in Ukrainian?&lt;/p&gt;

&lt;p&gt;Google Home and Amazon Alexa are fine products, but they require a constant internet connection, send voice requests to company servers, and don't handle Ukrainian well. Home Assistant solves some of this, but voice control there is a separate, complex topic — and it mostly relies on cloud services too.&lt;/p&gt;

&lt;p&gt;I started thinking: what if I did everything locally? STT on device, LLM on device, TTS on device. No cloud at all. Then everything would keep working even with no internet connection.&lt;/p&gt;

&lt;p&gt;That's how SelenaCore was born.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;The system has several layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offline voice pipeline&lt;/strong&gt; — wake-word detection → STT → processing → TTS response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pluggable LLM layer&lt;/strong&gt; — supports Ollama (local models) and cloud providers as optional fallback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Module system&lt;/strong&gt; — 21 built-in modules for devices, timers, weather, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React SPA frontend&lt;/strong&gt; — web interface for management and monitoring&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker-based deployment&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
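
&lt;p&gt;To make the layering concrete, here's a minimal sketch of how the voice pipeline chains together. Every function here is a stand-in for the real component (wake-word detector, Vosk, the intent pipeline, Piper), not SelenaCore's actual API:&lt;/p&gt;

```python
# Illustrative sketch of the offline voice pipeline; each function is a
# stand-in for the real wake-word, STT, intent, and TTS components.
def detect_wake_word(audio):
    return "selena" in audio             # placeholder for a real detector

def speech_to_text(audio):
    return audio.replace("selena ", "")  # placeholder for Vosk

def handle_intent(text):
    return f"ok, doing: {text}"          # placeholder for the intent pipeline

def text_to_speech(text):
    return f"[audio] {text}"             # placeholder for Piper

def on_audio_chunk(audio):
    """Wake-word → STT → processing → TTS, as in the layer list above."""
    if not detect_wake_word(audio):
        return None
    command = speech_to_text(audio)
    reply = handle_intent(command)
    return text_to_speech(reply)

print(on_audio_chunk("selena turn on the lamp"))
```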

&lt;p&gt;Hardware target: Raspberry Pi 4/5 or NVIDIA Jetson Orin Nano.&lt;/p&gt;

&lt;h2&gt;STT: why Vosk over Whisper&lt;/h2&gt;

&lt;p&gt;Whisper gives better accuracy, especially on unusual phrases, but on a Raspberry Pi 4 it's too slow for a real-time interface. Vosk with &lt;code&gt;vosk-model-uk&lt;/code&gt; gives acceptable accuracy and responds in 300–500ms, which is comfortable for a voice assistant.&lt;/p&gt;

&lt;p&gt;One important caveat: STT is imperfect. Vosk might recognize "turn on the strip in the office" as "turn on the tape in the office". This is a real problem that shaped the entire Intent Recognition architecture.&lt;/p&gt;

&lt;h2&gt;Intent Recognition: from simple to three-tier pipeline&lt;/h2&gt;

&lt;p&gt;This turned out to be the hardest part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 1&lt;/strong&gt; — regex matching. If the phrase contains "turn on" and "lamp" — it's &lt;code&gt;device.on&lt;/code&gt;. Worked for simple cases, broke completely on any variation.&lt;/p&gt;
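
&lt;p&gt;A miniature of that first version (the patterns are illustrative, not the actual rules):&lt;/p&gt;

```python
import re

# Version 1 in miniature: one regex per intent.
RULES = [
    (re.compile(r"turn on .*(lamp|light)"), "device.on"),
    (re.compile(r"turn off .*(lamp|light)"), "device.off"),
]

def match_intent(phrase):
    for pattern, intent in RULES:
        if pattern.search(phrase):
            return intent
    return None  # any unanticipated wording falls through

print(match_intent("turn on the lamp"))             # matches device.on
print(match_intent("could you light the lamp up"))  # same meaning, no match
```

The second phrase means the same thing but falls straight through, exactly the brittleness that motivated Version 2.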

&lt;p&gt;&lt;strong&gt;Version 2&lt;/strong&gt; — LLM for everything. Great accuracy, but 2–4 second latency even with Ollama + llama3.1:8b on Pi. Unacceptable for voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current version&lt;/strong&gt; — three tiers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tier 0: Fuzzy phrase cache&lt;/strong&gt; — if the command was already seen and resolved — instant response (~5ms)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1: Embedding classifier&lt;/strong&gt; — &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; via ONNX, CPU-only, 22MB. Classifies intent by vector similarity. p50 ~155ms, p95 ~374ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2: LLM fallback&lt;/strong&gt; — only for complex or ambiguous requests where embedding isn't confident (~600ms)&lt;/li&gt;
&lt;/ol&gt;
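
&lt;p&gt;The tier order can be sketched as a simple dispatcher. The threshold, cache, and classifier stubs below are illustrative stand-ins, not the real implementation:&lt;/p&gt;

```python
import operator

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, not the real tuned value

phrase_cache = {}  # Tier 0: previously resolved commands

def embed_classify(text):
    # Stand-in for the ONNX all-MiniLM-L6-v2 classifier: returns
    # (intent, confidence). Hardcoded here for illustration.
    known = {"turn on the lamp": ("device.on", 0.93)}
    return known.get(text, ("unknown", 0.2))

def llm_fallback(text):
    # Stand-in for the Ollama / cloud LLM call.
    return "device.on" if "turn on" in text else "unknown"

def resolve_intent(text):
    if text in phrase_cache:                   # Tier 0: ~5ms cache hit
        return phrase_cache[text]
    intent, confidence = embed_classify(text)  # Tier 1: embedding classifier
    if operator.lt(confidence, CONFIDENCE_THRESHOLD):  # confidence below cutoff
        intent = llm_fallback(text)            # Tier 2: LLM only when unsure
    phrase_cache[text] = intent                # warm the cache for next time
    return intent
```

Because resolved phrases are written back into the cache, repeated commands skip Tiers 1 and 2 entirely.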

&lt;p&gt;Results on a 70-command Ukrainian voice benchmark: &lt;strong&gt;92.9% intent accuracy&lt;/strong&gt;, 98.6% params accuracy.&lt;/p&gt;

&lt;h2&gt;Multilingual architecture&lt;/h2&gt;

&lt;p&gt;All internal processing happens in &lt;strong&gt;English only&lt;/strong&gt;. Translation occurs at two boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vosk output (any language → English) via Helsinki-NLP &lt;code&gt;opus-mt-mul-en&lt;/code&gt; with CTranslate2 int8 quantization (~300MB)&lt;/li&gt;
&lt;li&gt;Before Piper TTS (English → user's language)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Because LLM prompts, intent catalog, and matching logic are all stable and predictable in English. Adding a new language = adding an STT model + TTS voice. Intent Recognition stays unchanged.&lt;/p&gt;
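
&lt;p&gt;In sketch form, the two boundaries sit around an English-only core. The translate helpers below are dictionary stubs standing in for the Helsinki-NLP / CTranslate2 models:&lt;/p&gt;

```python
# Dictionary stubs for the two translation boundaries; the real system
# uses Helsinki-NLP opus-mt models via CTranslate2.
UK_TO_EN = {"увімкни лампу": "turn on the lamp"}
EN_TO_UK = {"lamp is on": "лампу увімкнено"}

def translate_to_english(text):
    return UK_TO_EN.get(text, text)

def translate_from_english(text, lang):
    return EN_TO_UK.get(text, text) if lang == "uk" else text

def process_in_english(text):
    # The English-only core: intent catalog, LLM prompts, matching.
    return "lamp is on" if text == "turn on the lamp" else text

def handle_utterance(stt_output, user_lang):
    english = translate_to_english(stt_output)       # boundary 1: after STT
    reply = process_in_english(english)
    return translate_from_english(reply, user_lang)  # boundary 2: before TTS

print(handle_utterance("увімкни лампу", "uk"))
```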

&lt;h2&gt;Real problems I hit&lt;/h2&gt;

&lt;h3&gt;Problem 1: "turn the tape on"&lt;/h3&gt;

&lt;p&gt;Helsinki-NLP translates "увімкни стрічку в кабінеті" (turn on the LED strip in the office) as "Turn the tape on in your office". The word "tape" instead of "LED strip" confuses the embedding classifier between &lt;code&gt;device.on&lt;/code&gt; and &lt;code&gt;device.off&lt;/code&gt; with a margin of just 0.003.&lt;/p&gt;
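
&lt;p&gt;One way to catch such near-ties is to compare the top two scores and escalate to the LLM fallback when the margin is too thin. The cutoff below is illustrative; the 0.003 case would clearly fall under it:&lt;/p&gt;

```python
import operator

MIN_MARGIN = 0.05  # illustrative; below this the classifier is "not sure"

def pick_intent(scores):
    """Return the top intent, or None to signal an LLM escalation."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    margin = scores[ranked[0]] - scores[ranked[1]]
    if operator.lt(margin, MIN_MARGIN):  # margin below the cutoff
        return None  # too close to call; hand off to Tier 2
    return ranked[0]

# A "tape"-style mistranslation produces a near-tie like this:
print(pick_intent({"device.on": 0.512, "device.off": 0.509}))  # escalates
print(pick_intent({"device.on": 0.91, "device.off": 0.40}))
```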

&lt;h3&gt;Problem 2: RAM on Raspberry Pi 4&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vosk STT model&lt;/td&gt;
&lt;td&gt;~200MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ONNX embedding&lt;/td&gt;
&lt;td&gt;~90MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Helsinki translator&lt;/td&gt;
&lt;td&gt;~300MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Piper TTS&lt;/td&gt;
&lt;td&gt;~80MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI server&lt;/td&gt;
&lt;td&gt;~150MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React SPA&lt;/td&gt;
&lt;td&gt;~50MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~870MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Ollama with llama3.1:8b adds another ~5GB, so on a Pi 4 it runs only with swap and 3+ second inference. The LLM fallback can be cloud-based (Claude API) — an option for those who want better quality without powerful hardware.&lt;/p&gt;

&lt;h3&gt;Problem 3: display stack&lt;/h3&gt;

&lt;p&gt;Chromium kiosk mode consumes ~300MB of RAM. I'm currently migrating to WPE WebKit via &lt;code&gt;cog&lt;/code&gt;, which has a ~50MB footprint. On Jetson, the stack falls back to Xorg + Chromium because Tegra's DRM driver is incompatible with wlroots.&lt;/p&gt;

&lt;h2&gt;What it can do now&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Voice control: on/off, brightness, color, temperature&lt;/li&gt;
&lt;li&gt;Tuya local protocol (no cloud)&lt;/li&gt;
&lt;li&gt;Home Assistant integration via WebSocket&lt;/li&gt;
&lt;li&gt;Timers, alarms, reminders&lt;/li&gt;
&lt;li&gt;Temperature queries from sensors&lt;/li&gt;
&lt;li&gt;Presence detection&lt;/li&gt;
&lt;li&gt;Radio streaming&lt;/li&gt;
&lt;li&gt;Fully offline mode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;SelenaCore shows that a fully offline smart home voice assistant with Ukrainian language support is achievable on affordable hardware. It's not a commercial product and not a Home Assistant replacement — it's a tool for people who care about privacy and want to understand what's happening inside.&lt;/p&gt;

&lt;p&gt;Open source, MIT license.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/dotradepro/SelenaCore" rel="noopener noreferrer"&gt;github.com/dotradepro/SelenaCore&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://selenehome.tech" rel="noopener noreferrer"&gt;selenehome.tech&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>raspberrypi</category>
      <category>iot</category>
      <category>homeautomation</category>
      <category>unoplatformchallenge</category>
    </item>
  </channel>
</rss>
