<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Emily Lin</title>
    <description>The latest articles on Forem by Emily Lin (@emily_lin_usa).</description>
    <link>https://forem.com/emily_lin_usa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3413260%2F9b59348f-2bc5-48e2-af45-2fb384168e5e.png</url>
      <title>Forem: Emily Lin</title>
      <link>https://forem.com/emily_lin_usa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/emily_lin_usa"/>
    <language>en</language>
    <item>
      <title>Local-First Voice AI: What Actually Works (and What Doesn't) — Week 3</title>
      <dc:creator>Emily Lin</dc:creator>
      <pubDate>Sun, 24 Aug 2025 21:16:41 +0000</pubDate>
      <link>https://forem.com/emily_lin_usa/local-first-voice-ai-what-actually-works-and-what-doesnt-week-3-aoc</link>
      <guid>https://forem.com/emily_lin_usa/local-first-voice-ai-what-actually-works-and-what-doesnt-week-3-aoc</guid>
      <description>&lt;p&gt;This is part of my journey building the Kai ecosystem—a fully local, offline-first voice assistant that keeps your data yours.&lt;br&gt;
I started by building an app for myself. &lt;br&gt;
I collaborated with Claude to build layered time-parsing logic driven entirely by natural language, with the goal of shipping a functional app that does what it's designed to do. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kai Lite: 5-Point Summary&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Privacy-first voice assistant - Complete offline functionality, zero cloud data sharing, all data stays on your device&lt;/li&gt;
&lt;li&gt;Natural voice commands - Add reminders, create memos, check calendar using speech-to-text with pattern-based parsing&lt;/li&gt;
&lt;li&gt;Local-first architecture - Flutter mobile app with SQLite storage, works in airplane mode, no internet required&lt;/li&gt;
&lt;li&gt;User data control - Export/delete everything anytime, transparent permissions, visual indicators when mic is active&lt;/li&gt;
&lt;li&gt;Future ecosystem foundation - Designed to sync with Kai Laptop/Desktop while maintaining privacy and user control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This week, I'm sharing what actually happened when I tried to build a voice agent that works completely offline. Turns out, it's harder than expected, even for builders of on-device AI.&lt;br&gt;
&lt;a href="https://youtube.com/shorts/u6rP2D4xVG8?feature=share" rel="noopener noreferrer"&gt;App Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqsg9wzovxq01cmd1mnu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqsg9wzovxq01cmd1mnu.png" alt="Kai Lite App Demo" width="800" height="1771"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  My AI Collaborator This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; My main implementation partner throughout this build. From initial architecture decisions to debugging regex patterns, Claude helped me think through each technical challenge and iterate quickly on solutions.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Actually Built (The Messy Reality)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: "Let's Build Alexa-Level Voice Commands"&lt;/strong&gt;&lt;br&gt;
The goal was ambitious: voice commands that work as smoothly as Alexa, but completely local.&lt;br&gt;
Started with the standard Flutter voice setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dependencies:
  speech_to_text: ^6.3.0
  flutter_tts: ^3.8.3
  permission_handler: ^11.0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Basic voice service structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class VoiceService {
  final SpeechToText _speech = SpeechToText();
  final FlutterTts _tts = FlutterTts();

  Future&amp;lt;void&amp;gt; initialize() async {
    await _speech.initialize();
    // Kai's calm voice settings
    await _tts.setSpeechRate(0.9);  
    await _tts.setPitch(1.0);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The reality check:&lt;/strong&gt; &lt;br&gt;
Spent a day testing and realized that even with &lt;strong&gt;onDevice: true&lt;/strong&gt;, the accuracy wasn't consistent enough for the "Alexa-level" experience I wanted.&lt;br&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Needed a completely different approach.&lt;/p&gt;
&lt;h2&gt;
  
  
  Attempt 2: Comprehensive Pattern-Based Parser (What Actually Works)
&lt;/h2&gt;

&lt;p&gt;Claude suggested focusing on pattern-based parsing instead of trying to build a mini-Alexa. &lt;br&gt;
Smart advice—I used AI to help design the VoiceCommandParser architecture and generate comprehensive regex patterns for different ways people naturally speak.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class VoiceCommandParser {
  static final Map&amp;lt;String, List&amp;lt;RegExp&amp;gt;&amp;gt; patterns = {
    'calendar_add': [
      RegExp(r'remind me to (.*?) at (.*)'),
      RegExp(r'add (.*?) to calendar at (.*)'),
      RegExp(r'schedule (.*?) for (.*)'),
      RegExp(r'set reminder (.*?) at (.*)'),
      RegExp(r'(.*?) at (.*?) today'),
      RegExp(r'(.*?) at (.*?) tomorrow'),
    ],
    'calendar_check': [
      RegExp(r"what'?s on my calendar\??"),
      RegExp(r"what do i have today\??"),
      RegExp(r"show my schedule"),
      RegExp(r"any events today\??"),
    ],
    'memo_add': [
      RegExp(r'note to self[,:]? (.*)'),
      RegExp(r'remember that (.*)'),
      RegExp(r'make a note[,:]? (.*)'),
      RegExp(r'write down (.*)'),
    ],
  };

  static VoiceCommand parse(String input) {
    input = input.toLowerCase().trim();

    // Check each pattern category
    for (final entry in patterns.entries) {
      final intent = entry.key;
      final patternList = entry.value;

      for (final pattern in patternList) {
        final match = pattern.firstMatch(input);
        if (match != null) {
          return _extractCommand(intent, input, match);
        }
      }
    }

    // Fuzzy matching fallback
    return _fuzzyMatch(input);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
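&lt;p&gt;The matching loop is easy to prototype outside Flutter. Here is a minimal Python sketch of the same pattern-first approach (pattern list abbreviated; an illustration, not the shipped Dart code):&lt;/p&gt;

```python
import re

# Intent patterns, mirroring the Dart map above (abbreviated)
PATTERNS = {
    "calendar_add": [
        re.compile(r"remind me to (.*?) at (.*)"),
        re.compile(r"schedule (.*?) for (.*)"),
    ],
    "memo_add": [
        re.compile(r"note to self[,:]? (.*)"),
        re.compile(r"remember that (.*)"),
    ],
}

def parse(utterance):
    """Return the first intent whose pattern matches, plus its captured slots."""
    text = utterance.lower().strip()
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            match = pattern.match(text)
            if match:
                return {"intent": intent, "slots": match.groups()}
    return {"intent": "unknown", "slots": ()}
```

&lt;p&gt;parse("Remind me to call mom at three") yields intent "calendar_add" with slots ("call mom", "three"); anything unmatched falls through to "unknown" (the fuzzy fallback is omitted here).&lt;/p&gt;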



&lt;p&gt;Added smart time parsing that handles natural language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;static String? _parseTime(String timeStr) {
  // Natural language conversions
  final conversions = {
    'morning': '9:00 AM',
    'afternoon': '2:00 PM', 
    'evening': '6:00 PM',
    'night': '9:00 PM',
    'noon': '12:00 PM',
    'midnight': '12:00 AM',
  };

  // Check natural language first
  for (final entry in conversions.entries) {
    if (timeStr.contains(entry.key)) {
      return entry.value;
    }
  }

  // Parse actual times (3pm, 3:30pm, 15:00)
  final timeMatch = RegExp(r'(\d{1,2})(?::(\d{2}))?\s*(am|pm)?', 
                          caseSensitive: false).firstMatch(timeStr);
  if (timeMatch != null) {
    var hour = int.parse(timeMatch.group(1) ?? '0');
    final minute = timeMatch.group(2) ?? '00';
    var ampm = timeMatch.group(3)?.toUpperCase();

    // Smart guessing for ambiguous times
    if (ampm == null) {
      if (hour &amp;gt;= 7 &amp;amp;&amp;amp; hour &amp;lt;= 11) {
        ampm = 'AM';
      } else if (hour &amp;gt;= 1 &amp;amp;&amp;amp; hour &amp;lt;= 6) {
        ampm = 'PM';
      } else if (hour &amp;gt;= 13 &amp;amp;&amp;amp; hour &amp;lt;= 23) {
        hour = hour - 12;
        ampm = 'PM';
      } else {
        // 12 = noon, 0 = midnight (previously fell through as null)
        ampm = (hour == 12) ? 'PM' : 'AM';
        if (hour == 0) hour = 12;
      }
    }

    return '${hour}:${minute} ${ampm}';
  }

  return null;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
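&lt;p&gt;The AM/PM guessing is the subtle part, so here is the same heuristic as a standalone Python sketch for sanity-checking (digit times only; word numbers like "three" are handled elsewhere in the app):&lt;/p&gt;

```python
import re

NAMED_TIMES = {
    "morning": "9:00 AM", "afternoon": "2:00 PM", "evening": "6:00 PM",
    "night": "9:00 PM", "noon": "12:00 PM", "midnight": "12:00 AM",
}

def parse_time(time_str):
    """Convert a spoken time fragment to 'H:MM AM/PM', guessing AM/PM when omitted."""
    for name, clock in NAMED_TIMES.items():
        if name in time_str:
            return clock
    m = re.search(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)?", time_str, re.IGNORECASE)
    if m is None:
        return None
    hour = int(m.group(1))
    minute = m.group(2) or "00"
    ampm = m.group(3).upper() if m.group(3) else None
    if ampm is None:  # same heuristics as the Dart version above
        if hour in range(7, 12):
            ampm = "AM"
        elif hour in range(1, 7):
            ampm = "PM"
        elif hour in range(13, 24):
            hour -= 12
            ampm = "PM"
        else:  # 12 or 0
            ampm = "PM" if hour == 12 else "AM"
            hour = hour or 12
    return f"{hour}:{minute} {ampm}"
```

&lt;p&gt;So "3:30" becomes "3:30 PM" (hours 1–6 read as afternoon), "10" becomes "10:00 AM", and "15:00" normalizes to "3:00 PM".&lt;/p&gt;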



&lt;p&gt;Multi-turn conversation handler for missing information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class ConversationHandler {
  ConversationContext _context = ConversationContext();

  Future&amp;lt;void&amp;gt; handleCommand(String input) async {
    final command = VoiceCommandParser.parse(input);

    if (command.confidence &amp;lt; 0.7) {
      await _voice.speak("I'm not sure. Did you want to add a calendar event or create a memo?");
      return;
    }

    // Handle missing information
    if (command.intent == 'calendar_add') {
      if (command.title == null) {
        _context.state = ConversationState.waitingForTitle;
        await _voice.speak("What would you like me to remind you about?");
        return;
      }

      if (command.time == null) {
        _context.state = ConversationState.waitingForTime;
        await _voice.speak("What time should I set the reminder for?");
        return;
      }

      await _createCalendarEvent(command);
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
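&lt;p&gt;Under the hood this is a small slot-filling state machine. A hypothetical Python sketch of the idea (prompt strings taken from the handler above; class and state names are illustrative):&lt;/p&gt;

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    WAITING_FOR_TITLE = auto()
    WAITING_FOR_TIME = auto()

class SlotFiller:
    """Minimal multi-turn loop: ask for whichever slot is still missing."""
    def __init__(self):
        self.state = State.IDLE
        self.slots = {"title": None, "time": None}

    def step(self, user_input=None):
        # Fill the slot we were waiting for, if any
        if self.state is State.WAITING_FOR_TITLE:
            self.slots["title"] = user_input
        elif self.state is State.WAITING_FOR_TIME:
            self.slots["time"] = user_input
        # Ask for the next missing slot, or finish
        if self.slots["title"] is None:
            self.state = State.WAITING_FOR_TITLE
            return "What would you like me to remind you about?"
        if self.slots["time"] is None:
            self.state = State.WAITING_FOR_TIME
            return "What time should I set the reminder for?"
        self.state = State.IDLE
        return "Added '{}' for {}".format(self.slots["title"], self.slots["time"])
```

&lt;p&gt;Each user turn advances the machine by exactly one question, so the conversation can never dead-end on a half-specified reminder.&lt;/p&gt;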



&lt;p&gt;&lt;strong&gt;Performance after this approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recognition accuracy: 90% for supported patterns&lt;/li&gt;
&lt;li&gt;Response time: &amp;lt;300ms end-to-end&lt;/li&gt;
&lt;li&gt;Memory usage: 45MB while active&lt;/li&gt;
&lt;li&gt;Battery impact: &amp;lt;2% over full day of testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real example that works:&lt;/strong&gt;&lt;br&gt;
User: "Remind me to call mom tomorrow at three"&lt;br&gt;
↓&lt;br&gt;
STT: "remind me to call mom tomorrow at three"&lt;br&gt;
↓&lt;br&gt;
Pattern match: RegExp(r'remind me to (.*?) at (.*)')&lt;br&gt;
↓&lt;br&gt;
Extract: title="call mom tomorrow", time="three"&lt;br&gt;
↓&lt;br&gt;
Time parsing: "three" → "3:00 PM" (afternoon guess)&lt;br&gt;
↓&lt;br&gt;
Date parsing: "tomorrow" → DateTime.now().add(Duration(days: 1))&lt;br&gt;
↓&lt;br&gt;
Create task in SQLite&lt;br&gt;
↓&lt;br&gt;
TTS: "Perfect! I've added 'call mom' for 3 PM tomorrow"&lt;/p&gt;
&lt;h2&gt;
  
  
  Attempt 3: The Complete Alexa-Level System
&lt;/h2&gt;

&lt;p&gt;Realized I was thinking about this wrong. Instead of trying to match Alexa, I built something simpler that works reliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My actual architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// 1. Local STT with better settings
await _speech.listen(
  onDevice: true,
  listenFor: Duration(seconds: 3), // Shorter timeout
  cancelOnError: true,
  partialResults: false // Wait for complete result
);

// 2. Pattern-based parsing with multiple variations
static VoiceCommand parse(String input) {
  input = input.toLowerCase().trim();

  // Check each pattern category
  for (final entry in patterns.entries) {
    final intent = entry.key;
    final patternList = entry.value;

    for (final pattern in patternList) {
      final match = pattern.firstMatch(input);
      if (match != null) {
        return _extractCommand(intent, input, match);
      }
    }
  }

  return VoiceCommand(intent: 'unknown');
}

// 3. Smart time parsing
static String? _parseTime(String timeStr) {
  final conversions = {
    'morning': '9:00 AM',
    'afternoon': '2:00 PM',
    'evening': '6:00 PM',
    'noon': '12:00 PM',
  };

  // Handle natural language first
  for (final entry in conversions.entries) {
    if (timeStr.contains(entry.key)) {
      return entry.value;
    }
  }

  // Then handle actual times like "3pm" or "3:30"
  final timeMatch = RegExp(r'(\d{1,2})(?::(\d{2}))?\s*(am|pm)?')
      .firstMatch(timeStr);
  // ... parsing logic
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real example of what works:&lt;/strong&gt;&lt;br&gt;
User says: "Remind me to call mom at three"&lt;br&gt;
↓&lt;br&gt;
Local STT: "remind me to call mom at three"&lt;br&gt;
↓&lt;br&gt;
Pattern match: RegExp(r'remind me to (.*?) at (.*)')&lt;br&gt;
↓&lt;br&gt;
Extract: title="call mom", time="three"&lt;br&gt;
↓&lt;br&gt;
Parse time: "three" → "3:00 PM" (smart guess for afternoon)&lt;br&gt;
↓&lt;br&gt;
Create task in SQLite&lt;br&gt;
↓&lt;br&gt;
Response: "Added 'call mom' for 3:00 PM today"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance after optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recognition time: 200-400ms&lt;/li&gt;
&lt;li&gt;Memory usage: 40MB while active&lt;/li&gt;
&lt;li&gt;Accuracy: 85% for supported commands&lt;/li&gt;
&lt;li&gt;Battery impact: &amp;lt;2% over full day&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Privacy Architecture I Actually Built
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Problem: How do you prove to users that nothing leaves their phone?&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;My solution - complete transparency:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Visual indicators everywhere:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Kai bubble pulses when listening
AnimatedContainer(
  duration: Duration(milliseconds: 300),
  decoration: BoxDecoration(
    color: _isListening 
      ? Color(0xFF9C7BD9).withOpacity(0.8)  // Active purple
      : Color(0xFF9C7BD9).withOpacity(0.2), // Calm purple
    shape: BoxShape.circle,
  ),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Data export built in from day 1:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class DataExportService {
  Future&amp;lt;String&amp;gt; exportAllUserData() async {
    final tasks = await CalendarService().getAllTasks();
    final memos = await MemoService().getAllMemos();

    return jsonEncode({
      'export_date': DateTime.now().toIso8601String(),
      'tasks': tasks.map((t) =&amp;gt; t.toMap()).toList(),
      'memos': memos.map((m) =&amp;gt; m.toMap()).toList(),
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. One-tap delete everything:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Future&amp;lt;void&amp;gt; deleteAllUserData() async {
  await CalendarService().clearAllTasks();
  await MemoService().clearAllMemos();
  await SharedPreferences.getInstance().then((prefs) =&amp;gt; prefs.clear());
  // Show confirmation: "All data deleted"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What surprised me: in testing, users (myself included) cared more about seeing the "Export my data" and "Delete everything" buttons than about perfect voice accuracy. Just knowing the control was there felt reassuring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database Design That Actually Works Offline
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Used SQLite with sync-ready fields from the start:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Task {
  final String id;
  final String title;
  final DateTime? date;
  final String? time;
  final bool isCompleted;

  // Sync-ready fields for future
  final DateTime lastModified;
  final String sourceDevice;
  final String status; // 'active' | 'deleted'

  Task({
    required this.id,
    required this.title,
    this.date,
    this.time,
    this.isCompleted = false,
    required this.lastModified,
    this.sourceDevice = 'kai-lite-android',
    this.status = 'active',
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
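&lt;p&gt;The payoff of lastModified plus soft deletes is that future cross-device sync reduces to a last-write-wins merge. A Python sketch of that merge (hypothetical; nothing like this ships yet):&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Task:
    id: str
    title: str
    last_modified: datetime
    source_device: str = "kai-lite-android"
    status: str = "active"  # 'active' or 'deleted' (soft delete)

def merge(local, remote):
    """Last-write-wins merge keyed by task id; soft deletes propagate like edits."""
    merged = dict(local)
    for task_id, task in remote.items():
        mine = merged.get(task_id)
        if mine is None or task.last_modified > mine.last_modified:
            merged[task_id] = task
    return merged
```

&lt;p&gt;Because a delete is just a newer row with status 'deleted', the merge needs no separate tombstone table and every conflict resolves deterministically.&lt;/p&gt;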



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything works offline immediately&lt;/li&gt;
&lt;li&gt;Sync fields ready for when I build cross-device features&lt;/li&gt;
&lt;li&gt;Soft deletes mean data recovery is possible&lt;/li&gt;
&lt;li&gt;Device tracking for multi-device scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Debugging (The Fun Stuff)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Issue 1: Memory leaks during voice processing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Problem: Not disposing speech service
@override
void dispose() {
  _speech.stop();  // Added this
  _speech.cancel(); // And this
  super.dispose();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Issue 2: Battery drain from overlay&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Problem: Overlay always active
// Solution: Smart hiding
void _hideOverlayDuringCalls() {
  if (_phoneStateService.isInCall()) {
    _overlay.hide();
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Issue 3: SQLite performance with 1000+ tasks&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Added indexing for date queries
await db.execute('''
  CREATE INDEX IF NOT EXISTS idx_task_date_status 
  ON tasks(date, status)
''');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
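&lt;p&gt;You can confirm the index is actually used with EXPLAIN QUERY PLAN. A quick check via Python's built-in sqlite3 (the SQL is identical to what sqflite executes):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT, title TEXT, date TEXT, status TEXT)")
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_task_date_status ON tasks(date, status)"
)

# With equality constraints on both indexed columns, the plan detail
# should mention a SEARCH using idx_task_date_status
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM tasks WHERE date = ? AND status = 'active'",
    ("2025-08-24",),
).fetchall()
print(plan[0][-1])
```

&lt;p&gt;If the plan reports a full SCAN instead, the index definition and the query's WHERE clause don't line up.&lt;/p&gt;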



&lt;h2&gt;
  
  
  What I Learned (Technical &amp;amp; Otherwise)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Technical insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQLite performs way better than expected on mobile&lt;/li&gt;
&lt;li&gt;Local speech processing is viable if you optimize for specific use cases&lt;/li&gt;
&lt;li&gt;Pattern matching beats AI models for simple command parsing&lt;/li&gt;
&lt;li&gt;Flutter overlays are battery killers if not managed properly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;UX insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Privacy needs to feel empowering, not defensive&lt;/li&gt;
&lt;li&gt;Visual feedback builds more trust than explanations&lt;/li&gt;
&lt;li&gt;Reliable simple commands can feel smoother overall than unreliable complex ones&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;&lt;strong&gt;Architecture insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build offline-first from day 1, add sync later&lt;/li&gt;
&lt;li&gt;Start with the simplest solution that could work&lt;/li&gt;
&lt;li&gt;Real user testing catches issues you never thought of&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Current State
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What actually ships:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;15+ voice command patterns that work reliably&lt;/li&gt;
&lt;li&gt;Complete offline functionality (no internet required)&lt;/li&gt;
&lt;li&gt;Export/delete controls for full data ownership&lt;/li&gt;
&lt;li&gt;&amp;lt;300ms voice response time&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>flutter</category>
      <category>privacy</category>
      <category>buildwithai</category>
      <category>mobile</category>
    </item>
    <item>
      <title>Why I Switched from LLMs to Tiny, Instant Voice NLU for Kai Lite — Week 2</title>
      <dc:creator>Emily Lin</dc:creator>
      <pubDate>Mon, 18 Aug 2025 04:29:33 +0000</pubDate>
      <link>https://forem.com/emily_lin_usa/why-i-switched-from-llms-to-tiny-instant-voice-nlu-for-kai-lite-week-2-3bgg</link>
      <guid>https://forem.com/emily_lin_usa/why-i-switched-from-llms-to-tiny-instant-voice-nlu-for-kai-lite-week-2-3bgg</guid>
      <description>&lt;p&gt;This is part of my journey building the Kai ecosystem—a fully local, offline-first, emotionally-intelligent AI assistant.&lt;br&gt;
This week, I’m sharing what actually happened as I tried (and failed, and retried) to build voice command understanding for Kai Lite, my mobile-first companion app.&lt;/p&gt;
&lt;h2&gt;
  
  
  🤖 My Two AI Collaborators
&lt;/h2&gt;

&lt;p&gt;ChatGPT: Idea generator and architecture partner. I used it for feature planning, prompt design, and exploring approaches.&lt;/p&gt;

&lt;p&gt;Claude: My “implementation sidekick.” Every time I got stuck on code, Claude helped debug, re-architect, and refactor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key moment:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude told me, “LLMs are nice, but too slow for instant mobile use. You’ll wait 2–3 seconds per command.”&lt;/p&gt;

&lt;p&gt;That advice changed my approach—speed (and flow) beat size.&lt;/p&gt;
&lt;h2&gt;
  
  
  🧑‍💻 Attempt #1: Pattern Rules (Fast, but… Messy)
&lt;/h2&gt;

&lt;p&gt;I started with classic rule-based parsing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regexes for matching intent (add event, check calendar, etc.)&lt;/li&gt;
&lt;li&gt;Lots of if/else spaghetti in my voice_command_parser.dart&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;It only worked for exact commands&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I kept adding more and more patterns, forgetting what I’d written before (lol)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session log snippet with Claude:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"remind me to go fishing tomorrow at 3pm"
→ Matched: go to (.*)  ← WRONG!
→ Intent: navigation 
→ Result: Error/confusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🚀 Attempt #2: Tiny, On-Device NLU (What Actually Works!)
&lt;/h2&gt;

&lt;p&gt;With Claude’s push, I rebuilt the whole flow:&lt;br&gt;
&lt;strong&gt;Architecture Overview&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User Speaks]
     ↓
[Whisper-tiny → Text] 
     ↓
[Intent Classifier: calendar_add, calendar_view, etc.] 
     ↓
[Entity Extractor: date, time, title] 
     ↓
[SmartVoiceParser → Structured Command] 
     ↓
[Local Calendar API → Event Created]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
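&lt;p&gt;To make the data flow concrete, here is a Python sketch of the staged pipeline with stubbed components (keyword scoring and a regex stand in for the actual tiny intent and entity models; all function names here are illustrative):&lt;/p&gt;

```python
import re

def classify_intent(text):
    """Stub for the on-device intent classifier (keyword scoring, not BERT-tiny)."""
    if "remind" in text or "schedule" in text:
        return {"intent": "calendar_add", "confidence": 0.87}
    if "calendar" in text:
        return {"intent": "calendar_view", "confidence": 0.85}
    return {"intent": "unknown", "confidence": 0.0}

def extract_entities(text):
    """Stub for the entity extractor: pull title, date, and time from the text."""
    entities = {}
    m = re.search(r"remind me to (.+?)(?: (today|tomorrow))?(?: at (.+))?$", text)
    if m:
        entities["title"] = m.group(1)
        if m.group(2):
            entities["date"] = m.group(2)
        if m.group(3):
            entities["time"] = m.group(3)
    return entities

def smart_voice_parse(text):
    """SmartVoiceParser stage: combine intent and slots into one structured command."""
    return {"intent": classify_intent(text)["intent"], "slots": extract_entities(text)}
```

&lt;p&gt;Feeding in the Week 2 example, smart_voice_parse("remind me to go fishing tomorrow at 3pm") produces intent "calendar_add" with title "go fishing", date "tomorrow", and time "3pm".&lt;/p&gt;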



&lt;p&gt;This architecture ensures:&lt;/p&gt;

&lt;p&gt;✅ Speed: No network latency, instant feedback.&lt;br&gt;
✅ Privacy: No audio or text leaves the device.&lt;br&gt;
✅ Reliability: Not dependent on internet or third-party APIs.&lt;br&gt;
✅ Simplicity: Small models focused on specific tasks.&lt;/p&gt;
&lt;h2&gt;
  
  
  How It Works (Step-by-Step)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Voice input via Whisper-tiny:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final text = await WhisperTinyEN.transcribe(audio);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ol start="2"&gt;
&lt;li&gt;Intent classification:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final intent = await IntentClassifier.classify(text);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;&lt;em&gt;Example output:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{ "intent": "calendar_add", "confidence": 0.87 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Entity extraction:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final entities = await EntityExtractor.extract(text);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Example output:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{ "title": "go fishing", "date": "tomorrow", "time": "3:00 PM" }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Smart voice command assembly:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final command = SmartVoiceParser.parse(text);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;&lt;strong&gt;Returns one object, e.g.:&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "intent": "calendar_add",
  "slots": {
    "title": "go fishing",
    "date": "tomorrow",
    "time": "3:00 PM"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Calendar event created, instantly and offline.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Before vs After: Real Example
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pattern system failure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"remind me to go fishing tomorrow at 3pm"
→ navigation intent (wrong)
→ confusion or error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Smart NLU success:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"remind me to go fishing tomorrow at 3pm"
→ calendar_add (confidence: 0.8+)
→ Title: "go fishing"
→ Time: "3:00 PM"
→ Date: "tomorrow"
→ Event created 🎉
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Processing time: ~250–300 ms (on-device, fully offline)&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Technical Highlights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Whisper-tiny for fast, offline voice-to-text (39 MB)&lt;/li&gt;
&lt;li&gt;BERT-tiny + intent head (~21 MB) for intent classification&lt;/li&gt;
&lt;li&gt;Dateparser-light (~1 MB) for fuzzy dates (“next Friday”, “this weekend”)&lt;/li&gt;
&lt;li&gt;All runs in &amp;lt;60 MB and feels instant on a modern phone&lt;/li&gt;
&lt;li&gt;Fully local: no cloud, zero data leaves device&lt;/li&gt;
&lt;/ul&gt;
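&lt;p&gt;Fuzzy dates are mostly weekday arithmetic. A simplified Python sketch of what a dateparser-light style resolver does for phrases like "next friday" (an illustration of the idea, not the library's actual code):&lt;/p&gt;

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve(phrase, today):
    """Resolve a few fuzzy date phrases relative to `today` (a datetime.date)."""
    phrase = phrase.lower().strip()
    if phrase == "today":
        return today
    if phrase == "tomorrow":
        return today + timedelta(days=1)
    if phrase.startswith("next "):
        name = phrase[5:]
        if name in WEEKDAYS:
            # Days until the next occurrence of that weekday (always 1..7 ahead)
            delta = (WEEKDAYS.index(name) - today.weekday() - 1) % 7 + 1
            return today + timedelta(days=delta)
    return None
```

&lt;p&gt;From Monday 2025-08-18, "next friday" resolves to 2025-08-22; phrases the resolver does not know return None, so the app can ask a follow-up question instead of guessing.&lt;/p&gt;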

&lt;h2&gt;
  
  
  🪲 Bugs &amp;amp; Iterations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Early “smart” versions failed almost as much as my old rules&lt;/li&gt;
&lt;li&gt;Six rounds of real-world testing and log reviews to get to “it just works”&lt;/li&gt;
&lt;li&gt;“What’s my calendar like?” sometimes still triggers as an event… and honestly, I kind of love the bug now&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔑 Lessons
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Don’t over-engineer: Tiny, purpose-built NLU is better than a “mini-LLM” for command/slot tasks&lt;/li&gt;
&lt;li&gt;Speed is UX: Even 2 seconds of lag kills the magic&lt;/li&gt;
&lt;li&gt;Privacy: Everything is processed right on-device—no API, no server, no cloud&lt;/li&gt;
&lt;li&gt;ChatGPT and Claude are amazing for rapid iteration and brainstorming—even for solo devs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💬 Wrap-up
&lt;/h2&gt;

&lt;p&gt;Building Kai Lite this way taught me that “small” can be smarter, and gentle, local AI is possible for real daily use.&lt;/p&gt;

</description>
      <category>voiceassistant</category>
      <category>ondeviceai</category>
      <category>ai</category>
      <category>flutter</category>
    </item>
    <item>
      <title>I’m Building a Local, Multi-Layer AI Assistant — Starting with Kai Lite (Week 1)</title>
      <dc:creator>Emily Lin</dc:creator>
      <pubDate>Fri, 15 Aug 2025 19:57:23 +0000</pubDate>
      <link>https://forem.com/emily_lin_usa/im-building-a-local-multi-layer-ai-assistant-starting-with-kai-lite-week-1-l5c</link>
      <guid>https://forem.com/emily_lin_usa/im-building-a-local-multi-layer-ai-assistant-starting-with-kai-lite-week-1-l5c</guid>
      <description>&lt;p&gt;I’ve been using AI daily for creative and strategic work for years. But after a recent update changed the behavior of my primary assistant — despite the same prompts — I realized something: personal nuance doesn’t survive at scale.&lt;/p&gt;

&lt;p&gt;Large AI systems must follow global rules, minimize risk, and standardize outputs. That’s fair. But it also means they can’t preserve the subtle, evolving rhythm of a one-on-one collaboration.&lt;/p&gt;

&lt;p&gt;So I’m stepping back.&lt;br&gt;
Not to reject cloud AI.&lt;br&gt;
But to design a personal AI ecosystem — local-first, private by default, and built around my actual workflow.&lt;/p&gt;

&lt;p&gt;I’m not launching a product.&lt;br&gt;
I’m not chasing autonomy.&lt;br&gt;
I’m just building a system that works for me — starting simple.&lt;/p&gt;

&lt;p&gt;This week, I began Week 1 of building Kai Lite — the mobile layer of a three-part architecture I’ve been planning.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Vision: A Three-Layer Personal AI System
&lt;/h2&gt;

&lt;p&gt;My goal is continuity, not complexity.&lt;/p&gt;

&lt;p&gt;I want a single coherent experience across devices — each layer handling only what it needs to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| LAYER             | PURPOSE                                     | LLM                    | STATUS           |
|-------------------|---------------------------------------------|------------------------|------------------|
| Kai Lite (mobile) | Capture, voice memos, quick tasks           | None                   | ✅ Starting now  |
| Kai Laptop        | Planning, memory, light automation          | Llama3 7B / Mistral 7B | 🛠️ Design phase |
| Kai Desktop       | Deep work, reflection, business automation  | Qwen3-32B, GPT-OSS 20B | 🛠️ Design phase |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn’t about running giant models on every device.&lt;br&gt;
It’s about scaling intelligence where it belongs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kai Lite: The Mobile Capture Layer (Week 1)
&lt;/h2&gt;

&lt;p&gt;Right now, I’m focused on Kai Lite — a Flutter app for Android/iOS that acts as a lightweight entry point to my future AI ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why start here?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most ideas begin on mobile&lt;/li&gt;
&lt;li&gt;Voice, quick notes, and reminders are 80% of daily capture&lt;/li&gt;
&lt;li&gt;A simple interface helps clarify what matters before adding complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flutter (Dart) → Cross-platform, fast UI&lt;/li&gt;
&lt;li&gt;SQLite → Local storage for memos, tasks, calendar&lt;/li&gt;
&lt;li&gt;No LLM on device → Keeps it fast, private, and focused&lt;/li&gt;
&lt;li&gt;Voice-to-text → Using platform APIs (no local model)&lt;/li&gt;
&lt;li&gt;Floating overlay → Quick capture without opening the app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Folder Structure (Simplified)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kai_lite_app/
├── lib/
│   ├── screens/           # Home, calendar, memos
│   ├── overlay/           # Floating bubble &amp;amp; voice reflex
│   └── services/          # Voice, memo, calendar, remote API
├── models/                # Task, Memo data classes
├── assets/                # Persona, icon
└── pubspec.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Files I’m Setting Up This Week:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;overlay_bubble.dart → Messenger-style floating button&lt;/li&gt;
&lt;li&gt;overlay_voice_reflex.dart → Hands-free voice capture&lt;/li&gt;
&lt;li&gt;voice_service.dart → STT/TTS (no LLM)&lt;/li&gt;
&lt;li&gt;remote_kai_service.dart → Future HTTP connection to laptop/desktop agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, it’s just structure.&lt;br&gt;
No logic.&lt;br&gt;
No sync.&lt;br&gt;
Just setup.&lt;/p&gt;

&lt;p&gt;But the vision is clear:&lt;br&gt;
Capture inputs on mobile → process them locally or on a trusted machine → get back meaningful responses, not just summaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Behind the Scenes (Design Phase)
&lt;/h2&gt;

&lt;p&gt;While Kai Lite is the starting point, it’s part of a larger local AI architecture I’m designing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On the Laptop:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python-based agent using LangGraph for stateful workflows&lt;/li&gt;
&lt;li&gt;ChromaDB for semantic memory (local vector DB)&lt;/li&gt;
&lt;li&gt;Lightweight LLMs (Llama3 7B) for planning and reflection&lt;/li&gt;
&lt;li&gt;Simple Kanban UI (simple_kanban.py) for task + memory management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On the Desktop (Future):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Qwen3-32B-Q5_K_M → Primary model for deep writing, planning, self-review&lt;/li&gt;
&lt;li&gt;GPT-OSS 20B → Fallback/critic model&lt;/li&gt;
&lt;li&gt;DeepSeek-Coder 33B → Coding (loaded only when needed)&lt;/li&gt;
&lt;li&gt;Ollama → Local model management + GPU offload&lt;/li&gt;
&lt;li&gt;Self-evaluation flows → One model critiques another&lt;/li&gt;
&lt;li&gt;Biometric context → Heart rate data (Polar Verity) used to adjust tone and pacing (locally only)&lt;/li&gt;
&lt;li&gt;All data stays on-device.&lt;/li&gt;
&lt;li&gt;No cloud logging.&lt;/li&gt;
&lt;li&gt;No third-party APIs for core functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Local? Why This Design?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| REASON                | MY CHOICE                                                                 |
|-----------------------|---------------------------------------------------------------------------|
| Privacy               | Sensitive data (voice, memos, HR) never leaves my devices                 |
| Stability             | No sudden changes from upstream model updates                             |
| Custom logic          | I can build recursive review, memory cleanup, publishing automation       |
| Regulatory reality    | Cloud AI must follow global rules — local AI can follow my rules          |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’m not trying to replace GPT-4.&lt;br&gt;
I’m trying to build something the cloud can’t:&lt;br&gt;
An AI that knows my rhythms, respects my energy, and evolves with me — without asking permission.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools &amp;amp; Workflow (Planned)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flutter → Mobile app&lt;/li&gt;
&lt;li&gt;Python + LangGraph → Agent logic&lt;/li&gt;
&lt;li&gt;ChromaDB → Semantic search over personal logs&lt;/li&gt;
&lt;li&gt;Ollama → Run local LLMs with GPU offload (RTX)&lt;/li&gt;
&lt;li&gt;VS Code + Continue → Local coding support&lt;/li&gt;
&lt;li&gt;Polar Verity → Heart rate data for context 
(optional future layer)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This Is Just the Beginning&lt;br&gt;
I’m not live.&lt;br&gt;
I’m not demoing.&lt;br&gt;
I’m in Week 1 of building Kai Lite — setting up the foundation.&lt;/p&gt;

&lt;p&gt;No magic.&lt;br&gt;
No autonomy.&lt;br&gt;
Just files, folders, and a clear direction.&lt;/p&gt;

&lt;p&gt;If you're designing a personal AI system — not for scale, but for depth, control, and continuity — I’d love to hear your approach.&lt;/p&gt;

&lt;p&gt;Because the future of AI shouldn’t be only in the cloud.&lt;br&gt;
It should also be on your machine, in your hands, built for you.&lt;/p&gt;

&lt;p&gt;More updates to come.&lt;br&gt;
This is just the start.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>beginners</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
