<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ondrej Machala</title>
    <description>The latest articles on Forem by Ondrej Machala (@omachala).</description>
    <link>https://forem.com/omachala</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3724472%2F05d15299-914a-440a-ac2c-efa12c49e5da.jpeg</url>
      <title>Forem: Ondrej Machala</title>
      <link>https://forem.com/omachala</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/omachala"/>
    <language>en</language>
    <item>
      <title>What's New in Diction 5.0</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:18:46 +0000</pubDate>
      <link>https://forem.com/omachala/whats-new-in-diction-50-1k0l</link>
      <guid>https://forem.com/omachala/whats-new-in-diction-50-1k0l</guid>
      <description>&lt;p&gt;Diction 5.0 is out. Three things changed in a meaningful way: the cloud is rebuilt from scratch, AI Companion can now edit text by voice, and the app is fully localized in 13 languages. Everything else is polish on top of that foundation.&lt;/p&gt;

&lt;h2&gt;Diction One is a different thing now&lt;/h2&gt;

&lt;p&gt;I spent several weeks working with physical hardware for this. Dedicated speech models for each language family, better audio processing, faster response across the board. If you tried cloud mode in an earlier version and found it slow or inaccurate, I'd ask you to try it again.&lt;/p&gt;

&lt;p&gt;The mic is always warm now. Tap and start talking immediately. No noticeable gap between the button and your voice.&lt;/p&gt;

&lt;h2&gt;Your voice can now edit&lt;/h2&gt;

&lt;p&gt;AI Companion in v4 could clean up what you said. In v5, it can edit what's already written.&lt;/p&gt;

&lt;p&gt;You're writing an email. You wrote "let's meet Tuesday" but the meeting moved. Place your cursor anywhere in that sentence and say "change Tuesday to Thursday." No selecting, no deleting, no retyping.&lt;/p&gt;

&lt;p&gt;Or select a paragraph that's too stiff and say "make this more casual." Diction rewrites the selection. The action bar turns indigo when text is selected so you always know what mode you're in.&lt;/p&gt;

&lt;p&gt;Long-press the action bar to rewrite text around your cursor without selecting anything first. Say what's wrong, Diction fixes it.&lt;/p&gt;

&lt;h2&gt;13 languages, properly localized&lt;/h2&gt;

&lt;p&gt;Every screen now adapts to your system language: Settings, History, Insights, everything. Switch the UI language live from the picker without restarting the keyboard.&lt;/p&gt;

&lt;p&gt;Before this release, everyone got English regardless of what their iPhone was set to.&lt;/p&gt;

&lt;h2&gt;Other changes worth knowing&lt;/h2&gt;

&lt;p&gt;Insights is redesigned. Your typing speed multiplier is the main number, with daily average, words per minute, days used, and time saved all visible at once.&lt;/p&gt;

&lt;p&gt;New mic release options. "After dictation" drops the mic the moment the transcription finishes, and 10-second and 30-second delays cover music and podcast listeners.&lt;/p&gt;

&lt;p&gt;AirPods work correctly now. Music stays in stereo while Diction uses the built-in mic. Nothing ducks.&lt;/p&gt;

&lt;p&gt;AI Companion is smarter about keeping your voice. It preserves natural speech patterns, writes numbers as digits, and doesn't drop sentences on longer recordings.&lt;/p&gt;

&lt;p&gt;For self-hosters: support for bringing your own language model for AI Companion, a one-command setup covering 25 European languages, and smart routing that picks the right speech model per language with a health-checked fallback. Open source at github.com/omachala/diction.&lt;/p&gt;

&lt;p&gt;Diction 5.0 is on the App Store now: &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;https://apps.apple.com/app/id6759807364&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>How Diction handles privacy</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 02 Apr 2026 09:00:00 +0000</pubDate>
      <link>https://forem.com/omachala/how-diction-handles-privacy-52mf</link>
      <guid>https://forem.com/omachala/how-diction-handles-privacy-52mf</guid>
      <description>&lt;p&gt;Voice keyboards sit in an uncomfortable position. Every app you use, every message you send, every search you type — the keyboard is there. It sees all of it.&lt;/p&gt;

&lt;p&gt;Most people install a keyboard and never think about this. I did, because I was building one.&lt;/p&gt;




&lt;p&gt;Earlier this year, someone reverse-engineered a popular voice keyboard and posted their findings. The app was collecting full browser URLs, names of focused apps, on-screen text scraped via the Accessibility API, clipboard contents including data copied from password managers, and sending it all back to a server. There was a function in the binary called &lt;code&gt;sendTrackResultToServer&lt;/code&gt;. None of this was in the privacy policy.&lt;/p&gt;

&lt;p&gt;This is not a hypothetical. It happened. And the only reason anyone found out is that the app was installed on a machine where someone was curious enough to look.&lt;/p&gt;

&lt;p&gt;That is the problem with closed-source software and privileged access: you cannot verify the claims. A privacy policy is a document. The code is what runs.&lt;/p&gt;




&lt;h2&gt;Full Access and what it actually enables&lt;/h2&gt;

&lt;p&gt;When iOS asks if you want to allow Full Access for a keyboard, the permission is broader than most people realise. It enables network access (how keyboards send audio for transcription or sync dictionaries). But in the wrong hands it also means the keyboard code runs in a context where it could read clipboard data, monitor app usage patterns, or transmit information alongside its legitimate function.&lt;/p&gt;

&lt;p&gt;Diction has no QWERTY keys. There is nothing to type into it, so nothing to log in that sense. But I wanted to go further than just "we don't do the bad thing." I wanted to build it so you can verify we don't.&lt;/p&gt;




&lt;h2&gt;How I built Diction with this in mind&lt;/h2&gt;

&lt;p&gt;There are three ways Diction can process your audio, and I picked each one with this threat model in mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On-device&lt;/strong&gt; is the cleanest answer. Your audio never leaves your iPhone. A local speech model handles transcription, the result comes back, and that is it. No server, no transmission, no policy to read. If you want absolute certainty, this is the mode for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted&lt;/strong&gt; is for people who want cloud-quality transcription but on infrastructure they control. You point the app at your own server. Your audio goes there and nowhere else. I have no access to what you say or what gets transcribed. The server software is open source. You can read exactly what it does before you run it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diction One&lt;/strong&gt; is the hosted cloud option. Here I had to think carefully. Audio is processed in memory and discarded immediately after transcription. Nothing is written to disk. No transcriptions are stored or logged. And every transcription is encrypted with AES-256-GCM, with a fresh X25519 key exchange per request: the same primitives WireGuard uses. I am not asking you to trust the policy. The implementation is in the open-source server code.&lt;/p&gt;




&lt;h2&gt;The app itself&lt;/h2&gt;

&lt;p&gt;The Diction app contains no analytics and no tracking code. No device identifiers, no usage events, no behavioural monitoring. The App Store privacy label reads "Data Not Collected." I can say that confidently because I wrote every line and there is nothing there.&lt;/p&gt;




&lt;h2&gt;What you can actually verify&lt;/h2&gt;

&lt;p&gt;The server code is public at &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;. You can read the transcription handler and confirm that audio is not written anywhere. You can read the encryption implementation. If you run on-device mode, you can point a network inspector at the app and confirm no requests leave it.&lt;/p&gt;

&lt;p&gt;I built it this way because I wanted to use this keyboard myself. And I was not willing to just trust a policy page written by someone else.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Download on the App Store&lt;/a&gt; · &lt;a href="https://diction.one/privacy-first" rel="noopener noreferrer"&gt;diction.one/privacy-first&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>privacy</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What's New in Diction 4.0</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Wed, 01 Apr 2026 15:04:42 +0000</pubDate>
      <link>https://forem.com/omachala/whats-new-in-diction-40-2609</link>
      <guid>https://forem.com/omachala/whats-new-in-diction-40-2609</guid>
      <description>&lt;p&gt;Diction 4.0 is the biggest update since launch. The theme is straightforward: do more with your voice, with fewer rough edges. This release took hundreds of commits and more testing rounds than I want to count. Here is what landed.&lt;/p&gt;

&lt;h2&gt;Speak to Edit&lt;/h2&gt;

&lt;p&gt;This is the headline feature. Select any text in any app, tap the mic, and say what you want changed.&lt;/p&gt;

&lt;p&gt;You are editing an email. You select "Wednesday works for me" and say "Thursday actually." Diction replaces the selection.&lt;/p&gt;

&lt;p&gt;It also handles instructions. Select a paragraph and say "translate to Czech." Or "make this shorter." Or "more formal." Diction figures out whether you are giving a literal replacement or an editing instruction and acts accordingly.&lt;/p&gt;

&lt;p&gt;Before this, voice keyboards were append-only. You could dictate new text, but editing meant switching to the regular keyboard. Now you stay in voice the whole time.&lt;/p&gt;

&lt;h2&gt;Custom Words Improve Transcription Directly&lt;/h2&gt;

&lt;p&gt;In 3.0, custom words (My Words) only helped during AI Enhancement cleanup. Now they feed directly into the speech model as vocabulary hints.&lt;/p&gt;

&lt;p&gt;Your coworker's name is Kaelith. Your product is called Nexaro. You added both to My Words. Now the raw transcription gets them right on the first pass, even with AI Enhancement turned off.&lt;/p&gt;

&lt;p&gt;This matters most for anyone dictating technical terms, brand names, or anything the base model has never seen.&lt;/p&gt;

&lt;h2&gt;Long Recordings That Actually Finish&lt;/h2&gt;

&lt;p&gt;Previous versions could cut off or lose the end of longer dictations. 4.0 rewrites the recording pipeline to handle long sessions without dropping audio.&lt;/p&gt;

&lt;p&gt;If you are dictating meeting notes, a long email, or journal entries, the transcript comes back complete.&lt;/p&gt;

&lt;h2&gt;Profile&lt;/h2&gt;

&lt;p&gt;Tell Diction who you are and how you write. "I'm a software engineer. I write in short, direct sentences. I use American English."&lt;/p&gt;

&lt;p&gt;AI Enhancement uses your profile to match your style. Instead of generic cleanup, it produces text that sounds like you actually wrote it. The profile persists across all your dictations, so you set it once.&lt;/p&gt;

&lt;h2&gt;Guided Onboarding&lt;/h2&gt;

&lt;p&gt;First launch used to throw permission dialogs at you and hope you figured it out. Now there is a step-by-step walkthrough: keyboard installation, permissions, first dictation. You know exactly where you are and what to do next.&lt;/p&gt;

&lt;h2&gt;Better On-Device Setup&lt;/h2&gt;

&lt;p&gt;Downloading speech models should not be confusing. The download flow is smoother now, preparation is faster, and the model is ready to use as soon as it finishes. No extra steps.&lt;/p&gt;

&lt;h2&gt;No More Phantom Orange Dot&lt;/h2&gt;

&lt;p&gt;Opening the Diction app used to activate the microphone, which lit up the iOS orange dot even though you were not dictating. Fixed. The mic only activates when you actually start a dictation.&lt;/p&gt;

&lt;h2&gt;Under the Hood&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI Enhancement accuracy improved across apps&lt;/li&gt;
&lt;li&gt;UI polish across the keyboard, history, tones, and settings&lt;/li&gt;
&lt;li&gt;Stability improvements throughout. 4.0 is a significantly more stable release than 3.0.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Diction is a voice keyboard for iPhone. Tap the mic, speak, text appears wherever your cursor is. On-device, cloud, or self-hosted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;App Store&lt;/a&gt; / &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; / &lt;a href="https://diction.one" rel="noopener noreferrer"&gt;Website&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>I Use NanoClaw in Telegram All Day. I Stopped Typing to It.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Sat, 28 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://forem.com/omachala/i-run-an-ai-agent-in-telegram-all-day-i-stopped-typing-to-it-3g7o</link>
      <guid>https://forem.com/omachala/i-run-an-ai-agent-in-telegram-all-day-i-stopped-typing-to-it-3g7o</guid>
      <description>&lt;p&gt;I have NanoClaw connected to my Telegram. Throughout the day I send it things. Translate this. Summarise that article. What time is it in Tokyo. Draft a reply to this message. It responds in the same thread, without me leaving the app.&lt;/p&gt;

&lt;p&gt;It runs on my own machine, inside a container. The agent only has access to what you explicitly give it. Setup took about fifteen minutes: clone the repo, run Claude Code, type &lt;code&gt;/setup&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I didn't anticipate: I was still typing everything. Long questions. Multi-sentence requests. Context I had to spell out carefully. The assistant was right there in Telegram, but typing remained the bottleneck.&lt;/p&gt;

&lt;h2&gt;Where Diction comes in&lt;/h2&gt;

&lt;p&gt;I built Diction to fix exactly this. It's an iOS keyboard extension. In Telegram, you switch to it, tap the mic, speak, and the text appears in the compose field. Send it like any message.&lt;/p&gt;

&lt;p&gt;I dictate to NanoClaw now. "Can you draft a short reply to this email, keep it friendly but firm." Things that would take a minute to type take ten seconds to say. NanoClaw gets the same message either way.&lt;/p&gt;

&lt;p&gt;Diction has an on-device mode that runs locally on your iPhone. Nothing leaves the device. For a setup where the whole point is keeping your data on your own hardware, that felt like the right match.&lt;/p&gt;

&lt;h2&gt;The setup, end to end&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;NanoClaw:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone &lt;a href="https://github.com/qwibitai/nanoclaw" rel="noopener noreferrer"&gt;github.com/qwibitai/nanoclaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run Claude Code in the repo directory&lt;/li&gt;
&lt;li&gt;Type &lt;code&gt;/setup&lt;/code&gt; — Claude Code handles everything&lt;/li&gt;
&lt;li&gt;Connect your Telegram bot token when prompted&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Diction:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install from the &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;App Store&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Settings → General → Keyboard → Add New Keyboard → Diction&lt;/li&gt;
&lt;li&gt;Switch to it in Telegram, tap the mic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the whole stack. A personal assistant on your own hardware, voice input on your own device.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/qwibitai/nanoclaw" rel="noopener noreferrer"&gt;NanoClaw on GitHub&lt;/a&gt; | &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Diction on the App Store&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>selfhosted</category>
      <category>ios</category>
      <category>ai</category>
    </item>
    <item>
      <title>Self-Host Speech-to-Text and Use It as Your iPhone Keyboard in 3 Commands</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 26 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://forem.com/omachala/self-host-whisper-and-use-it-as-your-iphone-keyboard-in-3-commands-dp6</link>
      <guid>https://forem.com/omachala/self-host-whisper-and-use-it-as-your-iphone-keyboard-in-3-commands-dp6</guid>
      <description>&lt;p&gt;If you're running a homelab, you've probably already got speech-to-text somewhere in your stack.&lt;/p&gt;

&lt;p&gt;Maybe you use it for Home Assistant voice commands. Or local LLM integrations. Or just transcribing meeting recordings.&lt;/p&gt;

&lt;p&gt;Here's something you might not have considered: you can use that same transcription server as a keyboard on your iPhone.&lt;/p&gt;




&lt;h2&gt;The 3 Commands&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/omachala/diction
&lt;span class="nb"&gt;cd &lt;/span&gt;diction
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the server running. Now install &lt;a href="https://apps.apple.com/app/id6759807364" rel="noopener noreferrer"&gt;Diction&lt;/a&gt; on your iPhone, point it at your server URL, and you have a voice keyboard backed by your own speech-to-text instance.&lt;/p&gt;




&lt;h2&gt;What's Actually Running&lt;/h2&gt;

&lt;p&gt;The Docker Compose setup spins up two services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;

  &lt;span class="na"&gt;whisper-small&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-small&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__INFERENCE_DEVICE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;whisper-small&lt;/strong&gt;: the transcription engine — runs open-source Whisper via a REST API. CPU works fine for real-time dictation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;gateway&lt;/strong&gt;: a small open-source Go service that handles communication between the iOS app and the transcription backend. It accepts WebSocket connections from the phone, buffers audio frames, and forwards them to Whisper. This is what makes dictation feel instant instead of "record, upload, wait."&lt;/p&gt;

&lt;p&gt;The gateway exposes port 8080. That's the URL you put into the Diction app.&lt;/p&gt;




&lt;h2&gt;Making It Accessible From Your Phone&lt;/h2&gt;

&lt;p&gt;Your phone needs to reach your server. A few options depending on your setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tailscale&lt;/strong&gt; (easiest): Install Tailscale on both your server and iPhone. You get a private IP accessible from anywhere. No port forwarding, no firewall rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://100.x.x.x:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reverse proxy&lt;/strong&gt; (for existing homelabbers): If you're already running Caddy, nginx, or Traefik, add a route to port 8080.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://diction.yourdomain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Direct LAN&lt;/strong&gt; (simplest for home-only use): Just use your server's local IP. Works on home WiFi, not outside.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://192.168.1.100:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;Choosing a Model&lt;/h2&gt;

&lt;p&gt;Swap the transcription model by changing &lt;code&gt;WHISPER__MODEL&lt;/code&gt;. The model downloads automatically on first use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Speed (CPU)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tiny&lt;/td&gt;
&lt;td&gt;~350MB&lt;/td&gt;
&lt;td&gt;~1-2s&lt;/td&gt;
&lt;td&gt;Lower accuracy, great for low-power hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;~800MB&lt;/td&gt;
&lt;td&gt;~3-4s&lt;/td&gt;
&lt;td&gt;Good default for everyday dictation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;~1.8GB&lt;/td&gt;
&lt;td&gt;~8-12s&lt;/td&gt;
&lt;td&gt;Better with accents and background noise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;~3.5GB&lt;/td&gt;
&lt;td&gt;~20-30s&lt;/td&gt;
&lt;td&gt;Highest accuracy, benefits from GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most home servers, the small model hits the sweet spot — fast enough to feel real-time, accurate enough for messages and notes.&lt;/p&gt;

&lt;p&gt;Swap it in your compose file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;whisper-small&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Systran/faster-whisper-medium&lt;/span&gt;
      &lt;span class="na"&gt;WHISPER__INFERENCE_DEVICE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;Running on Lower-Power Hardware&lt;/h2&gt;

&lt;p&gt;The small model runs well on modern CPUs. For a NAS or Raspberry Pi, try &lt;code&gt;tiny&lt;/code&gt; — less RAM (~350MB), faster responses, some accuracy trade-off. For real-time keyboard use, aim for sub-3 second round-trip. Small on a modern CPU or tiny on lower-power hardware gets you there.&lt;/p&gt;
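&lt;p&gt;Following the compose pattern shown earlier, switching to the tiny model is a one-line change (the &lt;code&gt;Systran/faster-whisper-tiny&lt;/code&gt; name follows the naming convention used above; check the image's docs for the exact model list):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  whisper-small:
    image: fedirz/faster-whisper-server:latest-cpu
    environment:
      WHISPER__MODEL: Systran/faster-whisper-tiny
      WHISPER__INFERENCE_DEVICE: cpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;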




&lt;h2&gt;Already Running a Whisper Server?&lt;/h2&gt;

&lt;p&gt;If you already have a speech-to-text container running, you don't need to spin up another one. Just run the gateway and point it at your existing server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://your-server:8000&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-model-name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More details on connecting to existing servers in &lt;a href="https://dev.to/omachala/you-already-have-a-speech-server-your-iphone-keyboard-should-use-it-7oh"&gt;this post&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;The server and gateway are fully open source: &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>docker</category>
      <category>ios</category>
      <category>productivity</category>
    </item>
    <item>
      <title>You Already Have a Speech Server. Your iPhone Keyboard Should Use It.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Wed, 18 Mar 2026 09:36:35 +0000</pubDate>
      <link>https://forem.com/omachala/you-already-have-a-speech-server-your-iphone-keyboard-should-use-it-7oh</link>
      <guid>https://forem.com/omachala/you-already-have-a-speech-server-your-iphone-keyboard-should-use-it-7oh</guid>
      <description>&lt;p&gt;Someone posted on our GitHub Discussions this week. They'd been running a speech-to-text container on their homelab for months. Found Diction, an open-source iOS voice keyboard. Pointed the app at their server. Got a server error. The settings screen even said "endpoint reachable."&lt;/p&gt;

&lt;p&gt;Here's what was going wrong, and how two lines of config fix it.&lt;/p&gt;

&lt;h2&gt;Why direct connection fails&lt;/h2&gt;

&lt;p&gt;Diction doesn't talk directly to speech servers. It connects through a lightweight gateway first.&lt;/p&gt;

&lt;p&gt;The reason is WebSockets. When you tap the mic, the app opens a WebSocket and streams raw PCM audio to the gateway in real time as you speak. When you're done, the gateway POSTs the full audio to your speech server, gets the transcript, and sends it back. The whole exchange happens in the time it takes to stop speaking.&lt;/p&gt;

&lt;p&gt;Without this, the alternative is: record the whole thing, send a file, wait. You'd feel every pause. The WebSocket is what makes it feel instant.&lt;/p&gt;

&lt;p&gt;The "endpoint reachable" check passes because the iOS app pings &lt;code&gt;/health&lt;/code&gt; or &lt;code&gt;/v1/models&lt;/code&gt;. Most speech servers expose these. But the actual transcription uses the WebSocket endpoint, which only the gateway handles. No gateway, no streaming.&lt;/p&gt;

&lt;h2&gt;The fix&lt;/h2&gt;

&lt;p&gt;You don't need to run our speech containers. Just the gateway, pointed at yours.&lt;/p&gt;

&lt;p&gt;If your server is at &lt;code&gt;http://192.168.1.50:8000&lt;/code&gt;, this is your entire &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://192.168.1.50:8000&lt;/span&gt;
      &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-model-name-here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open Diction, go to &lt;strong&gt;Self-Hosted&lt;/strong&gt;, paste &lt;code&gt;http://192.168.1.50:8080&lt;/code&gt;. Done.&lt;/p&gt;

&lt;p&gt;Your model stays where it is. The gateway handles the WebSocket layer, audio buffering, and forwarding. Audio still only goes to your server.&lt;/p&gt;

&lt;h2&gt;CUSTOM_BACKEND_MODEL&lt;/h2&gt;

&lt;p&gt;One thing to get right: the model name.&lt;/p&gt;

&lt;p&gt;Most speech servers that follow the OpenAI-compatible API format expect a &lt;code&gt;model&lt;/code&gt; field in the transcription request to know which model to load. Without it, some return an error.&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;CUSTOM_BACKEND_MODEL&lt;/code&gt; to whatever name your server expects. Check your server's docs or the model you started it with. If your server only runs one model and ignores the field, you can omit it entirely.&lt;/p&gt;
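&lt;p&gt;For example, if your backend runs faster-whisper's small model (model name assumed; use whatever your server actually reports), the gateway environment would look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;environment:
  CUSTOM_BACKEND_URL: http://192.168.1.50:8000
  CUSTOM_BACKEND_MODEL: Systran/faster-whisper-small
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;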

&lt;h2&gt;WAV-only servers&lt;/h2&gt;

&lt;p&gt;Some speech servers only accept WAV audio input. The gateway handles conversion automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://192.168.1.50:5092&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_NEEDS_WAV&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this set, the gateway converts audio to 16kHz mono WAV via ffmpeg before forwarding. Your server gets the format it expects.&lt;/p&gt;
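&lt;p&gt;If you want to check what your server receives, the conversion is roughly equivalent to this ffmpeg invocation (file names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg -i recording.m4a -ar 16000 -ac 1 recording.wav
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;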

&lt;h2&gt;API key protection&lt;/h2&gt;

&lt;p&gt;If your server is behind an API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://my-server:8000&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-model&lt;/span&gt;
  &lt;span class="na"&gt;CUSTOM_BACKEND_AUTH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sk-your-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway injects the Authorization header on every request to your backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency on a local network
&lt;/h2&gt;

&lt;p&gt;I tested this end-to-end: generated a speech WAV, sent it through the gateway to a real speech container, got the transcript back correctly. On a local network with a CPU-only container, the round trip was under 5 seconds. With a dedicated GPU, it's near instant.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the gateway actually does
&lt;/h2&gt;

&lt;p&gt;The gateway is open source at &lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;. It's a small Go service that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accepts WebSocket connections from the iOS app&lt;/li&gt;
&lt;li&gt;Buffers incoming PCM audio frames&lt;/li&gt;
&lt;li&gt;Wraps them in a WAV header and POSTs to your speech backend&lt;/li&gt;
&lt;li&gt;Returns the transcript over the WebSocket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No cloud calls. No telemetry. The full source is in &lt;code&gt;/gateway/core/&lt;/code&gt;.&lt;/p&gt;
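
&lt;p&gt;The "wraps them in a WAV header" step is small enough to sketch. Here's an illustrative Node.js version of the idea; the gateway's real implementation is the Go code in &lt;code&gt;/gateway/core/&lt;/code&gt;:&lt;/p&gt;

```javascript
// Illustrative sketch only: prepend the standard 44-byte RIFF/WAVE header
// to raw 16-bit mono PCM. Not the gateway's actual (Go) code.
function wrapPcmInWav(pcm, sampleRate = 16000) {
  const numChannels = 1;
  const bitsPerSample = 16;
  const byteRate = sampleRate * numChannels * (bitsPerSample / 8);
  const blockAlign = numChannels * (bitsPerSample / 8);

  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total chunk size minus 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);             // fmt subchunk size
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(numChannels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(pcm.length, 40);     // data subchunk size
  return Buffer.concat([header, pcm]);
}
```

&lt;p&gt;Everything after byte 44 is the untouched PCM; the header just tells the backend how to interpret it.&lt;/p&gt;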




&lt;p&gt;If you're running a Whisper server and want to get started from scratch, the &lt;a href="https://dev.to/omachala/self-host-whisper-and-use-it-as-your-iphone-keyboard-in-3-commands-dp6"&gt;3-command setup guide&lt;/a&gt; covers the full stack.&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>docker</category>
      <category>ios</category>
      <category>opensource</category>
    </item>
    <item>
      <title>You Wrote 14 Playwright Scripts Just to Screenshot Your Own App</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:00:00 +0000</pubDate>
      <link>https://forem.com/omachala/you-wrote-14-playwright-scripts-just-to-screenshot-your-own-app-2ckf</link>
      <guid>https://forem.com/omachala/you-wrote-14-playwright-scripts-just-to-screenshot-your-own-app-2ckf</guid>
      <description>&lt;p&gt;It started simple. One Playwright script to capture the homepage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://myapp.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;homepage.png&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the team needed the pricing page. So I added another script. Then the dashboard (which needs login first). Then the settings page (which needs a specific tab clicked). Then mobile versions.&lt;/p&gt;

&lt;p&gt;Two months later I had 14 Playwright scripts. Some shared a login helper. Some had hardcoded waits. One had a try-catch that silently swallowed errors because the cookie banner sometimes loaded and sometimes didn't.&lt;/p&gt;

&lt;p&gt;I was maintaining a bespoke test suite, except it wasn't testing anything. It was just taking pictures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Config, not code
&lt;/h2&gt;

&lt;p&gt;Here's what those 14 scripts look like as config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hiddenElements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"myapp.com"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;".cookie-banner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".chat-widget"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"screenshots"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"homepage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".hero"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pricing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/pricing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".pricing-grid"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".dashboard"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".settings-panel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[data-tab='notifications']"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All 14 screenshots. One command. No scripts to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cookie banner problem
&lt;/h2&gt;

&lt;p&gt;In my scripts, I had this pattern everywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.cookie-banner .dismiss&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* maybe it didn't show */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;a href="https://github.com/omachala/heroshot" rel="noopener noreferrer"&gt;heroshot&lt;/a&gt;, you define hidden elements once per domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hiddenElements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"myapp.com"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;".cookie-banner"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"docs.myapp.com"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;".cookie-banner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".announcement-bar"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every screenshot on that domain hides those elements automatically. No try-catch. No timeouts. No "maybe it showed, maybe it didn't."&lt;/p&gt;
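
&lt;p&gt;My guess at how that works under the hood: before capture, the tool injects a stylesheet built from the per-domain map. Building the rule takes a couple of lines (an illustrative sketch, not heroshot's actual source):&lt;/p&gt;

```javascript
// Build one CSS rule that hides every configured selector for a domain.
// Illustrative guess at the mechanism, not heroshot's actual code.
function hideCss(hiddenElements, domain) {
  const selectors = hiddenElements[domain] || [];
  if (selectors.length === 0) return '';
  return `${selectors.join(', ')} { display: none !important; }`;
}
```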

&lt;h2&gt;
  
  
  Actions for complex state
&lt;/h2&gt;

&lt;p&gt;The settings page needed a tab clicked first. In Playwright, that's 5 lines of setup. In config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"click"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[data-tab='notifications']"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wait"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Email preferences"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are 14 action types: click, type, hover, select_option, press_key, drag, wait, navigate, evaluate, fill_form, handle_dialog, file_upload, resize, and hide. Covers pretty much every pre-screenshot setup I've needed.&lt;/p&gt;
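
&lt;p&gt;Actions chain in order, so a capture that needs a click, a render wait, and a one-off element hidden reads top to bottom. In this sketch the &lt;code&gt;hide&lt;/code&gt; action's field shape and the &lt;code&gt;.beta-badge&lt;/code&gt; selector are my assumptions; check the heroshot docs for the exact schema:&lt;/p&gt;

```json
{
  "actions": [
    { "type": "click", "selector": "[data-tab='notifications']" },
    { "type": "wait", "text": "Email preferences" },
    { "type": "hide", "selector": ".beta-badge" }
  ]
}
```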

&lt;h2&gt;
  
  
  When to use Playwright directly
&lt;/h2&gt;

&lt;p&gt;If you need complex conditional logic, dynamic data generation, or integration with a test framework, raw Playwright is the right tool.&lt;/p&gt;

&lt;p&gt;But if you're just taking pictures of known pages at known states, config is simpler, more readable, and doesn't break when someone renames a helper function.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>testing</category>
      <category>automation</category>
      <category>documentation</category>
    </item>
    <item>
      <title>I Stopped Paying $15/Month for Wispr Flow. Here's the Open-Source Replacement.</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Mon, 16 Mar 2026 08:58:03 +0000</pubDate>
      <link>https://forem.com/omachala/i-stopped-paying-15month-for-wispr-flow-heres-the-open-source-replacement-313i</link>
      <guid>https://forem.com/omachala/i-stopped-paying-15month-for-wispr-flow-heres-the-open-source-replacement-313i</guid>
      <description>&lt;p&gt;I paid for Wispr Flow for five months.&lt;/p&gt;

&lt;p&gt;A monthly subscription. Every month. For voice-to-text on my iPhone.&lt;/p&gt;

&lt;p&gt;It's a good product. The AI editing layer is genuinely impressive — it strips filler words, fixes grammar, adapts to how you write. That part works. If you want the best cloud-based dictation and don't mind paying, Wispr delivers.&lt;/p&gt;

&lt;p&gt;But every time I used it, the same thought: &lt;em&gt;my voice is going to their cloud. Not my cloud. Theirs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I already run a home server. Docker Compose, Tailscale, the usual homelab stack. I had faster-whisper running for other things. The transcription engine was already there. I just didn't have a way to use it from my phone.&lt;/p&gt;

&lt;p&gt;So I built one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the switch actually looked like
&lt;/h2&gt;

&lt;p&gt;The server side was easy. I already had the transcription container. I wrote a small Go gateway to handle WebSocket streaming from the phone, and wrapped both in a compose file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;transcription&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fedirz/faster-whisper-server:latest-cpu&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;models:/root/.cache/huggingface&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/omachala/diction-gateway:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;small&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;transcription&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docker compose up -d&lt;/code&gt; and it's running.&lt;/p&gt;

&lt;p&gt;The hard part was the iOS keyboard. Keyboard extensions on iOS run in a sandbox with a 48MB memory ceiling, no direct mic access without Full Access, and a text proxy that behaves differently in every app. That took months, not hours.&lt;/p&gt;

&lt;p&gt;The result is &lt;a href="https://diction.one" rel="noopener noreferrer"&gt;Diction&lt;/a&gt; — a voice keyboard that connects to whatever transcription server you point it at.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's honestly worse
&lt;/h2&gt;

&lt;p&gt;Wispr's AI editing layer is better than raw transcription. It doesn't just transcribe — it rewrites. Filler words vanish, punctuation lands correctly, and it matches your tone. Diction transcribes what you say. It has optional AI cleanup now, but Wispr's has had years of refinement.&lt;/p&gt;

&lt;p&gt;Wispr also has a personal dictionary that learns your vocabulary over time. Diction has custom dictionaries too, but they're newer and simpler.&lt;/p&gt;

&lt;p&gt;If you don't want to think about infrastructure and just want the best cloud experience, Wispr is still a strong choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's better
&lt;/h2&gt;

&lt;p&gt;My audio stays on my network. I can verify that because the server code is open source — there's nothing to take on faith.&lt;/p&gt;

&lt;p&gt;No word limits. Wispr's free tier caps you at 1,000 words/week on iOS. Self-hosted Diction has no caps, no subscription, no catch.&lt;/p&gt;

&lt;p&gt;Latency on a local network is excellent. The small Whisper model on a modern CPU returns transcriptions in 2-4 seconds. With a GPU, it's near instant.&lt;/p&gt;

&lt;p&gt;And when my internet goes down, on-device mode keeps working. Wispr is cloud-only — no connection, no transcription.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest trade-off
&lt;/h2&gt;

&lt;p&gt;I traded polish for control. Wispr is more refined. Diction gives me ownership of the entire pipeline, from the mic to the model, and it's getting better with every release.&lt;/p&gt;

&lt;p&gt;If you're already running Docker at home and the idea of sending every word you speak to someone else's server bothers you, the self-hosted setup takes about 10 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/omachala/diction" rel="noopener noreferrer"&gt;github.com/omachala/diction&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ios</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Astro Docs Without a Single Manual Screenshot</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Sat, 14 Mar 2026 14:00:00 +0000</pubDate>
      <link>https://forem.com/omachala/astro-docs-without-a-single-manual-screenshot-3fi8</link>
      <guid>https://forem.com/omachala/astro-docs-without-a-single-manual-screenshot-3fi8</guid>
      <description>&lt;p&gt;I set up an Astro docs site last week. Content collections, MDX pages, Starlight theme. Beautiful.&lt;/p&gt;

&lt;p&gt;Then I needed screenshots. The dashboard page, the settings panel, the onboarding flow. Three screenshots, light and dark mode each, so six images total.&lt;/p&gt;

&lt;p&gt;I spent an hour taking them by hand. Opened the app, resized the browser, captured, cropped, saved, repeated. Six times.&lt;/p&gt;

&lt;p&gt;The next sprint, the onboarding flow changed. Screenshots were already stale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Config-driven screenshots
&lt;/h2&gt;

&lt;p&gt;Instead of capturing manually, define what you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputDirectory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/assets/screenshots"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"screenshots"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".dashboard-grid"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/settings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".settings-panel"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"onboarding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://myapp.com/onboarding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"selector"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".onboarding-wizard"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three config entries, six files. Light and dark variants are included automatically.&lt;/p&gt;
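
&lt;p&gt;Concretely, the output directory ends up with one light and one dark file per entry, named on the &lt;code&gt;name-light.png&lt;/code&gt; / &lt;code&gt;name-dark.png&lt;/code&gt; pattern:&lt;/p&gt;

```text
src/assets/screenshots/
├── dashboard-light.png
├── dashboard-dark.png
├── settings-light.png
├── settings-dark.png
├── onboarding-light.png
└── onboarding-dark.png
```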

&lt;h2&gt;
  
  
  Using them in Astro
&lt;/h2&gt;

&lt;p&gt;In your MDX file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Image } from 'astro:assets';
import dashboard from '../../assets/screenshots/dashboard-light.png';

&amp;lt;Image src={dashboard} alt="Dashboard overview" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or if you want automatic dark mode switching, use a &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;source&lt;/span&gt;
    &lt;span class="na"&gt;srcset=&lt;/span&gt;&lt;span class="s"&gt;"/screenshots/dashboard-dark.png"&lt;/span&gt;
    &lt;span class="na"&gt;media=&lt;/span&gt;&lt;span class="s"&gt;"(prefers-color-scheme: dark)"&lt;/span&gt;
  &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"/screenshots/dashboard-light.png"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Dashboard"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Heroshot has a shortcut for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot snippet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It generates the &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; markup for every screenshot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Astro + Starlight
&lt;/h2&gt;

&lt;p&gt;If you're using Starlight (Astro's docs theme), screenshots go in &lt;code&gt;src/assets/&lt;/code&gt; so Astro optimizes them at build time. Set the output directory accordingly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputDirectory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/assets/screenshots"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Starlight handles responsive images, lazy loading, and format conversion automatically. You just provide the source PNGs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping them fresh
&lt;/h2&gt;

&lt;p&gt;Add to your workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# After every deploy&lt;/span&gt;
npx heroshot

&lt;span class="c"&gt;# Check if anything changed&lt;/span&gt;
git diff &lt;span class="nt"&gt;--name-only&lt;/span&gt; src/assets/screenshots/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If screenshots changed, commit them. If nothing changed, move on.&lt;/p&gt;
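
&lt;p&gt;If you'd rather not remember to run it, the same two steps translate directly to CI. A GitHub Actions sketch (the trigger, bot identity, and commit step are illustrative; adapt them to your pipeline):&lt;/p&gt;

```yaml
name: refresh-screenshots
on:
  workflow_run:
    workflows: [deploy]   # assumes your deploy workflow is named "deploy"
    types: [completed]
jobs:
  screenshots:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx heroshot
      - run: |
          git config user.name "screenshot-bot"
          git config user.email "bot@users.noreply.github.com"
          git add src/assets/screenshots/
          if ! git diff --cached --quiet; then
            git commit -m "Update screenshots"
            git push
          fi
```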

&lt;p&gt;&lt;a href="https://github.com/omachala/heroshot" rel="noopener noreferrer"&gt;Heroshot&lt;/a&gt; is open source and works with any static site generator. Astro, Next.js, VitePress, whatever. The config is framework-agnostic.&lt;/p&gt;

</description>
      <category>astro</category>
      <category>documentation</category>
      <category>webdev</category>
      <category>automation</category>
    </item>
    <item>
      <title>Keep Your MkDocs Screenshots Up to Date (Material Theme)</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Thu, 29 Jan 2026 08:51:37 +0000</pubDate>
      <link>https://forem.com/omachala/keep-your-mkdocs-screenshots-up-to-date-material-theme-4h1i</link>
      <guid>https://forem.com/omachala/keep-your-mkdocs-screenshots-up-to-date-material-theme-4h1i</guid>
      <description>&lt;p&gt;You've got MkDocs with Material theme. Dark mode works. Code blocks adapt. But your screenshots? Still stuck in light mode.&lt;/p&gt;

&lt;p&gt;Here's how to fix that with &lt;a href="https://heroshot.sh" rel="noopener noreferrer"&gt;Heroshot&lt;/a&gt;, a CLI that captures screenshots and handles light/dark variants automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install the CLI
&lt;/h2&gt;

&lt;p&gt;Pick your preferred method:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;curl (standalone binary):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://heroshot.sh/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Homebrew:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;omachala/heroshot/heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;npm:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Docker:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull heroshot/heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Install the Python package
&lt;/h2&gt;

&lt;p&gt;The CLI captures screenshots. The Python package provides the MkDocs macro:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Configure output directory
&lt;/h2&gt;

&lt;p&gt;MkDocs serves from &lt;code&gt;docs/&lt;/code&gt;. Create &lt;code&gt;.heroshot/config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputDirectory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docs/heroshots"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-project/
├── docs/
│   ├── index.md
│   └── heroshots/    # screenshots go here
├── mkdocs.yml
└── .heroshot/
    └── config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Capture screenshots
&lt;/h2&gt;

&lt;p&gt;Start your MkDocs dev server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdocs serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another terminal, run heroshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Or &lt;code&gt;npx heroshot&lt;/code&gt; / &lt;code&gt;docker run --rm -v $(pwd):/work heroshot/heroshot&lt;/code&gt;, depending on your install method.)&lt;/p&gt;

&lt;p&gt;A browser opens with a visual picker. Navigate to &lt;code&gt;localhost:8000&lt;/code&gt;, click each element you want to capture, and give it a name. Close the browser when you're done.&lt;/p&gt;

&lt;p&gt;You'll get two files per screenshot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dashboard-light.png&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dashboard-dark.png&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 5: Add the macro
&lt;/h2&gt;

&lt;p&gt;Update your &lt;code&gt;mkdocs.yml&lt;/code&gt; (the &lt;code&gt;macros&lt;/code&gt; plugin comes from &lt;code&gt;mkdocs-macros-plugin&lt;/code&gt;; if MkDocs doesn't recognize it, &lt;code&gt;pip install mkdocs-macros-plugin&lt;/code&gt; first):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;macros&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;heroshot&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it in your markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jinja"&gt;&lt;code&gt;&lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;heroshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"dashboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Dashboard overview"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The macro expands to Material's &lt;code&gt;#only-light&lt;/code&gt; / &lt;code&gt;#only-dark&lt;/code&gt; syntax. When readers toggle the theme, screenshots swap automatically.&lt;/p&gt;
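&lt;p&gt;For reference, the expanded markup looks roughly like this (a sketch — exact paths depend on your output directory; &lt;code&gt;dashboard&lt;/code&gt; is the name chosen in Step 4):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;![Dashboard overview](heroshots/dashboard-light.png#only-light)
![Dashboard overview](heroshots/dashboard-dark.png#only-dark)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;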

&lt;h2&gt;
  
  
  Step 6: Keep them fresh
&lt;/h2&gt;

&lt;p&gt;When your UI changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All configured screenshots regenerate. No manual cropping, no file hunting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Screenshots without theme variants
&lt;/h2&gt;

&lt;p&gt;For diagrams or architecture images that don't need dark mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jinja"&gt;&lt;code&gt;&lt;span class="cp"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;heroshot_single&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"architecture"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"System architecture"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Renders a simple &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;
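&lt;p&gt;Roughly like this (a sketch — the actual &lt;code&gt;src&lt;/code&gt; depends on your output directory and how Heroshot names single captures):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;img src="heroshots/architecture.png" alt="System architecture"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;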




&lt;p&gt;Full docs: &lt;a href="https://heroshot.sh/docs/integrations/mkdocs" rel="noopener noreferrer"&gt;heroshot.sh/docs/integrations/mkdocs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example repo: &lt;a href="https://github.com/omachala/heroshot/tree/main/integrations/examples/mkdocs" rel="noopener noreferrer"&gt;github.com/omachala/heroshot/tree/main/integrations/examples/mkdocs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>documentation</category>
      <category>mkdocs</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Screenshot Automation in Docusaurus: Getting Started</title>
      <dc:creator>Ondrej Machala</dc:creator>
      <pubDate>Tue, 27 Jan 2026 09:28:19 +0000</pubDate>
      <link>https://forem.com/omachala/dark-mode-screenshots-in-docusaurus-that-actually-switch-1djm</link>
      <guid>https://forem.com/omachala/dark-mode-screenshots-in-docusaurus-that-actually-switch-1djm</guid>
      <description>&lt;p&gt;Docusaurus has dark mode built in. Your docs adapt. Your code blocks adapt. Your screenshots? They just sit there, glaring white on a dark page.&lt;/p&gt;

&lt;p&gt;You could maintain two versions of every image and write a custom component to swap them.&lt;/p&gt;

&lt;p&gt;Or just automate it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://heroshot.sh" rel="noopener noreferrer"&gt;Heroshot&lt;/a&gt; is a CLI that captures screenshots and handles light/dark variants automatically. Here's how to set it up with Docusaurus.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Add the plugin
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// docusaurus.config.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;heroshot&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;heroshot/plugins/docusaurus&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;heroshot&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
  &lt;span class="c1"&gt;// ... rest of your config&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The plugin registers the manifest and sets up the output directory (&lt;code&gt;static/heroshots/&lt;/code&gt; by default).&lt;/p&gt;

&lt;h2&gt;
  
  
  Capture your first screenshot
&lt;/h2&gt;

&lt;p&gt;Run heroshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx heroshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A browser opens with a visual picker. Start your Docusaurus dev server in another terminal, navigate to it in the picker, and click the element you want to capture. Name it, then close the browser.&lt;/p&gt;

&lt;p&gt;You'll get two files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dashboard-light.png&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dashboard-dark.png&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use it in MDX
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Heroshot } from 'heroshot/docusaurus';

&amp;lt;Heroshot name="dashboard" alt="Dashboard overview" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The component renders a &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; element that swaps between light and dark variants based on the user's theme. No flash, no reload.&lt;/p&gt;
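&lt;p&gt;If you're curious what theme-aware images look like in plain Docusaurus, the built-in &lt;code&gt;ThemedImage&lt;/code&gt; component achieves a similar effect. This is an illustration of the underlying idea, not necessarily what Heroshot's component emits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ThemedImage from '@theme/ThemedImage';
import useBaseUrl from '@docusaurus/useBaseUrl';

&amp;lt;ThemedImage
  alt="Dashboard overview"
  sources={{
    light: useBaseUrl('/heroshots/dashboard-light.png'),
    dark: useBaseUrl('/heroshots/dashboard-dark.png'),
  }}
/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;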

&lt;h2&gt;
  
  
  Update screenshots
&lt;/h2&gt;

&lt;p&gt;When your UI changes, just run &lt;code&gt;npx heroshot&lt;/code&gt; again. All configured screenshots regenerate.&lt;/p&gt;

&lt;p&gt;That's the basic setup. Once you're running, check out &lt;a href="https://dev.to/omachala/heroshot-features-for-docusaurus-viewports-retina-cicd-3d01"&gt;the features article&lt;/a&gt; for viewports, retina support, and CI/CD automation.&lt;/p&gt;




&lt;p&gt;Docs: &lt;a href="https://heroshot.sh/docs/integrations/docusaurus.html" rel="noopener noreferrer"&gt;heroshot.sh/docs/integrations/docusaurus&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://heroshot.sh" rel="noopener noreferrer"&gt;heroshot.sh&lt;/a&gt; — free, open source.&lt;/p&gt;

</description>
      <category>react</category>
      <category>documentation</category>
      <category>docusaurus</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
